Module 20
June 8, 2026
remove_chimera.py \
--input-biom clustering.biom \
--input-fasta clustering.fasta \
--non-chimera remove_chimera.fasta \
--out-abundance remove_chimera.biom \
--summary remove_chimera.html
@ST-E00114:1342:HHMGVCCX2:1:1101:3123:2012 1:N:0:TCCGGAGA+TCAGAGCC
CTTGGTCATTTAGAG
+
***<<*AEF???***
@ST-E00114:1342:HHMGVCCX2:1:1101:11556:2030 1:N:0:TCCGGAGA+TCAGAGCC
CATTGGCCATATCAT
+
AAAE??<<*???***
Meaning
@Identifier1 (comment)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
@Identifier2 (comment)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
Measure of the quality of the identification of the nucleobases generated by automated DNA sequencing
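To make the record layout and the quality line concrete, here is a minimal Python sketch (not part of FROGS). It reads 4-line FASTQ records and decodes the quality characters. It assumes the Phred+33 encoding used by current Illumina instruments, and the function names are hypothetical. A Phred score Q corresponds to an error probability P = 10^(-Q/10), so Q30 means roughly 1 wrong base call in 1,000.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # text after '@' up to the first space
    comment: str     # optional text after the first space
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        ident, _, comment = header[1:].partition(" ")
        yield FastqRecord(ident, comment, seq, qual)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string (Phred+33 assumed by default)."""
    return [ord(c) - offset for c in quality]


def error_probability(q: int) -> float:
    """Q = -10 * log10(P), hence P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


example = """@ST-E00114:1342:HHMGVCCX2:1:1101:3123:2012 1:N:0:TCCGGAGA+TCAGAGCC
CTTGGTCATTTAGAG
+
***<<*AEF???***
"""
record = next(read_fastq(io.StringIO(example)))
print(record.identifier)             # ST-E00114:1342:HHMGVCCX2:1:1101:3123:2012
print(phred_scores(record.quality))  # [9, 9, 9, 27, 27, 9, 32, 36, 37, 30, 30, 30, 9, 9, 9]
```

For a compressed file.fastq.gz, the same reader can be given `gzip.open(path, "rt")` instead of a `StringIO`.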
file.fastq.gz
Try to answer (not always) simple questions:
Warning
QC without context leads to misinterpretation!

ASVs (amplicon sequence variants) are inferred by a de novo process in which biological sequences are discriminated from errors on the basis of the expectation that biological sequences are more likely to be observed repeatedly than error-containing sequences.
Swarm [8] is a notably different sequence clustering approach: while technically a clustering algorithm, it may also be considered a denoising method when run with the fastidious option and d=1. It relies on a maximum number of differences between reads (the local linking threshold, d) and forms clusters that are resilient to input-order changes, thus creating stable, high-resolution features (herein referred to as swarm-clusters). With these settings, swarm aims to produce clusters centered on real biological sequences, so that clusters represent sequence variants.
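The growth process can be illustrated with a deliberately simplified Python sketch (not swarm's actual implementation). Reads are dereplicated and counted, then each cluster grows from the most abundant unassigned sequence by repeatedly linking sequences at most d=1 edit away. As an assumption modelled on swarm's default cluster-breaking behaviour, a sequence is only linked from one that is at least as abundant. The fastidious refinement (grafting small clusters onto larger ones) is omitted, and the function names are hypothetical.

```python
from collections import Counter, deque
from typing import List


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution, insertion or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # single substitution at position i
    return a[i:] == b[i + 1:]          # single insertion in b at position i


def swarm_d1(reads: List[str]) -> List[List[str]]:
    """Simplified swarm-like clustering with a local linking threshold d=1."""
    abundance = Counter(reads)  # dereplication: unique sequence -> read count
    # most abundant first; ties broken alphabetically so the result is deterministic
    order = sorted(abundance, key=lambda s: (-abundance[s], s))
    unassigned = set(order)
    clusters = []
    for seed in order:
        if seed not in unassigned:
            continue
        unassigned.discard(seed)
        cluster, queue = [seed], deque([seed])
        while queue:  # breadth-first growth from the seed
            current = queue.popleft()
            for cand in sorted(unassigned, key=lambda s: (-abundance[s], s)):
                # link only neighbours <= 1 edit away that are not more abundant
                if abundance[cand] <= abundance[current] and within_one_edit(current, cand):
                    unassigned.discard(cand)
                    cluster.append(cand)
                    queue.append(cand)
        clusters.append(cluster)
    return clusters


reads = ["ACGT"] * 10 + ["ACGA"] * 2 + ["ACTA"] * 1 + ["TTTT"] * 5
print(swarm_d1(reads))  # [['ACGT', 'ACGA', 'ACTA'], ['TTTT']]
```

ACTA joins the ACGT cluster even though it is two differences away from the seed, because it is reached through ACGA: chaining d=1 links is what lets the cluster width adapt to the data instead of being fixed by a global threshold.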
Since FROGS uses swarm (with the fastidious method with d=1) and strongly promotes denoising by chimera removal and cluster filtering, FROGS produces ASVs.
A denoising algorithm is included.
Reference based: against a database of «genuine» sequences
De novo: against abundant sequences in the samples
FROGS uses vsearch [4] as its chimera removal tool.
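The de novo idea can be sketched in a few lines of Python. This is not vsearch's algorithm: it only checks whether a query is exactly the prefix of one sequence joined to the suffix of another, where both candidate parents come from the same data set and are more abundant than the query. The equal-length restriction, the exact matching and the skew factor of 2 are assumptions for illustration; real tools score a three-way alignment and require enough supporting differences on each side of the breakpoint.

```python
from typing import Dict, Optional, Tuple


def find_chimera_parents(query: str, abundance: Dict[str, int],
                         skew: float = 2.0) -> Optional[Tuple[str, str, int]]:
    """Return (left_parent, right_parent, breakpoint) if `query` is exactly the
    prefix of one candidate parent joined to the suffix of another, else None."""
    q_ab = abundance[query]
    # de novo mode: candidate parents are sequences from the same samples that
    # are at least `skew` times more abundant than the query (assumed rule)
    parents = [s for s, n in abundance.items()
               if s != query and len(s) == len(query) and n >= skew * q_ab]
    for bp in range(1, len(query)):
        left = [p for p in parents if p[:bp] == query[:bp]]
        right = [p for p in parents if p[bp:] == query[bp:]]
        for a in left:
            for b in right:
                if a != b:
                    return a, b, bp
    return None


abundance_table = {
    "AAAAACCCCC": 50,  # parent 1
    "GGGGGTTTTT": 40,  # parent 2
    "AAAAATTTTT": 3,   # left half of parent 1 + right half of parent 2
}
print(find_chimera_parents("AAAAATTTTT", abundance_table))  # ('AAAAACCCCC', 'GGGGGTTTTT', 5)
print(find_chimera_parents("AAAAACCCCC", abundance_table))  # None
```

A reference-based check would use the same logic with `parents` taken from a database of genuine sequences instead of from the sample abundances.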
FROGS: Uses an alignment-based consensus approach (primarily BLAST).
RDP Classifier: Implements a Naive Bayesian classifier that cuts sequences into 8-mer words to calculate the probability of a sequence belonging to a specific taxonomic node, providing a bootstrap confidence score for each rank.
Sintax: Operates as a fast, non-Bayesian classifier that uses k-mer similarity to find the top matching sequences in a reference database and calculates taxonomic confidence via bootstrap sampling of those k-mers.
IDTaxa (DECIPHER): Employs a machine learning approach based on a novel classification algorithm that reduces over-classification errors by inherently learning when to stop assigning taxonomy if the evidence is insufficient.
DADA2: Uses a Naive Bayesian classifier implementation (adapted from the RDP Classifier algorithm) that breaks sequences into 8-mers to assign taxonomy against reference databases, while uniquely allowing for an optional, exact-string-matching step to resolve assignments down to the strict 100% species level.
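The 8-mer Naive Bayesian approach shared by the RDP Classifier and DADA2 can be sketched in Python as below. This is a simplified, genus-only illustration, not either tool's code: it uses the word prior P(w) = (n(w) + 0.5) / (N + 1) and the genus-conditional probability P(w | G) = (m(w, G) + P(w)) / (M(G) + 1), then bootstraps on random subsets of 1/8 of the query's words (drawn with replacement in this sketch). The toy reference sequences and genus names are invented for the example, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

K = 8  # word length used by the RDP Classifier


def kmers(seq: str, k: int = K) -> Set[str]:
    """Distinct overlapping words of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(refs: List[Tuple[str, str]]) -> dict:
    """refs: (genus, sequence) pairs from a labelled reference database."""
    word_seqs: Dict[str, int] = defaultdict(int)  # n(w): references containing w
    genus_words: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))  # m(w, G)
    genus_size: Dict[str, int] = defaultdict(int)  # M(G): references in genus G
    for genus, seq in refs:
        genus_size[genus] += 1
        for w in kmers(seq):
            word_seqs[w] += 1
            genus_words[genus][w] += 1
    n = len(refs)
    priors = {w: (c + 0.5) / (n + 1) for w, c in word_seqs.items()}  # P(w)
    return {"priors": priors, "default_prior": 0.5 / (n + 1),
            "genus_words": genus_words, "genus_size": genus_size}


def log_score(words: List[str], genus: str, model: dict) -> float:
    """Sum of log P(w | G), assuming words are independent (the 'naive' part)."""
    counts = model["genus_words"][genus]
    size = model["genus_size"][genus]
    return sum(
        math.log((counts.get(w, 0) + model["priors"].get(w, model["default_prior"])) / (size + 1))
        for w in words
    )


def best_genus(words: List[str], model: dict) -> str:
    return max(sorted(model["genus_size"]), key=lambda g: log_score(words, g, model))


def classify(seq: str, model: dict, n_boot: int = 100, seed: int = 0) -> Tuple[str, float]:
    """Return the best genus and the fraction of bootstrap replicates that agree."""
    words = sorted(kmers(seq))
    if not words:
        raise ValueError(f"query shorter than {K} nucleotides")
    best = best_genus(words, model)
    rng = random.Random(seed)
    subset = max(1, len(words) // 8)  # each replicate uses 1/8 of the query's words
    agree = sum(
        best_genus([rng.choice(words) for _ in range(subset)], model) == best
        for _ in range(n_boot)
    )
    return best, agree / n_boot


refs = [("GenusA", "ACGTACGTACGTACGTAC"), ("GenusB", "TTGGCCAATTGGCCAATT")]
model = train(refs)
print(classify("ACGTACGTACGTACGTAC", model))  # ('GenusA', 1.0)
```

A low bootstrap fraction signals that the words do not consistently point to one genus, which is the usual reason for truncating an assignment at a higher rank.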
Bacteria;Firmicutes;Bacilli;Staphylococcales;Staphylococcaceae;Staphylococcus;Staphylococcus xylosus
Bacteria;Firmicutes;Bacilli;Staphylococcales;Staphylococcaceae;Staphylococcus;Staphylococcus saprophyticus
Strictly identical (V1-V3 amplification) on 499 nucleotides
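When several reference sequences are equally good hits, as with the two Staphylococcus species above, the affiliation has to be collapsed. The following Python sketch shows one way to do it in the spirit of FROGS multi-affiliations: ranks on which all hits agree are kept, and every rank from the first disagreement downward is replaced by a label. The function name and the exact label string are assumptions for illustration.

```python
from typing import List


def consensus_taxonomy(hits: List[str], label: str = "Multi-affiliation") -> List[str]:
    """Collapse equally good hits into one taxonomy (assumes all hits have the same ranks)."""
    split = [h.split(";") for h in hits]
    result, diverged = [], False
    for level in zip(*split):  # walk the ranks from Domain down to Species
        diverged = diverged or len(set(level)) > 1
        result.append(label if diverged else level[0])
    return result


hits = [
    "Bacteria;Firmicutes;Bacilli;Staphylococcales;Staphylococcaceae;Staphylococcus;Staphylococcus xylosus",
    "Bacteria;Firmicutes;Bacilli;Staphylococcales;Staphylococcaceae;Staphylococcus;Staphylococcus saprophyticus",
]
print(";".join(consensus_taxonomy(hits)))
# Bacteria;Firmicutes;Bacilli;Staphylococcales;Staphylococcaceae;Staphylococcus;Multi-affiliation
```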
Remaining contamination?
Want to analyse only the Firmicutes?
Want to remove ASVs without affiliation?
Want to hide affiliations if metrics are too poor?
Want to ignore taxonomies with unknown species?
2 modes
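The questions above amount to filtering rules that are applied either by deleting ASVs or by hiding (masking) their affiliation. Assuming those are the two modes, a minimal Python sketch could look like this. The `Asv` record, its identity and coverage fields, the default thresholds and the "unknown species" label are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Asv:
    name: str
    taxonomy: Optional[str]  # None = no affiliation
    identity: float          # % identity of the best hit (hypothetical field)
    coverage: float          # % query coverage of the best hit (hypothetical field)


def passes(asv: Asv, min_identity: float, min_coverage: float,
           keep_taxon: Optional[str], drop_unknown_species: bool) -> bool:
    if asv.taxonomy is None:  # ASV without affiliation
        return False
    ranks = asv.taxonomy.split(";")
    if keep_taxon is not None and keep_taxon not in ranks:
        return False  # e.g. keep only the Firmicutes
    if drop_unknown_species and ranks[-1] == "unknown species":
        return False
    return asv.identity >= min_identity and asv.coverage >= min_coverage


def filter_asvs(asvs: List[Asv], mode: str, min_identity: float = 99.0,
                min_coverage: float = 99.0, keep_taxon: Optional[str] = None,
                drop_unknown_species: bool = False) -> List[Asv]:
    """mode="delete": drop failing ASVs; mode="mask": keep them but hide their taxonomy."""
    if mode not in ("delete", "mask"):
        raise ValueError(f"unknown mode: {mode}")
    out = []
    for asv in asvs:
        if passes(asv, min_identity, min_coverage, keep_taxon, drop_unknown_species):
            out.append(asv)
        elif mode == "mask":
            out.append(replace(asv, taxonomy=None))
    return out


asvs = [
    Asv("ASV_1", "Bacteria;Firmicutes;Bacilli", 100.0, 100.0),
    Asv("ASV_2", "Bacteria;Proteobacteria;Gammaproteobacteria", 100.0, 100.0),
    Asv("ASV_3", None, 0.0, 0.0),
    Asv("ASV_4", "Bacteria;Firmicutes;Clostridia", 95.0, 100.0),
]
print([a.name for a in filter_asvs(asvs, "delete", keep_taxon="Firmicutes")])  # ['ASV_1']
print([a.taxonomy for a in filter_asvs(asvs, "mask")])
# ['Bacteria;Firmicutes;Bacilli', 'Bacteria;Proteobacteria;Gammaproteobacteria', None, None]
```

Deleting changes the set of ASVs (and thus the abundance table), whereas masking keeps every ASV and its counts but no longer trusts its taxonomy, which is the safer choice when poor metrics rather than contamination are the concern.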

Module 20 - Metabarcoding