16S metagenomic data analysis

Module 20

Olivier Rué

Migale

Christelle Hennequet-Antier

MaIAGE

Cédric Midoux

PROSE & MaIAGE

June 8, 2026

Introduction

Practical information

9h00 - 17h00
2 breaks (morning and afternoon)
Lunch at INRAE restaurant (not mandatory)
Questions are strongly encouraged
Everyone has something to learn from each other

Getting to know you

Who are you?

Institution / Laboratory / position

What is your scientific topic?

Studied ecosystem
Scientific question
Experimental design

What is your background?

Already analysed shotgun data?
Background in bioinformatics?
Background in biostatistics?

Getting to know us

Open infrastructure dedicated to life sciences
- Computing resources, tools, databanks…
Dissemination of expertise in bioinformatics, biostatistics
Design and development of applications
Data analysis

Data analysis service

https://documents.migale.inrae.fr/data-analysis.html

We are specialized in genomics/metagenomics
5 Bioinformaticians and 2 Statisticians
More than 160 projects since 2016
LRQA certified process
2 types of services
- Classical collaboration (we perform the analyses)
- Accompaniment (we help you do the analysis yourself)

Aim of this training

After this 4-day training, you will:

Know the outlines, advantages and limits of amplicon sequencing data analysis
Be able to use FROGS (through Galaxy) and phyloseq (through easy16S) tools on the training data set
Be able to identify tools and parameters adapted to your own analyses

Aim of this training

Liu et al., 2020: A practical guide to amplicon and metagenomic analysis of microbiome data [1]

Program

DAY 1

Introduction
Introduction to amplicon analysis
Introduction to Galaxy
Quality control
FROGS (1)

DAY 2

FROGS (2)
FROGSfunc

DAY 3

Introduction
Easy16S
Composition
\(\alpha\) and \(\beta\) diversities
Ordination

DAY 4

PERMANOVA and hypothesis tests
Differential abundance
Train on your own dataset or on another provided dataset

Program

Training with Easy16S

Microbiome tools

Aims (1/2)

Become familiar with the {phyloseq} [2] R package and the {Easy16S} [3] Shiny web application for the analysis of microbiome datasets.

Exploratory Data Analysis

\(\alpha\)-diversity: how diverse is my community?
\(\beta\)-diversity: how different are two communities?
Use a distance matrix to study structures:
- Hierarchical clustering: how do the communities cluster?
- Permutational ANOVA: are the communities structured by some environmental factor?

Aims (2/2)

Visual assessment of the data

bar plots: what is the composition of each community?
Multidimensional Scaling: how are communities related?
Heatmaps: are there interactions between species and (groups of) communities?
Differential Abundances: which taxa are differentially abundant?

phyloseq and companion tools

phyloseq R package [2] from Bioconductor

importing data from a variety of common formats
preprocessing data (storing, filtering, subsetting, transforming…)
performing analysis and graphics to explore microbiome profiles

Companion tools

Custom functions designed to enhance core functionality: {phyloseq-extended} [4]
Community ecology functions: {vegan}, {ade4}, {picante}
Tree manipulation: {ape}
Orchestrating Microbiome Analysis with Bioconductor {mia}

easy16S (1/2)

Shiny Web Application [3]

Strengths

specifically designed for biologists, data analysts and trainers
facilitates exploratory microbiome data analysis, data visualization, and statistical analysis
based on the {phyloseq} data structure [2]
data visualization with {ggplot2}

easy16S (2/2)

Shiny Web Application [3]

Content

Key tables constituting the phyloseq object;
Metadata visualization using esquisse (Meyer & Perrier, 2018);
Taxonomic composition barplot;
Rarefaction curves;
Abundance heatmap;

Richness within a sample (α-diversity): table, scatterplot and ANOVA;
Dissimilarity between samples (β-diversity): table, sample heatmap, sample clustering, MultiDimensional Scaling and Multivariate ANOVA;
Principal Component Analysis;
Differential abundance analysis.

Phyloseq data structure (1/4)

Phyloseq data structure (2/4)

A phyloseq object is made of up to 5 components (or slots):

otu_table: an OTU/ASV abundance table
sample_data: a table of sample metadata reflecting the experimental design, like sequencing technology, location of sampling…
tax_table: a table of taxonomic descriptors for each OTU/ASV, typically the taxonomic assignment at different ranks (Domain, Kingdom, Phylum, Class, Order, Family, Genus, Species)
phy_tree: a phylogenetic tree of the OTUs/ASVs
refseq: a set of reference sequences (one per OTU/ASV)

Phyloseq data structure (3/4)

The biom format natively supports

OTU/ASV count table (the otu_table)
taxonomic description (the tax_table)
sample description (the sample_data)

The other components are optional and must be stored in separate files

phylogenetic tree in Newick format (the phy_tree)
sequences in fasta format (the refseq)

Phyloseq data structure (4/4)

The import functions create consistent objects with

the same OTU/ASV in the count table, the taxonomy table and the phylogenetic tree;
the same samples in the count table and the metadata table

Check the imported phyloseq data

Samples/Taxa are matched by column names and/or row names. Make sure that the tables have them!
Any OTU/ASV absent from some components will be trimmed
Any sample absent from some components will be trimmed
Check the number of taxa/samples and be wary of name mismatches
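
The matching rule above can be illustrated with a minimal Python sketch. This is not phyloseq code: the function `trim_to_shared` and the dictionary layout are assumptions made for the illustration, and only three of the five components are represented.

```python
# Illustrative sketch only, not the phyloseq implementation.
# Components are plain dictionaries keyed by taxon or sample name.

def trim_to_shared(otu_table, tax_table, sample_data):
    """Keep only the taxa and samples that are present in every component.

    otu_table:   {taxon: {sample: count}}
    tax_table:   {taxon: {rank: name}}
    sample_data: {sample: {variable: value}}
    """
    shared_taxa = set(otu_table) & set(tax_table)
    samples_in_counts = set()
    for counts in otu_table.values():
        samples_in_counts |= set(counts)
    shared_samples = samples_in_counts & set(sample_data)

    otu = {t: {s: c for s, c in otu_table[t].items() if s in shared_samples}
           for t in sorted(shared_taxa)}
    tax = {t: tax_table[t] for t in sorted(shared_taxa)}
    meta = {s: sample_data[s] for s in sorted(shared_samples)}
    return otu, tax, meta


# Hypothetical toy data: ASV3 has no taxonomy, S3 has no metadata.
otu_table = {
    "ASV1": {"S1": 10, "S2": 0, "S3": 4},
    "ASV2": {"S1": 3, "S2": 7, "S3": 1},
    "ASV3": {"S1": 0, "S2": 2, "S3": 9},
}
tax_table = {"ASV1": {"Genus": "Lactococcus"}, "ASV2": {"Genus": "Streptococcus"}}
sample_data = {"S1": {"Season": "summer"}, "S2": {"Season": "winter"}}

otu, tax, meta = trim_to_shared(otu_table, tax_table, sample_data)
print(sorted(otu), sorted(meta))
```

A silent drop from 3 to 2 taxa or samples, as in this toy case, is exactly what a name mismatch produces, hence the advice to check the counts after import.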

Microbiome dataset

Dataset description

Project MetaPDOcheese [5]

Practice

Launch Easy16S
Import datasets from “demo”

Practice

Project MetaPDOcheese [5]
- more than 200 samples
- 44 French dairy AOPs (protected designations of origin), divided into 7 technological categories
- Raw data available in Recherche Data Gouv: 10.57745/UCJG6S.
Input dataset containing a subset of 72 samples analysed with FROGS [6]

Practice

First explorations
Exploring samples composition

Microbiome transformations

Preprocess data

It’s useful to explore the data and prepare datasets to be analysed depending on the biological questions.

Samples filter

Select samples based on their name
Filter samples based on the sample_data table (metadata, experimental design)
Prune samples whose sum of total counts (reads) does not satisfy a given threshold

Taxa transformation

Agglomerating taxa at a higher taxonomic rank makes it possible

to reduce the complexity of the dataset, especially in cases of low resolution
to focus on the major components of the composition.

Spread taxonomy to remove unknown and multi-affiliations by spreading the last known rank to further ranks

Rarefaction (1/2)

Nature of microbiome data

Compositionality: observed abundances are relative, they do not reflect the real (absolute) abundance of a taxon
Zero-inflation
Bias due to sequencing depth or sample size (= total number of reads per sample)

Raw abundances of taxa per sample are not comparable between samples

Rarefaction (2/2)

Rarefying = subsampling sequence reads without replacement based on the smallest total number of reads

All samples have the same depth, set as the smallest number of reads. Rarefied abundances of taxa per sample are recommended for diversity analyses.
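
As an illustration, rarefying can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the routine used by Easy16S or phyloseq: the function `rarefy`, the toy counts and the fixed seed are all hypothetical.

```python
import random

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample.

    counts: {taxon: read count}
    """
    # Expand the counts into one entry per read, then draw without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    picked = rng.sample(reads, depth)
    return {taxon: picked.count(taxon) for taxon in counts}


# Hypothetical samples with very different sequencing depths (100 vs 505 reads).
samples = {
    "S1": {"ASV1": 50, "ASV2": 30, "ASV3": 20},
    "S2": {"ASV1": 500, "ASV2": 5, "ASV3": 0},
}
rng = random.Random(42)  # fixed seed so the subsample is reproducible
depth = min(sum(c.values()) for c in samples.values())
rarefied = {name: rarefy(c, depth, rng) for name, c in samples.items()}
print(depth, rarefied)
```

The shallowest sample is returned unchanged, while deeper samples lose reads at random, so rare taxa in deep samples may disappear after rarefaction.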

Rarefaction curve:

Extracts the number of unique ASVs (richness) depending on sample size.
A very useful diagnostic graph

Practice

Preprocess data

Samples filter
Agglomerate taxa
Rarefy data
Explore composition on different subsets
Compare composition with and without rarefaction

Microbiome biodiversity

Some slides were adapted from Mahendra Mariadassou’s course materials

Different types of diversity indices

\(\alpha\)-diversity: diversity within a sample/community
- Which community is more diverse?
\(\beta\)-diversity: diversity between samples/communities
- Which communities are most similar?
\(\gamma\)-diversity: total species diversity across a landscape, combining local \(\alpha\)-diversity and \(\beta\)-diversity differences among sites

\(\alpha\)-diversity

How many species in each sample/community? How are the species distributed?

quantitative measure of the biodiversity within a sample/community
at the ASV level or any other taxonomic rank

The most popular indices fall into 4 categories

Richness: Observed, Chao1
Information (Abundance distribution): Shannon
Dominance (or Evenness): Simpson, invSimpson
Phylogenetics: Faith

\(\alpha\)-diversity - Richness based

Number of observed species
- \(\text{Observed} = \sum_{s} 1_{\{p_s > 0\}} = \sum_{i} c_i = \text{S}\)
Observed + (estimated) number of unobserved species
- \(\text{Chao1} = \text{Observed} + \hat{c}_0\)

Here \(c_i\) denotes the number of species observed \(i\) times (\(i = 1, 2, \dots\)) and \(p_s\) the proportion of species \(s\) (\(s = 1, \dots, S\)).
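
The slide does not say how \(\hat{c}_0\) is estimated. The sketch below assumes the classic Chao1 estimator, \(\hat{c}_0 = c_1^2 / (2 c_2)\), with the bias-corrected form \(c_1 (c_1 - 1) / 2\) when no species is observed exactly twice. The function name and the counts are hypothetical.

```python
from collections import Counter

def chao1(counts):
    """Return (Observed, Chao1) for one sample given its per-species counts.

    Assumption: c0_hat = c1^2 / (2 * c2), or c1 * (c1 - 1) / 2 when c2 = 0.
    """
    observed = sum(1 for n in counts if n > 0)
    # freq[i] is c_i, the number of species observed exactly i times.
    freq = Counter(n for n in counts if n > 0)
    c1, c2 = freq[1], freq[2]
    if c2 > 0:
        c0_hat = c1 * c1 / (2 * c2)
    else:
        c0_hat = c1 * (c1 - 1) / 2
    return observed, observed + c0_hat


# Hypothetical sample: 4 singletons and 2 doubletons among 8 observed species.
counts = [10, 5, 2, 2, 1, 1, 1, 1, 0]
observed, estimate = chao1(counts)
print(observed, estimate)
```

Because the estimate depends only on singletons and doubletons, it is very sensitive to any filtering of rare ASVs.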

\(\alpha\)-diversity - Shannon

Shannon
- \(\text{H} = - \sum_{s} p_s \log\left( p_s \right)\)

Let \(p_s = n_{s}/N\) denote the proportion of species \(s\) (\(s = 1, \dots, S\)), where \(n_{s}\) is the number of sequences assigned to species \(s\) and \(N\) the total number of sequences.

Take into account the relative abundance of each taxon \(p_s\)

H = 0 when the sample contains only one species.
H increases with the number of different species: a high H indicates high diversity.
H = log(S) is maximal when all species are equally represented.

\(\alpha\)-diversity - Evenness

Simpson diversity (D)
- \(\text{D} = p_1^2 + \dots + p_S^2\)
inverse Simpson, inverse probability that two sequences sampled at random come from the same species
- \(\text{InvSimpson} = \frac{1}{p_1^2 + \dots + p_S^2} \leq S\)

Take into account the relative abundance of each taxon \(p_s\)

Capture how dominated a community is by its most abundant taxa, asking “What is the probability that two randomly chosen sequences belong to the same taxon?”

Interpretation:
- If one species dominates completely: D = 1 and 1/D = 1 (minimal value)
- 1/D increases with diversity (as would Shannon, richness and others)

\(\alpha\)-diversity - illustration

	Even	Uneven
Observed	15	15
Shannon	2.71	2.06
invSimpson	15	5.45

Low Shannon and invSimpson values indicate communities dominated by a few abundant taxa.
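
The indices can be computed by hand from the formulas of the previous slides. In the sketch below the even community has 15 equally abundant taxa, as in the table. The uneven community is a hypothetical example (the counts behind the table are not given), so its values are not meant to reproduce 2.06 and 5.45.

```python
import math

def alpha_indices(counts):
    """Observed richness, Shannon H, Simpson D and inverse Simpson of one sample."""
    total = sum(counts)
    p = [n / total for n in counts if n > 0]  # proportions p_s of observed taxa
    simpson_d = sum(x * x for x in p)
    return {
        "Observed": len(p),
        "Shannon": -sum(x * math.log(x) for x in p),
        "D": simpson_d,
        "invSimpson": 1 / simpson_d,
    }


even = alpha_indices([10] * 15)                # 15 equally abundant taxa
uneven = alpha_indices([100, 50] + [5] * 13)   # hypothetical dominated community
single = alpha_indices([42])                   # a single species

for name, idx in [("even", even), ("uneven", uneven), ("single", single)]:
    print(name, idx["Observed"], round(idx["Shannon"], 2), round(idx["invSimpson"], 2))
```

Both communities have the same observed richness, but Shannon and invSimpson drop as soon as a few taxa dominate.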

\(\alpha\)-diversity - filtering

Many \(\alpha\) diversities (Observed, Chao1) depend a lot on rare ASVs. Do not trim rare ASVs before computing them as it can drastically alter the result.

Practice

Consider working with rarefied data

Produce \(\alpha\) diversity table
Explore \(\alpha\) diversities

\(\alpha\) diversity - Test (1/2)

Perform an analysis of variance (ANOVA) to test the impact of some covariates in the experimental design (in the sample_data table).

Test the null hypothesis \(H_0\): the mean is the same in all biological conditions (groups), versus the alternative \(H_1\): the mean differs for at least one group.

Assumptions

\(\epsilon \sim_{iid} \mathcal{N}(0,\,\sigma^{2})\)
Normality (Gaussian distribution), independence, homoscedasticity (equal variances)

\(\alpha\) diversity - Test (2/2)

Reject \(H_0\) when \(Pr(F > f) \leq \alpha\) (significance level, usually 5%)
To understand group differences after an ANOVA, conduct post hoc tests, also called “multiple comparison” tests.
- Easy16S: Tukey’s Honest Significant Differences
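
For intuition, the observed statistic \(f\) of a one-way ANOVA can be computed by hand. The sketch below is not the routine run by Easy16S. The Shannon values per season are hypothetical, and the p-value \(Pr(F > f)\) is left out because it requires the Fisher distribution.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean squares."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Hypothetical Shannon indices for three samples per season.
shannon_by_season = {"summer": [2.0, 2.2, 2.4], "winter": [3.0, 3.2, 3.4]}
f_value = one_way_anova_f(list(shannon_by_season.values()))
print(round(f_value, 2))
```

A large \(f\) means that the group means are far apart compared with the spread inside each group.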

Practice

Test and interpret

The seasonal effect on \(\alpha\) diversity
The AOP effect on \(\alpha\) diversity
The seasonal effect on \(\alpha\) diversity inside a fixed AOP (subset)

\(\alpha\) diversity - rarefaction

best approach to control for uneven sequencing effort

Schloss PD. 2024. Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. mSphere 9:e00354-23. https://doi.org/10.1128/msphere.00354-23

\(\beta\) diversity

\(\beta\)-diversity: diversity between samples/communities
- Which communities are most similar/dissimilar across samples?

Jaccard index

measures the fraction of species specific to either A or B

\({\text{Jaccard}} = \frac{\sum_{s} \left[ 1_{\{n^A_s > 0, n^B_s = 0 \}} + 1_{\{n^B_s > 0, n^A_s = 0 \}} \right]}{\sum_{s} 1_{\{n^A_s + n^B_s > 0 \}}}\)

Here \(n^A_s\) denotes the count of species \(s\) (\(s = 1, \dots, S\)) in community \(A\) and \(n^B_s\) the count in community \(B\). We focus on shared taxa.

A simple example

Bacteria	Sample A	Sample B
🌹	50	10
🌻	10	50
🌸	20	5
🌼	5	20
Total	85	85

Result:

\[ \frac{0 + 0}{4} = 0 \]

Focus on shared taxa.

Jaccard = 0 when all taxa are shared
Jaccard = 1 when all taxa are specific
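
A minimal Python sketch of the Jaccard index as defined above (fraction of taxa specific to one community). The function name and the taxon names, which stand for the four bacteria of the example, are hypothetical.

```python
def jaccard(a, b):
    """Fraction of taxa present in exactly one of the two communities.

    a, b: {taxon: count}
    """
    taxa = set(a) | set(b)
    specific = sum(1 for t in taxa if (a.get(t, 0) > 0) != (b.get(t, 0) > 0))
    present = sum(1 for t in taxa if a.get(t, 0) + b.get(t, 0) > 0)
    return specific / present


sample_a = {"rose": 50, "sunflower": 10, "blossom": 20, "daisy": 5}
sample_b = {"rose": 10, "sunflower": 50, "blossom": 5, "daisy": 20}

print(jaccard(sample_a, sample_b))            # all four taxa are shared
print(jaccard({"rose": 3}, {"daisy": 8}))     # no taxon is shared
```

Abundances play no role here: the two samples of the example differ strongly in composition but share all their taxa, so the index is 0.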

Bray Curtis distance

The Bray Curtis distance mixes which species are present in each sample and how abundant they are.

\(\displaystyle {\text{BC}} = \sum_{s} |n^A_s - n^B_s| / \sum_{s} |n^A_s + n^B_s|\)

Here \(n^A_s\) denotes the count of species \(s\) (\(s = 1, \dots, S\)) in community \(A\) and \(n^B_s\) the count in community \(B\).

A simple example

Bacteria	Sample A	Sample B
🌹	50	10
🌻	10	50
🌸	20	5
🌼	5	20
Total	85	85

Computation Details

Bacteria	Diff. |A - B|
🌹	|50 - 10| = 40
🌻	|10 - 50| = 40
🌸	|20 - 5| = 15
🌼	|5 - 20| = 15
Total	110

Result:

\[ \frac{110}{85 + 85} = 0.647 \]

Focus on shared taxa

Bray Curtis distance = 0 when all taxa have the same abundance in both samples
Bray Curtis distance = 1 when no taxon is shared between the samples
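
The worked example can be checked with a short Python sketch of the Bray Curtis formula. The function name and the taxon names, which stand for the four bacteria of the example, are hypothetical.

```python
def bray_curtis(a, b):
    """Sum of absolute count differences divided by the sum of all counts.

    a, b: {taxon: count}
    """
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator


sample_a = {"rose": 50, "sunflower": 10, "blossom": 20, "daisy": 5}
sample_b = {"rose": 10, "sunflower": 50, "blossom": 5, "daisy": 20}

print(round(bray_curtis(sample_a, sample_b), 3))  # 110 / 170
```

Unlike Jaccard, which is 0 for these two samples, Bray Curtis reflects the swap of dominant taxa between A and B.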

Unifrac index

measures the proportion of the length of the phylogenetic tree specific to either a sample or the other.

\({\text{UF}} = \frac{\sum_{e} l_e \left[ 1_{\{p_e > 0, q_e = 0 \}} + 1_{\{q_e > 0, p_e = 0 \}} \right] }{\sum_{e} l_e \times 1_{\{p_e + q_e > 0 \}}}\)

For each branch \(e\), \(l_e\) denotes its length and \(p_e\) (resp. \(q_e\)) the fraction of community \(A\) (resp. community \(B\)) below branch \(e\).

Unifrac index

1. Tree representing phylogenetically similar communities, where a significant fraction of the branch length in the tree is shared (gray).
2. Tree representing two communities that are maximally different so that 100% of the branch length is unique to either the circle or square environment.

Lozupone C, Knight R. 2005. UniFrac: a New Phylogenetic Method for Comparing Microbial Communities. Appl Environ Microbiol 71:8228-8235. https://doi.org/10.1128/AEM.71.12.8228-8235.2005

Focus on shared taxa

Unifrac = 0 when all tree branches are shared (abundances can vary)
Unifrac = 1 when all tree branches are specific

Weighted-Unifrac index

measures the proportion of the length of the phylogenetic tree specific to a sample, weighted by the abundance differences.

\({\text{wUF}} = \frac{\sum_{e} l_e | p_e - q_e| }{\sum_{e} l_e (p_e + q_e)}\)

For each branch \(e\), \(l_e\) denotes its length and \(p_e\) (resp. \(q_e\)) the fraction of community \(A\) (resp. community \(B\)) below branch \(e\).

Focus on shared taxa

Weighted-Unifrac = 0 when all ASV are shared with the same abundances
Weighted-Unifrac = 1 when all ASV are specific and have no tree branch in common
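
Both Unifrac formulas can be sketched in Python if the tree is given as a list of branches. The representation below (each branch as a length plus the set of leaves under it) is an assumption made for the illustration, not the data structure of phyloseq. The toy tree is ((t1:1,t2:1):1,(t3:1,t4:1):1).

```python
def unifrac(branches, a, b, weighted=False):
    """Unifrac (or weighted Unifrac) between communities a and b.

    branches: list of (length, set of leaf taxa below the branch)
    a, b:     {taxon: count}
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    numerator = denominator = 0.0
    for length, leaves in branches:
        p = sum(a.get(t, 0) for t in leaves) / total_a  # fraction of A below the branch
        q = sum(b.get(t, 0) for t in leaves) / total_b  # fraction of B below the branch
        if weighted:
            numerator += length * abs(p - q)
            denominator += length * (p + q)
        else:
            numerator += length * ((p > 0) != (q > 0))  # branch specific to A or B
            denominator += length * (p + q > 0)         # branch present in A or B
    return numerator / denominator


# Hypothetical tree with four leaves and two internal branches, all of length 1.
branches = [
    (1.0, {"t1"}), (1.0, {"t2"}), (1.0, {"t3"}), (1.0, {"t4"}),
    (1.0, {"t1", "t2"}), (1.0, {"t3", "t4"}),
]
a = {"t1": 10, "t2": 10}
b = {"t1": 10, "t3": 10}

print(unifrac(branches, a, b), unifrac(branches, a, b, weighted=True))
```

In this toy case 3 of the 5 branches in use are specific to one community, while the weighted version is lower because the shared taxon t1 carries half of the reads in both communities.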

Characteristics of \(\beta\) diversity (1/2)

	Qualitative	Quantitative
No phylogeny	Jaccard	Bray-Curtis
With phylogeny	Unifrac	Weighted-Unifrac

Jaccard lower than Bray-Curtis ⇒ abundant taxa are not shared
Jaccard higher than Unifrac ⇒ communities’ taxa are distinct but phylogenetically related
Unifrac higher than weighted Unifrac ⇒ abundant taxa in both communities are phylogenetically close.

Characteristics of \(\beta\) diversity (2/2)

In general, qualitative diversities are most sensitive to factors that affect presence/absence of organisms (such as pH, salinity, depth, etc) and therefore useful to study and define bioregions (regions with little or no flow between them)…

… whereas quantitative distances focus on factors that affect relative changes (seasonal changes, nutrient availability, concentration of oxygen, depth, etc) and therefore useful to monitor communities over time or along an environmental gradient.

\(\beta\) diversity - rarefaction

best approach to control for uneven sequencing effort

Schloss PD. 2024. Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. mSphere 9:e00354-23. https://doi.org/10.1128/msphere.00354-23

Practice

Consider working with rarefied data

Produce \(\beta\) diversity table
Draw \(\beta\) diversity heatmaps

Visualisation: MDS or PCoA plots

Multi-Dimensional Scaling (MDS), also named Principal Coordinate Analysis (PCoA)

Aim: find a lower dimensional representation of the data such that the most information is retained.

Construct a matrix of pairwise dissimilarities D between each pair of samples (Jaccard, Bray-Curtis dissimilarity, UniFrac distance…).
Find the first principal coordinates (PCs) that best preserve the original distribution of distances.
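
The two steps above can be sketched in pure Python for the first axis only. This is a teaching sketch under stated assumptions, not the routine used by Easy16S: it uses Gower double-centring followed by power iteration, and it assumes that the leading eigenvalue is positive, which holds for Euclidean distances but is not guaranteed for every dissimilarity.

```python
def pcoa_first_axis(dist, iterations=200):
    """First principal coordinate of a symmetric dissimilarity matrix.

    Returns (coordinates, proportion of variance explained by the axis).
    Assumes the leading eigenvalue of the centred matrix is positive.
    """
    n = len(dist)
    # Gower double-centring of -0.5 * D^2.
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in a]
    grand_mean = sum(row_mean) / n
    b = [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
         for i in range(n)]
    # Power iteration to approximate the leading eigenvector of b.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    coords = [x * eigenvalue ** 0.5 for x in v]
    explained = eigenvalue / sum(b[i][i] for i in range(n))
    return coords, explained


# Hypothetical samples lying on a line at positions 0, 1 and 3.
dist = [[0, 1, 3],
        [1, 0, 2],
        [3, 2, 0]]
coords, explained = pcoa_first_axis(dist)
print([round(c, 3) for c in coords], round(explained, 3))
```

Since these three samples lie on a line, a single axis reproduces all pairwise distances and explains all the variance. Real datasets need several axes, and the sign of each axis is arbitrary.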

MDS or PCoA plots

PCoA focuses on inter-sample dissimilarities

Each point corresponds to a separate sample
The coordinates of each point (sample) are determined by the value of the PCs (projection)
The percentage of variance explained is shown on each axis.

Advantages of MDS or PCoA plots

Based on Non-Euclidean Metrics: Bray-Curtis, Jaccard, UniFrac, Aitchison distance…
Each PCoA axis is accompanied by the proportion of explained variance, which indicates the extent to which a reduced-dimension projection reflects the original data.
Compare communities across samples and environmental factors to gain deeper insights into biodiversity patterns.
The influence of these factors will be assessed using a Permutational Multivariate ANOVA (PERMANOVA).

Practice

Create PCoA plots by varying the dissimilarity matrix.
What environmental factors structure the dataset?
What percentage of the variance is explained?

\(\beta\) diversity partitioning

Test for differences in community composition (\(\beta\) diversity) between groups using a distance matrix, under the assumption of homogeneous dispersions.

PERMANOVA (1/2)

Permutational Multivariate ANOVA (PERMANOVA) compares the variation between groups to the variation within groups.

Three partitions of the variation

Total variation (total sum of squares, SST), SST = SSB + SSW
Variation within groups (sum of squares within groups, SSW)
Variation between groups (sum of squares between groups, SSB)

PERMANOVA (2/2)

The significance of the pseudo-F statistic (the between/within-groups ratio) is evaluated by simulating the null distribution from permutations, with respect to the design.
p-values are computed using these permutations.

Limitations

determining the most suitable distance matrix remains challenging
it requires permutation to establish its significance, which can be computationally expensive
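
The partition of variation and the permutation test can be sketched for a single factor in pure Python. This is a minimal illustration, not the implementation called by Easy16S. The function names, the toy distance matrix, the number of permutations and the seed are assumptions, and the labels are permuted freely, which ignores any more complex design.

```python
import random

def pseudo_f(dist, labels):
    """Pseudo-F from a distance matrix: (SSB / (a - 1)) / (SSW / (N - a))."""
    n = len(labels)
    groups = sorted(set(labels))
    # SST: sum of squared distances over all pairs, divided by N.
    sst = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # SSW: same quantity inside each group, divided by the group size.
    ssw = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ssw += sum(dist[i][j] ** 2
                   for k, i in enumerate(idx) for j in idx[k + 1:]) / len(idx)
    ssb = sst - ssw
    return (ssb / (len(groups) - 1)) / (ssw / (n - len(groups)))


def permanova(dist, labels, n_perm=999, seed=1):
    """Observed pseudo-F and permutation p-value for one grouping factor."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: six samples on a line, two well separated groups.
positions = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
dist = [[abs(x - y) for y in positions] for x in positions]
labels = ["A", "A", "A", "B", "B", "B"]
f_obs, p_value = permanova(dist, labels)
print(round(f_obs, 2), p_value)
```

With only six samples there are very few distinct ways to split them into two groups of three, so the p-value cannot get very small however strong the separation is.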

Practice

Based on the experimental design (sample_data, metadata), formulate biological questions and write the corresponding statistical model.
Which dissimilarity matrices are most suitable for comparing community composition?
Which factors impact \(\beta\) diversity?

Differential abundance analysis

Some slides were adapted from the Migale training materials for the RNASeq course and for modules 16 and 23

Differential abundance analysis (DAA)

Aim: identify ASVs (or taxa at any taxonomic rank) with differential abundance between biological conditions.

Challenges of microbiome data to be addressed in DAA:

Different sequencing depth
Compositionality
Sparsity and zero inflation

There are many methods, and this remains an active area of research. There is currently no consensus.

Principles of DESeq2 (1/2)

DESeq2 is based on a negative binomial generalized linear model and is used in Easy16S to perform DAA.

The model is defined as follows:

\(K_{ij} \sim \text{NB}(\text{mean} = \mu_{ij},\ \text{dispersion} = \phi_i)\)

\(\log \mu_{ij} = x_j^T \beta_i + \log s_{ij}\)

where \(K_{ij}\) is the count for ASV \(i\) in sample \(j\), \(\phi_i\) is the ASV-specific dispersion, \(s_{ij}\) is the effective library size (e.g. sequencing depth), \(\mathbf{X}=\left[x_{j}\right]\) is the design matrix and \(\beta_i\) is the vector of coefficients.

A Generalized Linear Model (GLM) makes it possible to decompose the effects of different factors and their interactions on the mean.

Principles of DESeq2 (2/2)

Differential abundance analysis is performed taxon-by-taxon with replicates, using directly raw counts.

Normalization is included in the model. The method is suited to comparing samples with similar community compositions.

Assumptions:

the majority of taxa are invariant between conditions, i.e. the compared samples have similar community compositions.
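
To see why this assumption matters, here is a sketch of a median-of-ratios size factor, the default estimator described for DESeq2. The slides do not say which settings Easy16S uses, so treat this as an assumption. In this sketch taxa with a zero count in any sample are skipped when building the reference, and the counts are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factor per sample.

    counts: {taxon: [count in sample 1, count in sample 2, ...]}
    Taxa with a zero count in any sample are left out of the reference.
    """
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean across samples, for taxa observed everywhere.
    log_geo_mean = {t: sum(math.log(k) for k in row) / n_samples
                    for t, row in counts.items() if all(k > 0 for k in row)}
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[t][j]) - ref for t, ref in log_geo_mean.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
counts = {"ASV1": [10, 20], "ASV2": [100, 200], "ASV3": [5, 10], "ASV4": [0, 7]}
factors = size_factors(counts)
print([round(f, 3) for f in factors])
```

The median only recovers the depth ratio if most taxa are unchanged between samples. With very sparse tables few taxa are observed in every sample, which makes the reference fragile.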

Practice

Based on the experimental design (sample_data, metadata), formulate biological questions and write the corresponding statistical model.
Which ASVs are differentially abundant?

Multiple testing

Some slides were adapted from the Migale training materials for the RNASeq course and for modules 16 and 23

Concept

False positive (FP) (type I error: \(\alpha\)): a non differentially abundant (DA) taxon which is declared DA.

For each taxon (ASV) \(i\), we test \(H_0\) = {taxon \(i\) is not DA} vs \(H_1\) = {taxon \(i\) is DA} using a statistical test (computation of a score).

Let us assume that none of the \(N\) taxa is DA. Each test is performed at level \(\alpha\).

Ex: \(N=10 000\) taxa and \(\alpha= 0.05\) \(\rightarrow\) \(\mathbb{E}(FP)=500\) taxa.

The Family Wise Error Rate (FWER)

Probability of having at least one Type I error (false positive), of declaring DA at least one non DA taxon.

\[FWER = \mathbb{P}(FP \ge 1)\]

The Bonferroni procedure

Either each test is performed at level \(\alpha=\alpha^*/N\)
or adjusted p-values \(p^{\text{Bonf}}_g=\min(1, N p_g)\) are used, so that FWER \(\leq\alpha^*\).

For \(N=2 000\) and FWER \(\leq\alpha^*=0.05\): \(\alpha= 2.5 \times 10^{-5}\).

Easy but conservative and not powerful.

Without correction, at a constant per-test level \(\alpha\), the FWER \(\rightarrow\) 1 as the number of tests increases.
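
The Bonferroni adjustment is short enough to write by hand. The sketch below uses hypothetical p-values, and the function name is an assumption made for the illustration.

```python
def bonferroni(pvalues):
    """Bonferroni adjusted p-values: min(1, N * p) for each of the N tests."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


# Per-test level needed to keep FWER <= 0.05 with N = 2000 taxa.
per_test_alpha = 0.05 / 2000

# Hypothetical raw p-values for four taxa.
raw = [0.00001, 0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
print(per_test_alpha, adjusted)
```

Only the first taxon stays below 0.05 after adjustment, which illustrates how conservative the procedure is.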

The False Discovery Rate (FDR)

Idea: Do not control the error rate but the proportion of error

\(\Rightarrow\) less conservative than control of the FWER.

The false discovery rate (FDR) of Benjamini-Hochberg is the expected proportion of Type I errors among the rejected hypotheses

\(FDR = \mathbb{E}(FP / P)\) if \(P > 0\) and \(0\) if \(P=0\)

Principle: the number of declared positive elements \(P\) is given by the greatest rank \(i\) such that

\[p_{(i)} \leq i \alpha^* / N\]

where \(p_{(1)} \leq \dots \leq p_{(N)}\) are the ordered p-values and \(N\) is the total number of taxa (ASV).

FDR \(\leq\) FWER
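
The step-up rule above can be sketched in Python. The function name and the p-values are hypothetical. The example is chosen so that a taxon failing its own threshold is still declared positive because a larger rank passes.

```python
def benjamini_hochberg(pvalues, alpha):
    """Benjamini-Hochberg: reject the P smallest p-values, where P is the
    greatest rank i such that p_(i) <= i * alpha / N."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    n_rejected = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / n:
            n_rejected = rank
    rejected = [False] * n
    for i in order[:n_rejected]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for five taxa; thresholds are 0.01, 0.02, 0.03, 0.04, 0.05.
pvalues = [0.001, 0.008, 0.039, 0.0395, 0.6]
declared = benjamini_hochberg(pvalues, alpha=0.05)
print(declared)
```

Here rank 3 fails its threshold (0.039 > 0.03) but rank 4 passes (0.0395 ≤ 0.04), so the four smallest p-values are declared positive. Bonferroni at level 0.05/5 = 0.01 would keep only the first two.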

Experimental design

Some slides were adapted from the Migale training materials for the RNASeq course and for modules 16 and 23

Experimental Design

A good design is a list of experiments to conduct in order to answer the question asked, which maximizes the information collected and minimizes the cost of the experiments, subject to constraints.

Define the biological question well, get together and collect a priori knowledge (e.g. reference genome, splicing…),
Anticipate: identify all factors of variation and adapt Fisher’s principles (1935), collect metadata from the experiment and the sequencing,
Include independent biological replicates to ensure reproducibility and accuracy of results

Summary

A microbiome analysis involves…

References

1. Liu Y-X, Qin Y, Chen T, Lu M, Qian X, Guo X, et al. A practical guide to amplicon and metagenomic analysis of microbiome data. Protein & Cell. 2020;12:315–30. doi:10.1007/s13238-020-00724-8.

2. McMurdie PJ, Holmes S. phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013;8:e61217.

3. Midoux C, Rué O, Chapleur O, Bize A, Loux V, Mariadassou M. Easy16S: A user-friendly shiny web-service for exploration and visualization of microbiome data. Journal of Open Source Software. 2024;9:6704. doi:10.21105/joss.06704.

4. Mariadassou M. Phyloseq-extended: Various customs functions written to enhance the base functions of phyloseq. 2018. https://github.com/mahendra-mariadassou/phyloseq-extended.

5. Irlinger F, Mariadassou M, Dugat-Bony E, Rué O, Neuvéglise C, Renault P, et al. A comprehensive, large-scale analysis of “terroir” cheese and milk microbiota reveals profiles strongly shaped by both geographical and human factors. ISME Communications. 2024;4. doi:10.1093/ismeco/ycae095.

6. Bernard M, Rué O, Mariadassou M, Pascal G. FROGS: a powerful tool to analyse the diversity of fungi with special management of internal transcribed spacers. Briefings in Bioinformatics. 2021;22. doi:10.1093/bib/bbab318.