16S metagenomic data analysis

Module 20

Christelle Hennequet-Antier

MaIAGE

Cédric Midoux

PROSE & MaIAGE

June 8, 2026

Introduction

Practical information

  • 9h00 - 17h00
  • 2 breaks: morning and afternoon
  • Lunch at INRAE restaurant (not mandatory)
  • Questions are strongly encouraged
  • Everyone has something to learn from each other

Getting to know you

Who are you?

  • Institution / Laboratory / position

What is your scientific topic?

  • Studied ecosystem
  • Scientific question
  • Experimental design

What is your background?

  • Already analysed shotgun data?
  • Background in bioinformatics?
  • Background in biostatistics?

Getting to know us

  • Open infrastructure dedicated to life sciences
    • Computing resources, tools, databanks…
  • Dissemination of expertise in bioinformatics, biostatistics
  • Design and development of applications
  • Data analysis

Data analysis service

https://documents.migale.inrae.fr/data-analysis.html

  • We are specialized in genomics/metagenomics
  • 5 Bioinformaticians and 2 Statisticians
  • More than 160 projects since 2016
  • LRQA certified process
  • 2 types of services
    • Classical collaboration (we perform the analyses)
    • Support (we help you do the analysis yourself)

Aim of this training

After this 4-day training, you will:

  • Know the outlines, advantages and limits of amplicon sequencing data analysis
  • Be able to use FROGS (through Galaxy) and phyloseq (through easy16S) tools on the training data set
  • Be able to identify tools and parameters adapted to your own analyses

Aim of this training

Liu et al., 2020: A practical guide to amplicon and metagenomic analysis of microbiome data [1]

Program

DAY 1

  • Introduction
  • Introduction to amplicon analysis
  • Introduction to Galaxy
  • Quality control
  • FROGS (1)

DAY 2

  • FROGS (2)
  • FROGSfunc

DAY 3

  • Introduction
  • Easy16S
  • Composition
  • \(\alpha\) and \(\beta\) diversities
  • Ordination

DAY 4

  • PERMANOVA and hypothesis tests
  • Differential abundance
  • Train on your own dataset or on another provided dataset

Program

Training with Easy16S: DAY 3 and DAY 4 of the program above.

Microbiome tools

Aims (1/2)

Become familiar with {phyloseq} [2] R package and {Easy16S} [3] Shiny Web Application for the analysis of microbiome datasets.

Exploratory Data Analysis

  • \(\alpha\)-diversity: how diverse is my community?

  • \(\beta\)-diversity: how different are two communities?

  • Use a distance matrix to study structures:

    • Hierarchical clustering: how do the communities cluster?
    • Permutational ANOVA: are the communities structured by some environmental factor?

Aims (2/2)

Become familiar with {phyloseq} [2] R package and {Easy16S} [3] Shiny Web Application for the analysis of microbiome datasets.

Visual assessment of the data

  • bar plots: what is the composition of each community?
  • Multidimensional Scaling: how are communities related?
  • Heatmaps: are there interactions between species and (groups of) communities?
  • Differential Abundances: which taxa are differentially abundant?

phyloseq and companion tools

phyloseq R package [2] from Bioconductor

  • importing data from a variety of common formats
  • preprocessing data (storing, filtering, subsetting, transforming…)
  • performing analyses and producing graphics to explore microbiome profiles

Companion tools

  • Custom functions designed to enhance core functionality: {phyloseq-extended} [4]
  • Community ecology functions: {vegan}, {ade4}, {picante}
  • Tree manipulation: {ape}
  • Orchestrating Microbiome Analysis with Bioconductor {mia}

easy16S (1/2)

Shiny Web Application [3]

Strengths

  • specifically designed for biologists, data analysts and trainers
  • facilitates exploratory microbiome data analysis, data visualization, and statistical analysis
  • based on the {phyloseq} data structure [2]
  • data visualization with {ggplot2}

easy16S (2/2)

Shiny Web Application [3]

Content

  • Key tables constituting the phyloseq object;
  • Metadata visualization using esquisse (Meyer & Perrier, 2018);
  • Taxonomic composition barplot;
  • Rarefaction curves;
  • Abundance heatmap;
  • Richness within a sample (α-diversity): table, scatterplot and ANOVA;
  • Dissimilarity between samples (β-diversity): table, sample heatmap, sample clustering, MultiDimensional Scaling and Multivariate ANOVA;
  • Principal Component Analysis;
  • Differential abundance analysis.

Phyloseq data structure (1/4)

Phyloseq data structure (2/4)

A phyloseq object is made of up to 5 components (or slots):

  • otu_table: an OTU/ASV abundance table
  • sample_data: a table of sample metadata reflecting the experimental design, like sequencing technology, location of sampling…
  • tax_table: a table of taxonomic descriptors for each OTU/ASV, typically the taxonomic assignment at different ranks (Domain, Kingdom, Phylum, Class, Order, Family, Genus, Species)
  • phy_tree: a phylogenetic tree of the OTUs/ASVs
  • refseq: a set of reference sequences (one per OTU/ASV)

Phyloseq data structure (3/4)

The biom format natively supports

  • OTU/ASV count table (the otu_table)
  • taxonomic description (the tax_table)
  • sample description (the sample_data)

The other components are optional and must be stored in separate files

  • phylogenetic tree in Newick format (the phy_tree)
  • sequences in fasta format (the refseq)

Phyloseq data structure (4/4)

The import functions create consistent objects with

  • the same OTU/ASV in the count table, the taxonomy table and the phylogenetic tree;
  • the same samples in the count table and the metadata table

Check the imported phyloseq data

  • Samples/Taxa are matched by column names and/or row names. Make sure that the tables have them!
  • Any OTU/ASV absent from some components will be trimmed
  • Any sample absent from some components will be trimmed
  • Check number of taxa/samples and be wary of names mismatches
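The trimming behaviour described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not phyloseq's actual code: the function name `trim_to_shared` and the nested-dictionary layout are hypothetical, chosen to show why a name mismatch silently removes a sample or a taxon.

```python
def trim_to_shared(otu_table, sample_data, tax_table):
    """Keep only the taxa and samples that are present in every component.

    Hypothetical layout (not the phyloseq classes):
      otu_table:   {taxon: {sample: count}}
      sample_data: {sample: {variable: value}}
      tax_table:   {taxon: {rank: name}}
    """
    # Taxa must be named identically in the count table and the taxonomy table.
    taxa = set(otu_table) & set(tax_table)
    # Samples must be named identically in the count table and the metadata.
    samples_in_counts = set()
    for counts in otu_table.values():
        samples_in_counts |= set(counts)
    samples = samples_in_counts & set(sample_data)

    trimmed_otu = {
        t: {s: c for s, c in otu_table[t].items() if s in samples}
        for t in sorted(taxa)
    }
    trimmed_samples = {s: sample_data[s] for s in sorted(samples)}
    trimmed_tax = {t: tax_table[t] for t in sorted(taxa)}
    return trimmed_otu, trimmed_samples, trimmed_tax


otu = {
    "ASV1": {"S1": 10, "S2": 0, "S3": 4},
    "ASV2": {"S1": 3, "S2": 7, "S3": 1},
    "ASV3": {"S1": 0, "S2": 2, "S3": 9},
}
# "s3" is a deliberate name mismatch with "S3"; ASV3 has no taxonomy.
meta = {"S1": {"season": "winter"}, "S2": {"season": "summer"}, "s3": {"season": "summer"}}
tax = {"ASV1": {"Genus": "Lactococcus"}, "ASV2": {"Genus": "Streptococcus"}}

trimmed_otu, trimmed_meta, trimmed_tax = trim_to_shared(otu, meta, tax)
print(sorted(trimmed_otu), sorted(trimmed_meta))
```

In this toy example, sample S3 and taxon ASV3 disappear without any error, which is why checking the number of taxa and samples after import matters.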

Microbiome dataset

Dataset description

Project MetaPDOcheese [5]

Practice

  • Launch Easy16S
  • Import datasets from “demo”

Practice

  • Project MetaPDOcheese [5]
  • Input dataset containing a subset of 72 samples analysed with FROGS [6]

Practice

  • First explorations
  • Exploring samples composition

Microbiome transformations

Preprocess data

It’s useful to explore the data and prepare datasets to be analysed depending on the biological questions.

Samples filter

  • Select samples based on their name
  • Filter samples based on the sample_data table (metadata, experimental design)
  • Prune samples whose sum of total counts (reads) does not satisfy a given threshold

Taxa transformation

Agglomerating taxa at a higher taxonomic rank makes it possible

  • to reduce the complexity of the dataset, especially in cases of low resolution
  • to focus on major composition.

Spread taxonomy to remove unknown and multi-affiliations by spreading the last known rank to further ranks
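As a rough illustration of agglomeration, the sketch below sums ASV counts that share the same name at a chosen rank. The function `agglomerate`, the dictionary layout and the "Unknown" label are assumptions made for this example; the tools used in the training handle missing ranks and multi-affiliations in their own way.

```python
from collections import defaultdict


def agglomerate(otu_table, tax_table, rank):
    """Sum counts of all taxa sharing the same name at `rank`.

    otu_table: {taxon: {sample: count}}, tax_table: {taxon: {rank: name}}
    (hypothetical layout used only for this sketch).
    """
    merged = defaultdict(lambda: defaultdict(int))
    for taxon, counts in otu_table.items():
        # Taxa without a name at this rank are pooled under "Unknown".
        group = tax_table.get(taxon, {}).get(rank) or "Unknown"
        for sample, count in counts.items():
            merged[group][sample] += count
    return {group: dict(counts) for group, counts in merged.items()}


otu = {
    "ASV1": {"S1": 10, "S2": 0},
    "ASV2": {"S1": 3, "S2": 7},
    "ASV3": {"S1": 1, "S2": 2},
}
tax = {
    "ASV1": {"Genus": "Lactococcus"},
    "ASV2": {"Genus": "Lactococcus"},
    "ASV3": {"Genus": "Streptococcus"},
}
by_genus = agglomerate(otu, tax, "Genus")
print(by_genus)
```

Three ASVs become two genus-level rows, and the total number of reads per sample is unchanged.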

Rarefaction (1/2)

Nature of microbiome data

  • Compositionality: counts only reflect relative abundances, not the real (absolute) abundance of a taxon
  • Zero-inflation
  • Bias due to sequencing depth or sample size (= total number of reads per sample)

Raw abundances of taxa per sample are not comparable between samples

Rarefaction (2/2)

Rarefying = subsampling sequence reads without replacement based on the smallest total number of reads

After rarefying, all samples have the same depth, set to the smallest number of reads. Rarefied abundances of taxa per sample are recommended for diversity analyses.
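Rarefying a single sample can be sketched as follows. This is a minimal illustration, assuming counts are stored as a dictionary per sample; the function name `rarefy` and the fixed seed are choices made for the example, not the implementation used by the training tools.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample.

    counts: {taxon: number of reads} (hypothetical layout).
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth is larger than the number of reads in the sample")
    # One entry per read, labelled by its taxon.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = rng.sample(pool, depth)  # sampling without replacement
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in drawn:
        rarefied[taxon] += 1
    return rarefied


sample_a = {"ASV1": 50, "ASV2": 30, "ASV3": 20}   # 100 reads
sample_b = {"ASV1": 5, "ASV2": 10, "ASV3": 25}    # 40 reads
depth = min(sum(sample_a.values()), sum(sample_b.values()))
rarefied_a = rarefy(sample_a, depth)
rarefied_b = rarefy(sample_b, depth)
print(depth, rarefied_a, rarefied_b)
```

The sample that already has the smallest depth is returned unchanged, while the deeper sample loses reads at random, so rare taxa may drop to zero.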

Rarefaction curve:

  • Shows the number of unique ASVs (richness) as a function of sample size (number of reads).
  • A very useful diagnostic graph

Practice

Preprocess data

  • Samples filter
  • Agglomerate taxa
  • Rarefy data
  • Explore composition on different subsets
  • Compare composition with and without rarefaction

Microbiome biodiversity

Some slides were adapted from Mahendra Mariadassou’s course materials

Different types of diversity indices

  • \(\alpha\)-diversity: diversity within a sample/community
    • Which community is more diverse?
  • \(\beta\)-diversity: diversity between samples/communities
    • Which communities are most similar?
  • \(\gamma\)-diversity: total species diversity across a landscape, combining local \(\alpha\)-diversity and \(\beta\)-diversity differences among sites

\(\alpha\)-diversity

How many species are there in each sample/community? How are the species distributed?

  • quantitative measure of the biodiversity within a sample/community
  • at the ASV level or any other taxonomic rank

4 categories and their most popular indices

  • Richness: Observed, Chao1
  • Information (Abundance distribution): Shannon
  • Dominance (or Evenness): Simpson, invSimpson
  • Phylogenetics: Faith

\(\alpha\)-diversity - Richness based

  • Number of observed species
    • \(\text{Observed} = \sum_{s} 1_{\{p_s > 0\}} = \sum_{i} c_i = \text{S}\)
  • Observed + (estimated) number of unobserved species
    • \(\text{Chao1} = \text{Observed} + \hat{c}_0\)

Here \(c_i\) denotes the number of species observed \(i\) times (\(i = 1, 2, \dots\)) and \(p_s\) the proportion of species \(s\) (\(s = 1, \dots, S\)).
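The two richness indices can be computed by hand on a small count vector. The slide does not give the form of \(\hat{c}_0\); the sketch below assumes the classic Chao1 estimator \(\hat{c}_0 = c_1^2 / (2 c_2)\), with the bias-corrected form \(c_1 (c_1 - 1) / 2\) used when there are no doubletons. Function names are illustrative.

```python
from collections import Counter


def observed_richness(counts):
    """Number of species with at least one read."""
    return sum(1 for n in counts if n > 0)


def chao1(counts):
    """Observed richness plus an estimate of the number of unobserved species."""
    freq = Counter(n for n in counts if n > 0)
    c1 = freq.get(1, 0)  # species observed exactly once (singletons)
    c2 = freq.get(2, 0)  # species observed exactly twice (doubletons)
    if c2 > 0:
        c0_hat = c1 * c1 / (2 * c2)   # classic form, an assumption of this sketch
    else:
        c0_hat = c1 * (c1 - 1) / 2    # bias-corrected form when c2 = 0
    return observed_richness(counts) + c0_hat


counts = [10, 5, 2, 2, 1, 1, 1, 0]
print(observed_richness(counts))  # 7
print(chao1(counts))              # 9.25 = 7 + 3**2 / (2 * 2)
```

Because Chao1 relies entirely on singletons and doubletons, removing rare ASVs before computing it changes the estimate.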

\(\alpha\)-diversity - Shannon

  • Shannon
    • \(\text{H} = - \sum_{s} p_s \log\left( p_s \right)\)

Here \(p_s = n_{s}/N\) is the proportion of species \(s\) (\(s = 1, \dots, S\)), \(n_{s}\) is the number of sequences assigned to species \(s\) and \(N\) is the total number of sequences in the sample.

Take into account the relative abundance of each taxon \(p_s\)

  • H = 0 when the sample contains only one species.
  • H increases when the number of different species increases, i.e. when diversity is higher.
  • H = log(S) is maximal when all species are equally represented.
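The three properties listed above can be checked numerically with a short sketch of the Shannon index (natural logarithm, as in the formula). The function name is illustrative.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_s * log(p_s)) over the species that are present."""
    total = sum(counts)
    proportions = [n / total for n in counts if n > 0]
    return -sum(p * math.log(p) for p in proportions)


single_species = [40]
even_community = [4] * 15                 # 15 species, equally represented
uneven_community = [46] + [1] * 14        # 15 species, one dominant

print(shannon(single_species))
print(round(shannon(even_community), 2), round(math.log(15), 2))
print(shannon(uneven_community) < shannon(even_community))
```

With 15 equally represented species the index reaches its maximum log(15), about 2.71; the dominated community has the same richness but a lower H.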

\(\alpha\)-diversity - Evenness

  • Simpson diversity (D)
    • \(\text{D} = p_1^2 + \dots + p_S^2\)
  • inverse Simpson, inverse probability that two sequences sampled at random come from the same species
    • \(\text{InvSimpson} = \frac{1}{p_1^2 + \dots + p_S^2} \leq S\)

Take into account the relative abundance of each taxon \(p_s\)

Capture how dominated a community is by its most abundant taxa, asking “What is the probability that two randomly chosen sequences belong to the same taxon?”

Interpretation:

  • If one species dominates completely: D = 1 and 1/D = 1 (minimal value)
  • 1/D increases with diversity (as do Shannon, richness and other indices)
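A minimal sketch of D and of the inverse Simpson index, following the formulas above (function names are illustrative):

```python
def simpson_d(counts):
    """D = sum of squared proportions: probability that two reads share a taxon."""
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts)


def inv_simpson(counts):
    """Inverse Simpson index, bounded above by the number of species S."""
    return 1 / simpson_d(counts)


dominated = [30, 0, 0]          # one species dominates completely
even = [4] * 15                 # 15 equally abundant species

print(simpson_d(dominated), inv_simpson(dominated))
print(round(inv_simpson(even), 6))
```

The fully dominated community gives D = 1 and 1/D = 1, while the even community reaches the upper bound 1/D = S = 15.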

\(\alpha\)-diversity - illustration

             Even   Uneven
Observed     15     15
Shannon      2.71   2.06
invSimpson   15     5.45

Low Shannon and invSimpson values indicate communities dominated by a few abundant taxa.

\(\alpha\)-diversity - filtering

Many \(\alpha\) diversities (Observed, Chao1) depend heavily on rare ASVs. Do not trim rare ASVs before computing them as it can drastically alter the result.

Practice

Consider working with rarefied data

  • Produce \(\alpha\) diversity table
  • Explore \(\alpha\) diversities

\(\alpha\) diversity - Test (1/2)

Perform an analysis of variance (ANOVA) to test the impact of some covariates in the experimental design (in the sample_data table).

Test the null hypothesis \(H_0\): the mean \(\alpha\)-diversity is the same in all biological conditions (groups), versus the alternative: the mean differs between at least two groups.

Assumptions

  • \(\epsilon \sim_{iid} \mathcal{N}(0,\,\sigma^{2})\)
  • Normality (Gaussian distribution), independence and homoscedasticity (equal variances) of the residuals

\(\alpha\) diversity - Test (2/2)

  • Reject \(H_0\) when \(Pr(F > f) \leq \alpha\) (significance level, usually 5%)

  • To understand group differences in ANOVA, conduct post hoc tests also called “multiple comparison analysis” tests.

    • Easy16S: Tukey’s Honest Significant Differences
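To make the F statistic concrete, here is a small sketch of a one-way ANOVA computed by hand on \(\alpha\)-diversity values from two groups. The values and the function name are invented for the illustration; the p-value \(Pr(F > f)\) is not computed here because it requires the F distribution, which statistical software provides.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    values = [v for group in groups for v in group]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(group) / len(group) for group in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical Shannon indices for two seasons, three samples each.
winter = [2.0, 2.2, 2.4]
summer = [3.0, 3.2, 3.4]
f_value = one_way_anova_f([winter, summer])
print(round(f_value, 2))  # 37.5
```

A large F means that the differences between group means are large compared with the variability inside the groups.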

Practice

Test and interpret

  • The seasonal effect on \(\alpha\) diversity
  • The AOP effect on \(\alpha\) diversity
  • The seasonal effect on \(\alpha\) diversity inside a fixed AOP (subset)

\(\alpha\) diversity - rarefaction

best approach to control for uneven sequencing effort

Schloss PD. 2024. Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. mSphere 9:e00354-23. https://doi.org/10.1128/msphere.00354-23

\(\beta\) diversity

  • \(\beta\)-diversity: diversity between samples/communities
    • Which communities are most similar/dissimilar across samples?

Jaccard index

measures the fraction of species specific to either A or B

\({\text{Jaccard}} = \frac{\sum_{s} \left[ 1_{\{n^A_s > 0, n^B_s = 0 \}} + 1_{\{n^B_s > 0, n^A_s = 0 \}} \right]}{\sum_{s} 1_{\{n^A_s + n^B_s > 0 \}}}\)

Here \(n^A_s\) denotes the count of species \(s\) (\(s = 1, \dots, S\)) in community \(A\) and \(n^B_s\) the count in community \(B\). We focus on shared taxa.

A simple example

Bacteria Sample A Sample B
🌹 50 10
🌻 10 50
🌸 20 5
🌼 5 20
Total 85 85

Result:

\[ \frac{0 + 0}{4} = 0 \]
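The same computation can be sketched in Python. Taxon names replace the flower symbols of the table, and the function name is illustrative.

```python
def jaccard(a, b):
    """Fraction of the taxa present in A or B that are specific to only one of them."""
    taxa = set(a) | set(b)
    specific = sum(1 for t in taxa if (a.get(t, 0) > 0) != (b.get(t, 0) > 0))
    present = sum(1 for t in taxa if a.get(t, 0) + b.get(t, 0) > 0)
    return specific / present


sample_a = {"rose": 50, "sunflower": 10, "blossom": 20, "daisy": 5}
sample_b = {"rose": 10, "sunflower": 50, "blossom": 5, "daisy": 20}
print(jaccard(sample_a, sample_b))  # 0.0: the four taxa are present in both samples
```

The four taxa are present in both samples, so no taxon is specific and the index is 0 even though the abundances differ strongly.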

Jaccard index

measures the fraction of species specific to either A or B

\({\text{Jaccard}} = \frac{\sum_{s} \left[ 1_{\{n^A_s > 0, n^B_s = 0 \}} + 1_{\{n^B_s > 0, n^A_s = 0 \}} \right]}{\sum_{s} 1_{\{n^A_s + n^B_s > 0 \}}}\)

Here \(n^A_s\) denotes the count of species \(s\) (\(s = 1, \dots, S\)) in community \(A\) and \(n^B_s\) the count in community \(B\).

Focus on shared taxa.

  • Jaccard = 0 when all taxa are shared
  • Jaccard = 1 when all taxa are specific

Bray Curtis distance

The Bray Curtis distance mixes which species are present in each sample and how abundant they are.

\(\displaystyle {\text{BC}} = \sum_{s} |n^A_s - n^B_s| / \sum_{s} |n^A_s + n^B_s|\)

Here \(n^A_s\) denotes the count of species \(s\) (\(s = 1, \dots, S\)) in community \(A\) and \(n^B_s\) the count in community \(B\).

A simple example

Bacteria Sample A Sample B
🌹 50 10
🌻 10 50
🌸 20 5
🌼 5 20
Total 85 85

Computation Details

Bacteria   |A - B|
🌹         |50 - 10| = 40
🌻         |10 - 50| = 40
🌸         |20 - 5| = 15
🌼         |5 - 20| = 15
Total      110

Result:

\[ \frac{110}{85 + 85} = 0.647 \]
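The worked example can be reproduced with a few lines of Python (taxon names replace the flower symbols; the function name is illustrative):

```python
def bray_curtis(a, b):
    """Sum of absolute count differences divided by the total count of both samples."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator


sample_a = {"rose": 50, "sunflower": 10, "blossom": 20, "daisy": 5}
sample_b = {"rose": 10, "sunflower": 50, "blossom": 5, "daisy": 20}
print(round(bray_curtis(sample_a, sample_b), 3))  # 0.647
```

On the same pair of samples the Jaccard index is 0, which illustrates the difference between a qualitative and a quantitative dissimilarity.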

Bray Curtis distance

The Bray Curtis distance mixes which species are present in each sample and how abundant they are.

\(\displaystyle {\text{BC}} = \sum_{s} |n^A_s - n^B_s| / \sum_{s} |n^A_s + n^B_s|\)

Here \(n^A_s\) denotes the count of species \(s\) (\(s = 1, \dots, S\)) in community \(A\) and \(n^B_s\) the count in community \(B\).

Focus on shared taxa

  • Bray Curtis distance = 0 when every taxon has the same abundance in both samples
  • Bray Curtis distance = 1 when no taxon is shared between the samples

Unifrac index

measures the proportion of the length of the phylogenetic tree specific to either a sample or the other.

\({\text{UF}} = \frac{\sum_{e} l_e \left[ 1_{\{p_e > 0, q_e = 0 \}} + 1_{\{q_e > 0, p_e = 0 \}} \right] }{\sum_{e} l_e \times 1_{\{p_e + q_e > 0 \}}}\)

For each branch \(e\), denote by \(l_e\) its length and by \(p_e\) (resp. \(q_e\)) the fraction of community \(A\) (resp. community \(B\)) below branch \(e\).

Unifrac index

    1. Tree representing phylogenetically similar communities, where a significant fraction of the branch length in the tree is shared (gray).
    2. Tree representing two communities that are maximally different so that 100% of the branch length is unique to either the circle or square environment.

Lozupone C, Knight R. 2005. UniFrac: a New Phylogenetic Method for Comparing Microbial Communities. Appl Environ Microbiol 71:8228-8235. https://doi.org/10.1128/AEM.71.12.8228-8235.2005

Unifrac index

measures the proportion of the length of the phylogenetic tree specific to either a sample or the other.

\({\text{UF}} = \frac{\sum_{e} l_e \left[ 1_{\{p_e > 0, q_e = 0 \}} + 1_{\{q_e > 0, p_e = 0 \}} \right] }{\sum_{e} l_e \times 1_{\{p_e + q_e > 0 \}}}\)

For each branch \(e\), denote by \(l_e\) its length and by \(p_e\) (resp. \(q_e\)) the fraction of community \(A\) (resp. community \(B\)) below branch \(e\).

Focus on shared taxa

  • Unifrac = 0 when all tree branches are shared (abundances can vary)
  • Unifrac = 1 when all tree branches are specific

Weighted-Unifrac index

measures the proportion of the length of the phylogenetic tree specific to a sample, weighted by the abundance differences.

\({\text{wUF}} = \frac{\sum_{e} l_e | p_e - q_e| }{\sum_{e} l_e (p_e + q_e)}\)

For each branch \(e\), denote by \(l_e\) its length and by \(p_e\) (resp. \(q_e\)) the fraction of community \(A\) (resp. community \(B\)) below branch \(e\).

Focus on shared taxa

  • Weighted-Unifrac = 0 when all ASVs are shared with the same relative abundances
  • Weighted-Unifrac = 1 when all ASVs are specific and have no tree branch in common
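Both Unifrac formulas can be illustrated on a toy tree. The sketch below assumes a hypothetical representation where each branch is stored as a pair (length, set of leaves below it) and the tree is ((t1:1, t2:1):1, (t3:1, t4:1):1). Real tools work on tree objects read from Newick files, so this is only meant to show how \(p_e\) and \(q_e\) enter the formulas.

```python
def branch_fractions(branches, counts):
    """Fraction of a community located below each branch (p_e or q_e)."""
    total = sum(counts.values())
    return [sum(counts.get(leaf, 0) for leaf in leaves) / total for _, leaves in branches]


def unifrac(branches, a, b):
    """Branch length specific to one community over branch length present in A or B."""
    p, q = branch_fractions(branches, a), branch_fractions(branches, b)
    lengths = [length for length, _ in branches]
    specific = sum(l for l, pe, qe in zip(lengths, p, q) if (pe > 0) != (qe > 0))
    present = sum(l for l, pe, qe in zip(lengths, p, q) if pe + qe > 0)
    return specific / present


def weighted_unifrac(branches, a, b):
    """Branch lengths weighted by the difference in abundance below each branch."""
    p, q = branch_fractions(branches, a), branch_fractions(branches, b)
    lengths = [length for length, _ in branches]
    numerator = sum(l * abs(pe - qe) for l, pe, qe in zip(lengths, p, q))
    denominator = sum(l * (pe + qe) for l, pe, qe in zip(lengths, p, q))
    return numerator / denominator


# Toy tree ((t1:1, t2:1):1, (t3:1, t4:1):1) as (length, leaves below) pairs.
tree = [
    (1.0, {"t1"}), (1.0, {"t2"}), (1.0, {"t1", "t2"}),
    (1.0, {"t3"}), (1.0, {"t4"}), (1.0, {"t3", "t4"}),
]
same_taxa_a = {"t1": 9, "t2": 1}
same_taxa_b = {"t1": 1, "t2": 9}
disjoint_b = {"t3": 5, "t4": 5}

print(unifrac(tree, same_taxa_a, same_taxa_b), weighted_unifrac(tree, same_taxa_a, same_taxa_b))
print(unifrac(tree, same_taxa_a, disjoint_b), weighted_unifrac(tree, same_taxa_a, disjoint_b))
```

The first pair shares all its branches, so Unifrac is 0 while weighted Unifrac is positive because the abundances differ. The second pair shares no branch and both indices equal 1.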

Characteristics of \(\beta\) diversity (1/2)

                  Qualitative   Quantitative
No phylogeny      Jaccard       Bray-Curtis
With phylogeny    Unifrac       Weighted-Unifrac

  • Jaccard lower than Bray-Curtis ⇒ abundant taxa are not shared
  • Jaccard higher than Unifrac ⇒ communities’ taxa are distinct but phylogenetically related
  • Unifrac higher than weighted Unifrac ⇒ abundant taxa in both communities are phylogenetically close.

Characteristics of \(\beta\) diversity (2/2)

In general, qualitative diversities are most sensitive to factors that affect presence/absence of organisms (such as pH, salinity, depth, etc) and are therefore useful to study and define bioregions (regions with little or no flow between them)…

… whereas quantitative distances focus on factors that affect relative changes (seasonal changes, nutrient availability, concentration of oxygen, depth, etc) and are therefore useful to monitor communities over time or along an environmental gradient.

\(\beta\) diversity - rarefaction

best approach to control for uneven sequencing effort

Schloss PD. 2024. Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. mSphere 9:e00354-23. https://doi.org/10.1128/msphere.00354-23

Practice

Consider working with rarefied data

  • Produce \(\beta\) diversity table
  • Draw \(\beta\) diversity heatmaps

Visualisation: MDS or PCoA plots

Multi-Dimensional Scaling (MDS), also named Principal Coordinate Analysis (PCoA)

Aim: find a lower dimensional representation of the data such that the most information is retained.

  • Construct a matrix of pairwise dissimilarities D between each pair of samples (Jaccard, Bray-Curtis dissimilarity, UniFrac distance…).
  • Find the first principal coordinates (PCs) that best preserve the original distribution of distances.
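The two steps above can be sketched without any external library for the first axis only. This is a simplified illustration: it assumes classical MDS (double centering of the squared dissimilarities) and uses power iteration to find the leading axis, which is reliable only when the largest eigenvalue in absolute value is positive. Real analyses rely on a full eigendecomposition, and all function names are illustrative.

```python
import math


def double_center(dist):
    """Gower double centering of -0.5 * squared dissimilarities."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in a]
    grand_mean = sum(row_means) / n
    return [[a[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


def leading_axis(b, iterations=500):
    """Leading eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    return eigenvalue, v


def pcoa_first_axis(dist):
    """Coordinates of the samples on the first axis and fraction of variance explained."""
    b = double_center(dist)
    eigenvalue, vector = leading_axis(b)
    trace = sum(b[i][i] for i in range(len(b)))
    coords = [x * math.sqrt(eigenvalue) for x in vector]
    return coords, eigenvalue / trace


# Three samples whose dissimilarities are those of points 0, 1 and 3 on a line.
dist = [[0, 1, 3],
        [1, 0, 2],
        [3, 2, 0]]
coords, explained = pcoa_first_axis(dist)
print([round(c, 3) for c in coords], round(explained, 3))
```

In this toy case a single axis reproduces the three pairwise distances exactly, so it explains all of the variance. With real microbiome dissimilarities the first axes only capture part of it.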

MDS or PCoA plots

PCoA focuses on inter-sample dissimilarities

  • Each point corresponds to a separate sample
  • The coordinates of each point (sample) are determined by the value of the PCs (projection)
  • % of variance explained on each axis.

Advantages of MDS or PCoA plots

  • Works with non-Euclidean dissimilarities: Bray-Curtis, Jaccard, UniFrac, Aitchison distance…
  • Each PCoA axis is accompanied by the proportion of explained variance, which indicates the extent to which a reduced-dimension projection reflects the original data.
  • Compare communities across samples and environmental factors to gain deeper insights into biodiversity patterns.
  • The influence of these factors will be assessed using a Permutational Multivariate ANOVA (PERMANOVA).

Practice

  • Create PCoA plots by varying the dissimilarity matrix.
  • What environmental factors structure the dataset?
  • What percentage of the variance is explained?

\(\beta\) diversity partitioning

Test for differences in community composition (\(\beta\)-diversity) between groups of samples using a distance matrix, under the assumption of homogeneous dispersions across groups.

PERMANOVA (1/2)

Permutational Multivariate ANOVA (PERMANOVA) compares the variation between groups to the variation within groups.

Three partitions of the variation

  • Total variation (total sum of squares, SST), SST = SSB + SSW
  • Variation within groups (sum of squares within groups, SSW)
  • Variation between groups (sum of squares between groups, SSB)

PERMANOVA (2/2)

  • The significance of the pseudo-F statistic (the between-groups/within-groups ratio) is evaluated by simulating the null distribution with permutations that respect the design.
  • p-values are computed using these permutations.
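The partition of variation and the permutation test can be sketched for a single factor as follows. This is a simplified illustration with unrestricted permutations of the group labels; the function names, the toy distance matrix and the fixed seed are assumptions of the example, and dedicated software should be used for real designs.

```python
import random


def pseudo_f(dist, groups):
    """Pseudo-F statistic computed from a distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    # SST: sum of squared distances over all pairs, divided by the number of samples.
    sst = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # SSW: same quantity computed inside each group.
    ssw = 0.0
    for label in labels:
        idx = [i for i in range(n) if groups[i] == label]
        ssw += sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:]) / len(idx)
    ssb = sst - ssw
    return (ssb / (len(labels) - 1)) / (ssw / (n - len(labels)))


def permanova(dist, groups, n_perm=999, seed=0):
    """Observed pseudo-F and permutation p-value (labels shuffled freely)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: six samples on a line, two well separated groups.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(x - y) for y in positions] for x in positions]
groups = ["A", "A", "A", "B", "B", "B"]
f_obs, p_value = permanova(dist, groups)
print(round(f_obs, 2), p_value)
```

With only six samples there are few distinct relabellings, so the p-value cannot become very small even though the groups are clearly separated: the number of replicates limits what a permutation test can show.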

Limitations

  • determining the most suitable distance matrix remains challenging
  • it requires permutation to establish its significance, which can be computationally expensive

Practice

  • Based on the experimental design (sample_data, metadata), formulate biological questions and write the corresponding statistical model.
  • Which dissimilarity matrices are most suitable for comparing community composition?
  • Which factors impact \(\beta\) diversity?

Differential abundance analysis

Some slides were adapted from the Migale training materials (RNA-Seq training, modules 16 and 23)

Differential abundance analysis (DAA)

Aim: identify ASVs (or taxa at any taxonomic rank) that are differentially abundant between biological conditions.

Challenges of microbiome data to be addressed in DAA:

  • Different sequencing depth
  • Compositionality
  • Sparsity and zero inflation

There are many methods, and this remains an active area of research. There is currently no consensus.

Principles of DESeq2 (1/2)

DESeq2 is based on a negative binomial generalized linear model and is implemented in Easy16S to perform DAA.

The model is defined as follows:

\(K_{ij} \sim\) NB(mean = \(\mu_{ij}\), dispersion = \(\phi_i\))

\(\log \mu_{ij} = x_j^T \beta_i + \log s_{ij}\)

where \(K_{ij}\) is the count for ASV \(i\) in sample \(j\), \(\phi_i\) is the ASV-specific dispersion, \(s_{ij}\) is the effective library size (e.g. sequencing depth), \(\mathbf{X}=\left[x_{j}\right]\) is the design matrix and \(\beta_i\) is the vector of coefficients.

A Generalized Linear Model (GLM) makes it possible to decompose the effects of different factors and their interactions on the mean.
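To see how the model combines the design, the coefficients and the library size, the sketch below evaluates the mean \(\mu_{ij}\) for one ASV in two conditions. The numerical values are invented, the natural logarithm is used as in the formula above, and the variance function \(\mu + \phi \mu^2\) is the usual negative binomial parameterisation, stated here as an assumption of the example. Nothing is estimated from data.

```python
import math


def expected_count(design_row, coefficients, size_factor):
    """mu_ij = s_ij * exp(x_j^T beta_i), i.e. log(mu) = x^T beta + log(s)."""
    linear_predictor = sum(x * b for x, b in zip(design_row, coefficients))
    return size_factor * math.exp(linear_predictor)


def nb_variance(mean, dispersion):
    """Negative binomial variance: mean + dispersion * mean^2."""
    return mean + dispersion * mean ** 2


# Hypothetical ASV: baseline of 100 reads, doubled in the second condition.
beta = [math.log(100), math.log(2)]       # intercept, condition effect
reference = expected_count([1, 0], beta, size_factor=1.0)
treated_deep = expected_count([1, 1], beta, size_factor=1.5)

print(round(reference, 6), round(treated_deep, 6))
print(round(nb_variance(treated_deep, dispersion=0.1), 6))
```

The size factor rescales the expected count without touching the coefficients, which is how the normalization is included in the model rather than applied to the counts beforehand.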

Principles of DESeq2 (2/2)

Differential abundance analysis is performed taxon-by-taxon with replicates, using directly raw counts.

Normalization is included in the model. The method is suited to comparing samples with similar community compositions.

Assumptions:

  • the majority of taxa are invariant between conditions, i.e. the compared samples have similar community compositions.

Practice

  • Based on the experimental design (sample_data, metadata), formulate biological questions and write the corresponding statistical model.
  • Which ASVs are differentially abundant?

Multiple testing

Some slides were adapted from the Migale training materials (RNA-Seq training, modules 16 and 23)

Concept

False positive (FP) (type I error: \(\alpha\)): a non differentially abundant (DA) taxon which is declared DA.

For each taxon (ASV) \(i\), we test \(H_0\) = {taxon \(i\) is not DA} vs \(H_1\) = {taxon \(i\) is DA} using a statistical test (computation of a score).

Let us assume that none of the \(N\) taxa is DA and that each test is performed at level \(\alpha\).

Ex: \(N = 10\,000\) taxa and \(\alpha = 0.05\) \(\rightarrow\) \(\mathbb{E}(FP) = N \alpha = 500\) taxa.

The Family Wise Error Rate (FWER)

Probability of having at least one Type I error (false positive), of declaring DA at least one non DA taxon.

\[FWER = \mathbb{P}(FP \ge 1)\]

The Bonferroni procedure

  • Either each test is performed at level \(\alpha=\alpha^*/N\)
  • or use the adjusted p-value \(pBonf_g=\min(1,\, p_g \times N)\), so that FWER \(\leq\alpha^*\).

For \(N=2\,000\) and FWER \(\leq\alpha^*=0.05\): \(\alpha= 2.5 \times 10^{-5}\).

Easy but conservative and not powerful.

Without correction, when the number of tests increases at a constant per-test level \(\alpha\), the FWER \(\rightarrow\) 1.
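A short numerical sketch of these ideas follows. The uncorrected FWER formula \(1 - (1 - \alpha)^N\) assumes independent tests, which is an assumption of the example rather than something stated on the slide; function names are illustrative.

```python
def fwer_uncorrected(n_tests, alpha):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni(p_values):
    """Bonferroni adjusted p-values: min(1, p * N)."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Without correction the FWER quickly approaches 1.
for n_tests in (1, 10, 100):
    print(n_tests, round(fwer_uncorrected(n_tests, 0.05), 3))

# Per-test level needed to keep FWER <= 0.05 with 2000 tests.
print(0.05 / 2000)

adjusted = bonferroni([0.00001, 0.01, 0.04, 0.5])
print(adjusted)
```

With four tests, only the smallest p-value remains below 0.05 after adjustment, which illustrates how conservative the procedure is.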

The False Discovery Rate (FDR)

Idea: control the proportion of errors among the discoveries rather than the probability of making at least one error

\(\Rightarrow\) less conservative than control of the FWER.

The false discovery rate (FDR) of Benjamini-Hochberg is the expected proportion of Type I errors among the rejected hypotheses

\(FDR = \mathbb{E}(FP / P)\) if \(P > 0\) and \(0\) if \(P=0\)

Principle: the number of declared positive elements \(P\) is the largest rank \(i\) such that

\[p_{(i)} \leq i \alpha^* / N\]

where \(p_{(1)} \leq \dots \leq p_{(N)}\) are the ordered p-values and \(N\) is the total number of taxa (ASV).

FDR \(\leq\) FWER
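The Benjamini-Hochberg rule can be sketched in its adjusted p-value form: a taxon is declared DA when its adjusted p-value is at most \(\alpha^*\), which gives the same decisions as the largest-rank rule. The function name and the p-values are illustrative.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Go from the largest p-value to the smallest, keeping the running minimum
    # of p_(i) * N / i so that adjusted values stay monotone.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


p_values = [0.01, 0.02, 0.03, 0.5]
adjusted = benjamini_hochberg(p_values)
declared = sum(1 for p in adjusted if p <= 0.05)
print([round(p, 4) for p in adjusted], declared)
```

Here three taxa are declared DA at \(\alpha^* = 0.05\): the largest rank \(i\) with \(p_{(i)} \leq i \alpha^* / N\) is 3, since \(0.03 \leq 0.0375\). Bonferroni would keep only the first one (0.01 × 4 = 0.04).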

Experimental design

Some slides were adapted from the Migale training materials (RNA-Seq training, modules 16 and 23)

Experimental Design

A good design is a list of experiments to conduct in order to answer the question asked, which maximizes the information collected and minimizes the cost of the experiments within the given constraints.

  • Define the biological question clearly, and gather a priori knowledge (e.g. reference genome, splicing…),
  • Anticipate and identify all factors of variation, follow Fisher’s principles (1935), and collect metadata from the experiment and the sequencing,
  • Include independent biological replicates to ensure reproducibility and accuracy of the results

Summary

A microbiome analysis involves…

References

1. Liu Y-X, Qin Y, Chen T, Lu M, Qian X, Guo X, et al. A practical guide to amplicon and metagenomic analysis of microbiome data. Protein & Cell. 2020;12:315–30. doi:10.1007/s13238-020-00724-8.
2. McMurdie PJ, Holmes S. phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013;8:e61217.
3. Midoux C, Rué O, Chapleur O, Bize A, Loux V, Mariadassou M. Easy16S: A user-friendly shiny web-service for exploration and visualization of microbiome data. Journal of Open Source Software. 2024;9:6704. doi:10.21105/joss.06704.
4. Mariadassou M. Phyloseq-extended: Various customs functions written to enhance the base functions of phyloseq. 2018. https://github.com/mahendra-mariadassou/phyloseq-extended.
5. Irlinger F, Mariadassou M, Dugat-Bony E, Rué O, Neuvéglise C, Renault P, et al. A comprehensive, large-scale analysis of “terroir” cheese and milk microbiota reveals profiles strongly shaped by both geographical and human factors. ISME Communications. 2024;4. doi:10.1093/ismeco/ycae095.
6. Bernard M, Rué O, Mariadassou M, Pascal G. FROGS: a powerful tool to analyse the diversity of fungi with special management of internal transcribed spacers. Briefings in Bioinformatics. 2021;22. doi:10.1093/bib/bbab318.