Standardization and automation of analysis projects

from Migale to a shared solution

Mahendra Mariadassou
Olivier Rué

June 26, 2026

Introduction

Migale team

The Migale bioinformatics facility offers its users several types of services and resources:

  • Access to computing and storage resources
  • Access to training modules, tutorials, and user support
  • Access to informatics and bioinformatics development services
  • Access to data analysis services

Data analysis specificities

A clear lifecycle of a data analysis project is crucial

  • Three phases in our project lifecycle

  • Preparation: between request and collection of all useful data
  • Analysis: with a mutually agreed-upon time limit
  • Post-analysis: no time limit (beyond our control)

Agility to handle shifting requirements during data analysis

  • 😩 Too many “never-ending” projects, incompatible with a structured service offer
  • Agree up front on the people, the deliverables, and the time allotted to the analyses
  • If new analyses are needed, a new project can simply be started from a previous one

Effective communication channels and a centralized repository

  • 👀 Giving our users real-time visibility into the work in progress
    • Living analysis report
  • 🤝 Computational documents facilitate communication and enable faster, easier decision-making
  • A centralized workspace: accessible to all team members, secure, and available 24/7
    • Git Repositories: The ideal solution for secure, version-controlled collaboration
    • Resource centralization (bibliography, documentation, etc.)
    • Template system to streamline writing and code reuse

Quarto + CI/CD

Quarto to build reports…

…hosted on a GitLab repository

…and easily published on the web!



Data management

Service scaling

Our project management solution

https://forge.inrae.fr/migale/quarto-templates/data-analysis-reports

“Manage, publish, and track analysis projects in one place”

  • Quarto repository
  • Core entities and relationships
  • Python implementation for automating manual file entry

Core entities and relationships

  • Project & Events: The Project acts as the central hub. It follows a composition model (a “part-of” relationship) with Events: a project is made up of a chronological timeline of events. These events (meetings, status changes, deliveries) belong exclusively to their parent project, which allows precise progress tracking.

  • The Analysts (Authors): A project is analyzed by one or more Authors. These are the bioinformaticians or biostatisticians responsible for the work.
  • The Requester (Contact): Each project is requested by a specific Contact. This entity represents the “client” or partner who initiates the analysis request, bridging the gap between external needs and internal resources.

  • Affiliations: To maintain consistency, both Authors and Contacts are linked to Affiliations. This ensures that institutional information (Institute name, address, city) is centralized and reused across the entire database, preventing redundant or conflicting entries.
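
One way to picture the centralization of affiliations is a single registry that Authors and Contacts reference by key instead of copying institutional details. The sketch below is hypothetical: the `Affiliation` and `Person` classes, the `resolve` helper, and the placeholder person names are assumptions; only the affiliation values are taken from the command-line example in this presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affiliation:
    key: str
    institute: str
    address: str
    city: str
    country: str


@dataclass(frozen=True)
class Person:
    """Shared shape for Authors and Contacts in this sketch."""
    name: str
    affiliation_key: str  # a reference, not a copy of the address


# Single registry of affiliations, keyed by a short identifier.
affiliations = {
    "inrae-migale": Affiliation(
        key="inrae-migale",
        institute="INRAE",
        address="Domaine de Vilvert",
        city="Jouy-en-Josas",
        country="France",
    )
}

author = Person("Analyst A", "inrae-migale")     # placeholder name
contact = Person("Requester B", "inrae-migale")  # placeholder name


def resolve(person: Person) -> Affiliation:
    """Look up a person's affiliation; fail loudly on unknown keys."""
    try:
        return affiliations[person.affiliation_key]
    except KeyError:
        raise ValueError(f"Unknown affiliation: {person.affiliation_key!r}") from None


print(resolve(author).institute, resolve(contact).city)
```

Since both people point at the same registry entry, correcting an address once updates it everywhere, and an unknown key is rejected instead of silently creating a conflicting entry.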

Customizable configuration

## PROJECT #############
states:
  - In progress
  - To come
  - Closed
  - Rejected

thematics:
  - Metagenomics
  - Metabarcoding
  - Metatranscriptomics
  - Text-Mining
  - Transcriptomics
  - Genomics
  - Genome assembly

types:
  - Collaboration
  - Support

## CONTACT #############
positions:
  - Researcher
  - Engineer
  - Technician
  - PhD
  - Student

status:
  - Private
  - Public


## EVENT ###############
events:
  defined_types:
    - Initial request
    - Meeting
    - Project start
    - Initial deadline
    - Email
    - Deadline extension
    - Data backup
    - Website launch
    - Manuscript submission
    - Satisfaction survey
    - Project end
    
  mandatory:
    - Initial request
    - Initial deadline
    - Project start
    - Project end
    - Satisfaction survey
  
event_details:
  Meeting:
    state_options:
      - Pending
      - Done

  Email:
    state_options:
      - Sent
      - Received
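
A configuration like this lends itself to simple consistency checks. The sketch below is an assumption about how the `events` section could be used, not the repository's implementation: the function names `undefined_mandatory` and `missing_events` are hypothetical, and the configuration is written as a plain dict so the example runs on its own (in practice it could be loaded from the YAML file with `yaml.safe_load` from pyyaml).

```python
# A subset of the configuration above, written as a plain dict for the example.
config = {
    "events": {
        "defined_types": [
            "Initial request", "Meeting", "Project start", "Initial deadline",
            "Email", "Deadline extension", "Data backup", "Website launch",
            "Manuscript submission", "Satisfaction survey", "Project end",
        ],
        "mandatory": [
            "Initial request", "Initial deadline", "Project start",
            "Project end", "Satisfaction survey",
        ],
    }
}


def undefined_mandatory(cfg: dict) -> list[str]:
    """Mandatory event types that are missing from defined_types."""
    defined = set(cfg["events"]["defined_types"])
    return [t for t in cfg["events"]["mandatory"] if t not in defined]


def missing_events(cfg: dict, recorded: list[str]) -> list[str]:
    """Mandatory event types that a project has not recorded yet."""
    seen = set(recorded)
    return [t for t in cfg["events"]["mandatory"] if t not in seen]


# Event types recorded so far for an illustrative, ongoing project.
recorded = ["Initial request", "Meeting", "Project start"]

print(undefined_mandatory(config))       # [] : the configuration is consistent
print(missing_events(config, recorded))  # mandatory events still to come
```

For the ongoing project in this example, the second call lists `Initial deadline`, `Project end`, and `Satisfaction survey` as still missing, which is the kind of information a monitoring page could surface.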

Implementation

Installation

  1. Clone or fork the repository
  2. Install the dependencies
    • quarto (>=1.8.25)
    • python (>=3.10.14)
      • pyyaml (pip install pyyaml)
  3. Use the Python scripts to populate your projects

Usage

  • CLI
python scripts/manage_affiliations.py inrae-migale \
  --name inrae-migale \
  --institute INRAE \
  --address "Domaine de Vilvert" \
  --city "Jouy-en-Josas" \
  --country "France"
  • Interactive mode
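
To illustrate how a command line like the one above could be parsed, here is a hypothetical sketch using Python's standard `argparse` module. It is not the code of `scripts/manage_affiliations.py`: the positional argument name `affiliation_id`, the `build_parser` and `to_record` functions, and the shape of the resulting record are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags shown in the CLI example (sketch only)."""
    parser = argparse.ArgumentParser(description="Add or update an affiliation")
    parser.add_argument("affiliation_id", help="short key, e.g. inrae-migale")
    parser.add_argument("--name")
    parser.add_argument("--institute")
    parser.add_argument("--address")
    parser.add_argument("--city")
    parser.add_argument("--country")
    return parser


def to_record(args: argparse.Namespace) -> dict:
    """Turn parsed arguments into a record, dropping options left unset."""
    fields = ("name", "institute", "address", "city", "country")
    record = {f: getattr(args, f) for f in fields if getattr(args, f) is not None}
    return {args.affiliation_id: record}


# Same arguments as the CLI example, passed as a list instead of via the shell.
args = build_parser().parse_args([
    "inrae-migale",
    "--name", "inrae-migale",
    "--institute", "INRAE",
    "--address", "Domaine de Vilvert",
    "--city", "Jouy-en-Josas",
    "--country", "France",
])
record = to_record(args)
print(record)
```

In a design like this, an interactive mode could prompt for whichever options were not supplied on the command line and then build the same record.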

Automatic creation

Easy build (and deploy)

  • quarto render
  • git add/commit/push

A single configuration file sets up the CI/CD pipeline that publishes the site through GitLab Pages

pages:
  script:
    - mkdir .public
    - quarto render
    - mv .public public
  artifacts:
    paths:
      - public
  rules:
    - if: $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH

Service / Project monitoring

Conclusion

  • Deployment of an operational project management system
    • Accessible, standardized, and centralized
    • ISO 9001 certified
  • Suitable for individuals or teams
  • Addressing a genuine community need
    • FAIR principles

Perspectives

  • Use ontologies and registries (EDAM, bio.tools…) to add information
  • Communication across the Institute and to the wider community
  • Enhancing features
  • Incorporating external feedback
  • Link to ticketing system
  • Link to emerging solutions for data management (madbot: Metadata And Data Brokering Online Tool)

Thank you for your attention

Supplementary slides

Data analysis procedure

  1. The requester fills in the web form
  2. The request is reviewed, and material and human resources are allocated
  3. A meeting clarifies expectations and sets
    • The deliverables
    • The deadline
    • The project start date, upon receipt of all necessary data

Data analysis procedure

  4. The report is shared and filled in as the work progresses
  5. A meeting is organised to review the results and decide on the next steps for the project
    • Fill in the survey to close the project
    • Request a new project if new analyses are required
  6. Time for support and dissemination of results
    • Figures and Materials and Methods text for publication
    • Data submission

Posit::Quarto

An open-source scientific and technical publishing system

  • One file (.qmd), multiple formats
    • From a single source (Markdown): HTML, PDF, Word, slides, or websites.
  • Polyglot and interoperable
    • Native support for R, Python, Bash, Julia, and more.
  • Seamless Git integration
    • Versionable, trackable, and collaborative text files.
  • A modern and future-proof standard
    • The successor to R Markdown, backed by a large community and designed for long-term scientific projects.

(Virtually) No limits

  • Manage citations (BibTeX…), cross-references, and figure and table numbering
  • Layouts (TOC, tabsets…)

  • Equations, images, graphics, emojis… ✔️ ☺️

(Virtually) No limits

  • Reuse sections via include
  • Create diagrams
```{mermaid}
flowchart LR
    A{node 1}-- is -->B(node 2)
    A-- has -->C[node 3]
```

Example of report

Preview

Communication

  • StatiCrypt encrypts and password-protects the content of the public static HTML file
  • So far, the HTML report has always remained accessible
  • Important information is stored

Metrics

Analysis and feedback

  • Successful scale-up
  • Excellent feedback from auditors
  • Outstanding feedback from collaborators

Analysis and feedback

  • Team buy-in for the overall approach (though less for the technical implementation)
  • Requests to share and mentor on this organizational model