Standardization and automation of analysis projects

from Migale to a shared solution

Olivier Rué

March 30, 2026

Introduction

Migale team

The Migale Bioinformatics facility offers several types of services and resources to its users:

  • Access to computing and storage resources
  • Access to training modules, tutorials, and user support
  • Access to informatics and bioinformatics development services
  • Access to data analysis services

Quality approach on Migale platform

Continuous improvement via the Deming Cycle (Plan, Do, Check, Act): learn from results, improve, and repeat.

Note

Migale has continuously certified its services since 2011!

How to integrate a quality approach into a data analysis process?

1. What is the lifecycle of a data analysis project?

  • Three periods in our project lifecycle:
    • Preparation: between the request and the collection of all useful data
    • Analysis: with a mutually agreed-upon time limit
    • Post-analysis: no time limit (beyond our control)

2. How to handle shifting requirements during data analysis?

  • Too many “never-ending” projects, which are incompatible with a structured service offering
  • A new project can always be started as a follow-up to a previous one

3. How to manage communication with requesters?

  • Giving our users real-time visibility into the work in progress
    • Living analysis report
  • Facilitates communication and enables faster, easier decision-making
  • A need for an interactive, engaging interface that supports a wide range of file types

Computational documents

4. How to prepare for service scaling? (multi-partner collaboration)

  • A centralized workspace: accessible to all team members, secure, and available 24/7
    • Git Repositories: The ideal solution for secure, version-controlled collaboration
  • Resource centralization (bibliography, documentation, etc.)
  • Template system to streamline writing and code reuse
  • Need for project tracking and monitoring

Data analysis procedure

  1. The requester fills in the web form
  2. The request is reviewed, and material and human resources are allocated
  3. A meeting clarifies expectations and sets:
    • The deliverables
    • The deadline
    • The project start (upon receipt of all necessary data)

Data analysis procedure

  4. The report is shared and filled out as work progresses
  5. A meeting is organised to review the results and decide on the next steps for the project
    • Fill in the survey to close the project
    • Ask for a new project if new analyses are required
  6. Time for support and valorisation
    • Figures / Materials and Methods for publication
    • Data submission

Posit::Quarto

An open-source scientific and technical publishing system

  • One file (.qmd), multiple formats
    • From a single source (Markdown): HTML, PDF, Word, slides, or websites.
  • Polyglot and interoperable
    • Native support for R, Python, Bash, Julia, and more.
  • Seamless Git integration
    • Versionable, trackable, and collaborative text files.
  • A modern and future-proof standard
    • The successor to R Markdown, backed by a large community and designed for long-term scientific projects.

(Virtually) No limits

  • Manage citations (BibTeX…), cross-references, and figure and table numbering
  • Layouts (TOC, tabsets…)

  • Equations, images, graphics, emojis… ✔️ ☺️

(Virtually) No limits

  • Reuse sections via include
  • Create diagrams
```{mermaid}
flowchart LR
    A{node 1}-- is -->B(node 2)
    A-- has -->C[node 3]
```

Example of report

Preview

CI/CD

Quarto to build reports…

…hosted on a GitLab repository

…and easily published on the web!



Data management

Communication

  • StatiCrypt encrypts and password-protects the content of the public static HTML file
  • The HTML report has (so far) always remained accessible
  • Important information is stored

Service scaling

Metrics

Service monitoring

Project monitoring

Analysis and feedback

  • Successful scale-up
  • Excellent feedback from auditors
  • Outstanding feedback from collaborators

Analysis and feedback

  • Team buy-in for the overall approach (though less for the technical implementation)
  • Requests to share this organizational model and to mentor others in adopting it

From Migale to a shared solution

Project metadata ⚠️

“Manage, publish, and track analysis projects in one place”

  • Quarto repository
  • Core entities and relationships
  • Python implementation for automating manual file entry

Core entities and relationships

  • Project & Events: The Project acts as the central hub. It follows a composition model (a “part-of” relationship) with Events: a project is made up of a chronological timeline of events. These events (meetings, status changes, deliveries) belong exclusively to their parent project, which allows precise progress tracking.

Core entities and relationships

  • The Analysts (Authors): A project is analyzed by one or more Authors. These are the bioinformaticians or biostatisticians responsible for the work.
  • The Requester (Contact): Each project is requested by a specific Contact. This entity represents the “client” or partner who initiates the analysis request, bridging the gap between external needs and internal resources.

Core entities and relationships

  • Affiliations: To maintain consistency, both Authors and Contacts are linked to Affiliations. This ensures that institutional information (Institute name, address, city) is centralized and reused across the entire database, preventing redundant or conflicting entries.
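The relationships described above can be sketched with plain Python dataclasses. This is only an illustration, not the actual implementation in the repository: the class names, field names, and the `add_event` / `timeline` helpers are assumptions made for the example, and the people's names are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Affiliation:
    """Institutional information, stored once and shared by reference."""
    key: str
    institute: str
    address: str
    city: str
    country: str


@dataclass
class Person:
    """Common fields for analysts (Authors) and requesters (Contacts)."""
    name: str
    affiliation: Affiliation


@dataclass
class Event:
    """A dated entry in a project's timeline (meeting, delivery, ...)."""
    kind: str
    when: date


@dataclass
class Project:
    """Central hub: owns its events and references people."""
    title: str
    contact: Person
    authors: list[Person]
    events: list[Event] = field(default_factory=list)

    def add_event(self, kind: str, when: date) -> Event:
        # Composition: events are created through their parent project
        # and are never shared between projects.
        event = Event(kind, when)
        self.events.append(event)
        return event

    def timeline(self) -> list[Event]:
        # Chronological view used for progress tracking.
        return sorted(self.events, key=lambda e: e.when)


migale = Affiliation("inrae-migale", "INRAE", "Domaine de Vilvert",
                     "Jouy-en-Josas", "France")
analyst = Person("Analyst A", migale)      # placeholder name
requester = Person("Requester B", migale)  # placeholder name
project = Project("Example project", requester, [analyst])
project.add_event("Project start", date(2026, 2, 1))
project.add_event("Initial request", date(2026, 1, 10))
print([e.kind for e in project.timeline()])
```

Both people point to the same `Affiliation` object, which mirrors the idea that institutional information is entered once and reused.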

Customizable configuration

## PROJECT #############
states:
  - In progress
  - To come
  - Closed
  - Rejected

thematics:
  - Metagenomics
  - Metabarcoding
  - Metatranscriptomics
  - Text-Mining
  - Transcriptomics
  - Genomics
  - Genome assembly

types:
  - Collaboration
  - Support

## CONTACT #############
positions:
  - Researcher
  - Engineer
  - Technician
  - PhD
  - Student

status:
  - Private
  - Public


## EVENT ###############
events:
  defined_types:
    - Initial request
    - Meeting
    - Project start
    - Initial deadline
    - Email
    - Deadline extension
    - Data backup
    - Website launch
    - Manuscript submission
    - Satisfaction survey
    - Project end
    
  mandatory:
    - Initial request
    - Initial deadline
    - Project start
    - Project end
    - Satisfaction survey
  
event_details:
  Meeting:
    state_options:
      - Pending
      - Done

  Email:
    state_options:
      - Sent
      - Received
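One way such a configuration could be used is to check a project's timeline against the `mandatory` list and the per-event `state_options`. The sketch below is hypothetical: the `check_events` function and the event record layout (dicts with `type` and `state` keys) are assumptions, and a subset of the configuration is written as a Python dict so the example runs without dependencies (with pyyaml it could be loaded through `yaml.safe_load` instead).

```python
# Subset of the configuration above, as a plain dict (assumption: the
# real scripts read the YAML file rather than a hard-coded dict).
CONFIG = {
    "events": {
        "defined_types": [
            "Initial request", "Meeting", "Project start",
            "Initial deadline", "Email", "Deadline extension",
            "Data backup", "Website launch", "Manuscript submission",
            "Satisfaction survey", "Project end",
        ],
        "mandatory": [
            "Initial request", "Initial deadline", "Project start",
            "Project end", "Satisfaction survey",
        ],
    },
    "event_details": {
        "Meeting": {"state_options": ["Pending", "Done"]},
    },
}


def check_events(events, config=CONFIG):
    """Return a list of human-readable problems (empty if none).

    `events` is a list of dicts with a "type" key and an optional
    "state" key; this record layout is an assumption of the sketch.
    """
    problems = []
    defined = set(config["events"]["defined_types"])
    seen = set()
    for event in events:
        kind = event["type"]
        seen.add(kind)
        if kind not in defined:
            problems.append(f"unknown event type: {kind}")
            continue
        # Only event types listed under event_details restrict states.
        options = config["event_details"].get(kind, {}).get("state_options")
        state = event.get("state")
        if options and state is not None and state not in options:
            problems.append(f"invalid state for {kind}: {state}")
    # Report mandatory events in the order given by the configuration.
    for kind in config["events"]["mandatory"]:
        if kind not in seen:
            problems.append(f"missing mandatory event: {kind}")
    return problems


timeline = [
    {"type": "Initial request"},
    {"type": "Meeting", "state": "Cancelled"},
    {"type": "Project start"},
]
for problem in check_events(timeline):
    print(problem)
```

Keeping the allowed values in the configuration rather than in the code means a team can rename or add event types without touching the checking logic.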

Implementation

Installation

  1. Clone the repository
  2. Install dependencies
    • quarto (>=1.8.25)
    • python (>=3.10.14)
      • pyyaml (pip install pyyaml)
  3. Use the Python scripts to populate your projects

Usage

  • CLI
python scripts/manage_affiliations.py inrae-migale \
  --name inrae-migale \
  --institute INRAE \
  --address "Domaine de Vilvert" \
  --city "Jouy-en-Josas" \
  --country "France"
  • Interactive mode
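For readers curious about how a command line like the one above might be parsed, here is a minimal `argparse` sketch. It is not the code of `scripts/manage_affiliations.py`: which options are required, their defaults, and the `build_parser` / `to_record` helper names are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="manage_affiliations.py",
        description="Create or update an affiliation entry (sketch).",
    )
    parser.add_argument("key", help="unique identifier, e.g. inrae-migale")
    # Assumption: only the institute is required; other fields are optional.
    parser.add_argument("--name", help="display name (defaults to the key)")
    parser.add_argument("--institute", required=True)
    parser.add_argument("--address", default="")
    parser.add_argument("--city", default="")
    parser.add_argument("--country", default="")
    return parser


def to_record(args: argparse.Namespace) -> dict:
    """Turn parsed arguments into a plain dict ready to be serialized."""
    return {
        "key": args.key,
        "name": args.name or args.key,
        "institute": args.institute,
        "address": args.address,
        "city": args.city,
        "country": args.country,
    }


# Same arguments as the command shown above, passed as a list so the
# example runs without a real command line.
args = build_parser().parse_args([
    "inrae-migale",
    "--name", "inrae-migale",
    "--institute", "INRAE",
    "--address", "Domaine de Vilvert",
    "--city", "Jouy-en-Josas",
    "--country", "France",
])
record = to_record(args)
print(record["institute"], record["city"])
```

An interactive mode could reuse `to_record` by prompting for each field instead of reading it from the command line.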

Automatic creation

Easy build (and deploy)

  • quarto render
  • git add/commit/push

A single config file enables a CI/CD pipeline through GitLab Pages

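pages:
  script:
    # Presumably _quarto.yml sets `output-dir: .public`, so the render lands there
    - mkdir .public
    - quarto render
    # GitLab Pages publishes the content of the `public` directory
    - mv .public public
  artifacts:
    paths:
      - public
  rules:
    # Deploy only from the default branch
    - if: $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH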

Conclusion

  • Deployment of an operational project management system
    • Accessible, standardized, and centralized
  • Suitable for individuals or teams
  • Addressing a genuine community need
    • FAIR principles…

Perspectives

  • Use ontologies and registries (EDAM, bio.tools…) to add information
  • Communication across the Institute and to the wider community
  • Enhancing features
  • Incorporating external feedback
  • Link to ticketing system
  • Link to emerging solutions for data management (madbot: Metadata And Data Brokering Online Tool)

Thank you for your attention