Databanks
Everything you need to know about the databanks Migale provides
We provide access to a large set of public biological databanks in different formats (FASTA, GenBank, HMM…). They are stored centrally and are accessible to all users. They are updated either automatically with the BioMAJ software package or manually.
Update system
We use BioMAJ to manage most of the databanks. BioMAJ (BIOlogie Mise A Jour) is a workflow engine dedicated to data synchronization and processing.
The software automates the update cycle and the supervision of the locally mirrored databank repository. A typical use is to download remote databanks (GenBank, for example) and apply transformations to them (BLAST indexing, EMBOSS indexing, etc.). Any script can be run on the downloaded data. Once all processing steps have completed successfully, the bank is put into “production” in a dedicated release directory. Updates can be scheduled at regular intervals with cron tasks; data are downloaded again only if a change is detected.
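The update cycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not BioMAJ's actual implementation: the function names (update_bank, publish_release), the fetch and process callbacks, and the release naming are all hypothetical, and the use of a symbolic link named current for the production release is an assumption based on the layout described on this page.

```python
import os
from pathlib import Path


def publish_release(bank_dir: Path, release: str) -> None:
    """Point the bank's 'current' link at a finished release directory."""
    tmp_link = bank_dir / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    # Create the new link under a temporary name, then rename it over the
    # old one so readers never see a missing 'current' link.
    tmp_link.symlink_to(release, target_is_directory=True)
    os.replace(tmp_link, bank_dir / "current")


def update_bank(bank_dir: Path, remote_release: str, fetch, process) -> bool:
    """Run one update cycle; return True if a new release was published.

    fetch(release_dir) and process(release_dir) are hypothetical callbacks
    standing in for the download step and the post-processing scripts.
    """
    current = bank_dir / "current"
    local_release = os.readlink(current) if current.is_symlink() else None
    if local_release == remote_release:
        return False  # no change detected: nothing is downloaded again

    release_dir = bank_dir / remote_release
    release_dir.mkdir(parents=True, exist_ok=True)
    fetch(release_dir)    # download the remote data
    process(release_dir)  # e.g. indexing; an exception here aborts the cycle
    publish_release(bank_dir, remote_release)  # only reached on success
    return True
```

Because the link is switched only after every step has succeeded, a failed download or indexing run leaves the previously published release in place.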
Access
Available databanks are stored in a dedicated directory accessible from the front server and all cluster nodes: /db/.
Each databank is stored in a directory named after the databank. The directory tree mirrors that of the databank's remote source. A link named current points to the most recently updated release. For example, if you want to use the latest version of nr, the FASTA file is available at this path: /db/nr/current/fasta/nr.fsa
Some tools need their own databanks and provide them. You will find them in the /db/outils/ directory.
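The layout described in this section can also be used from a script. The sketch below is a minimal example, assuming that the link in each databank directory is a symbolic link named current and that the root is /db; the helper names bank_file and current_release are hypothetical and not part of any Migale tool.

```python
from pathlib import Path

DB_ROOT = Path("/db")  # databank root named on this page


def bank_file(bank: str, *parts: str, root: Path = DB_ROOT) -> Path:
    """Build a path below a bank's 'current' link."""
    return root / bank / "current" / Path(*parts)


def current_release(bank: str, root: Path = DB_ROOT) -> str:
    """Return the name of the directory that 'current' resolves to."""
    return (root / bank / "current").resolve().name
```

For instance, bank_file("nr", "fasta", "nr.fsa") builds the path /db/nr/current/fasta/nr.fsa, and current_release("nr") reports which release directory the link currently targets, which is useful to record alongside analysis results.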
Available databanks
Managed with BioMAJ
Manually updated
Ask for a databank or an update
To request a new databank or an update of an existing one, first make sure the data are not restricted by a particular license, then fill in the dedicated form.