BioSample

All major archives have some version of a ‘sample’ object that stores sample-related metadata. The NCBI (SRA), DDBJ and CNCB (GSA) databases only store metadata that is submitted alongside a study to their associated genome archives.

As well as storing metadata from samples that are submitted as part of an ENA submission, the EBI’s BioSamples database can be used for any form of sample metadata archiving, and its records can be linked to other EBI archives at a later point. It accepts arbitrary key-value pairs, and values can be linked to ontology terms. Multiple BioSamples can be linked together: for example, a virus sample could be linked to a sample from its host through a ‘Derived from’ relationship. Samples may also be grouped under a project by specifying a ‘project’ key.
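The key-value model described above can be sketched as a plain Python dictionary. This is a minimal illustration, not a definitive submission payload: the field names (`characteristics`, `ontologyTerms`, `relationships`) are assumed to follow the BioSamples JSON layout, and the sample name, project value and host accession are hypothetical placeholders.

```python
import json


def build_sample(name, attributes, derived_from=None):
    """Assemble a BioSamples-style record as a plain dict.

    `attributes` maps each key to a (text, ontology_iri_or_None) tuple.
    The field names used here are an assumption about the JSON layout.
    """
    characteristics = {}
    for key, (text, ontology_iri) in attributes.items():
        value = {"text": text}
        if ontology_iri is not None:
            # Link the free-text value to an ontology term.
            value["ontologyTerms"] = [ontology_iri]
        characteristics[key] = [value]

    sample = {"name": name, "characteristics": characteristics}
    if derived_from is not None:
        # Express the link to the host sample as a relationship.
        sample["relationships"] = [
            {"source": name, "type": "derived from", "target": derived_from}
        ]
    return sample


virus_sample = build_sample(
    name="virus-isolate-01",  # hypothetical sample name
    attributes={
        "organism": (
            "Influenza A virus",
            "http://purl.obolibrary.org/obo/NCBITaxon_11320",
        ),
        "project": ("example-project", None),  # hypothetical project key value
    },
    derived_from="HOST_SAMPLE_ACCESSION",  # placeholder, not a real accession
)

print(json.dumps(virus_sample, indent=2))
```

Keeping each attribute as a list of values leaves room for keys that carry more than one value.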

Samples can be linked to EGA projects if the sample metadata provided is openly accessible (e.g. https://www.ebi.ac.uk/biosamples/samples/SAMEA4940335 ).

Metadata can be submitted and queried via the BioSamples REST API. There is a Python wrapper, though it is not actively maintained.
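As a rough sketch of querying the REST API without the wrapper, the snippet below fetches a single public sample using only the Python standard library. It assumes that the sample URL returns JSON when requested with an `Accept: application/json` header and that attributes sit under a `characteristics` key; the helper names are illustrative and not part of any official client.

```python
import json
import urllib.request

BIOSAMPLES_BASE = "https://www.ebi.ac.uk/biosamples/samples"


def sample_url(accession):
    """Return the URL of a single sample record."""
    return f"{BIOSAMPLES_BASE}/{accession}"


def fetch_sample(accession, timeout=30):
    """Fetch one sample as a dict (performs a network request)."""
    request = urllib.request.Request(
        sample_url(accession),
        headers={"Accept": "application/json"},  # assumed content negotiation
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def attribute_values(sample, key):
    """List the text values stored under one attribute key."""
    values = sample.get("characteristics", {}).get(key, [])
    return [value.get("text") for value in values]


# Example (requires network access):
# sample = fetch_sample("SAMEA4940335")
# print(attribute_values(sample, "organism"))
```

Submitting metadata additionally requires authentication, which is not covered in this sketch.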

BioSamples can be used to give project samples a stable identifier while they are being processed; these identifiers can then be linked to the final archival submission. Examples include the HipSci project (Streeter et al., 2017) and the FAANG project ( https://www.ebi.ac.uk/biosamples/samples?filter=attr%3Aproject%3AFAANG , https://data.faang.org/specimen , https://www.faang.org/ ).
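The FAANG search link above selects samples by their ‘project’ attribute. A small sketch of how such a filter URL could be assembled for any project is shown below; the helper name is hypothetical, and the `attr:project:<name>` filter syntax is inferred from that link rather than from a formal specification.

```python
from urllib.parse import urlencode

BIOSAMPLES_BASE = "https://www.ebi.ac.uk/biosamples/samples"


def project_filter_url(project, base=BIOSAMPLES_BASE):
    """Build a search URL listing samples whose 'project' attribute matches."""
    # urlencode percent-encodes the colons in the filter expression.
    query = urlencode({"filter": f"attr:project:{project}"})
    return f"{base}?{query}"


print(project_filter_url("FAANG"))
```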

References

Streeter, I., Harrison, P. W., Faulconbridge, A., Flicek, P., Parkinson, H., & Clarke, L. (2017). The human-induced pluripotent stem cell initiative—data resources for cellular genetics. Nucleic Acids Research, 45(Database issue), D691–D697. https://doi.org/10.1093/nar/gkw928

Contributors