add Bioproject ids to Biosample records #79

realmarcin · 2021-12-03T00:14:23Z

The Bioproject ids from NCBI would let us group samples by their parent project, which is useful. With the current fields it may be possible to infer project structure for a subset of samples (e.g. EMP500), but I don't think there is a general solution.
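
A minimal sketch of the grouping, assuming each Biosample XML record points to its Bioproject through a `<Link type="entrez" target="bioproject" label="PRJNA...">` element. That layout is an assumption to check against the actual dump, and `group_samples_by_project` is a hypothetical helper name.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def group_samples_by_project(biosample_xml: str) -> dict[str, list[str]]:
    """Map each Bioproject accession to the Biosample accessions that link to it.

    Assumed (hypothetical) layout: <BioSample accession="SAMN..."> elements whose
    <Links> children include <Link target="bioproject" label="PRJNA...">id</Link>.
    """
    root = ET.fromstring(biosample_xml)
    groups: dict[str, list[str]] = defaultdict(list)
    for sample in root.iter("BioSample"):
        sample_acc = sample.get("accession")
        for link in sample.iter("Link"):
            if link.get("target") != "bioproject":
                continue
            # Prefer the PRJ accession in 'label'; fall back to the numeric id text.
            project = link.get("label") or (link.text or "").strip()
            if project:
                groups[project].append(sample_acc)
    return dict(groups)
```

Samples with no Bioproject link simply do not appear in the result. For the full Biosample dump a streaming parser would be needed rather than `ET.fromstring`, which loads the whole document into memory.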

The good news is that the Bioproject xml file is currently only 1.8 GB:
https://ftp.ncbi.nlm.nih.gov/bioproject/
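
Even at 1.8 GB, a streaming parse avoids holding the whole file in memory. A minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`, assuming each record is a `<Package>` whose `<ArchiveID accession="...">` carries the PRJ accession and whose `<ProjectDescr><Title>` holds the project title. These element names are assumptions not checked against the .xsd, and `iter_project_accessions` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def iter_project_accessions(source: BinaryIO) -> Iterator[tuple[str, str]]:
    """Yield (accession, title) pairs from a Bioproject XML dump, one record at a time.

    Assumed (hypothetical) layout:
    <PackageSet><Package>...<ArchiveID accession="PRJNA..."/>...
    <ProjectDescr><Title>...</Title></ProjectDescr>...</Package></PackageSet>
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != "Package":
            continue
        archive = elem.find(".//ArchiveID")
        title_elem = elem.find(".//ProjectDescr/Title")
        accession = archive.get("accession") if archive is not None else None
        if accession:
            title = (title_elem.text or "").strip() if title_elem is not None else ""
            yield accession, title
        # Detach processed records from the root so memory stays bounded.
        root.clear()


if __name__ == "__main__":
    # Tiny made-up document standing in for the real dump.
    demo = b"""<PackageSet><Package><ProjectID><ArchiveID accession="PRJNA000001"/></ProjectID>
<ProjectDescr><Title>Example project</Title></ProjectDescr></Package></PackageSet>"""
    for acc, title in iter_project_accessions(io.BytesIO(demo)):
        print(acc, title)
```

Clearing the root after each `<Package>` is the usual `iterparse` idiom for large files: already-yielded records are dropped instead of accumulating under the root element.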

In principle, a few fields from the tsv summary file could solve this; some of them overlap with fields (and hopefully values) in the Biosample xml:
https://ftp.ncbi.nlm.nih.gov/bioproject/summary.txt
Organism Name, TaxID, Project Accession, Project ID, Project Type, Project Data Type

(Organism Name and TaxID overlap with the Biosample xml; the four Project fields are new contributions from the Bioproject xml)
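
A minimal sketch of loading summary.txt so that each Project Accession can be looked up, assuming the file is tab-separated with a header row naming the columns above (possibly with a leading `#` on the first name). `load_project_summary` is a hypothetical helper.

```python
import csv
from typing import Iterable


def load_project_summary(lines: Iterable[str]) -> dict[str, dict[str, str]]:
    """Index rows of the Bioproject summary.txt by their 'Project Accession' column.

    Assumes a tab-separated header row; a leading '#' on a column name is stripped.
    """
    # QUOTE_NONE keeps stray quote characters (e.g. in organism names) literal.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.lstrip("#").strip() for name in next(reader)]
    index: dict[str, dict[str, str]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(header, row))
        accession = record.get("Project Accession")
        if accession:
            index[accession] = record
    return index
```

Usage would look like `with open("summary.txt", newline="", encoding="utf-8") as fh: projects = load_project_summary(fh)`; a Biosample's linked accession can then be joined against `projects` to pull in Project Type and Project Data Type.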

NCBI also publishes .xsd schemas for the Bioproject data; I'm not sure whether they are useful:
https://ftp.ncbi.nlm.nih.gov/bioproject/Schema.v.1.2/
