add Bioproject ids to Biosample records #79

Open
realmarcin opened this issue Dec 3, 2021 · 0 comments

The Bioproject ids from NCBI would let us group samples by their parent project, which is useful. With the current fields it may be possible to infer project structure for a subset of samples (e.g. EMP500), but I don't think there is a general solution.

The good news is that the Bioproject xml file is only 1.8G currently:
https://ftp.ncbi.nlm.nih.gov/bioproject/
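
At that size the xml can be streamed rather than loaded whole. Here is a minimal sketch using the standard library's `iterparse`. The element names `Package` and `ArchiveID`, the `accession` attribute, and the inline sample are assumptions for illustration and should be checked against the real file or its schema.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield each completed <record_tag> element, freeing memory as we go."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield elem
            root.clear()  # drop records that have already been processed


# Made-up sample; tag and attribute names are assumptions, not the verified layout.
SAMPLE_XML = b"""<PackageSet>
  <Package><ArchiveID accession="PRJNA000001" id="1"/></Package>
  <Package><ArchiveID accession="PRJNA000002" id="2"/></Package>
</PackageSet>"""

accessions = []
for record in iter_records(io.BytesIO(SAMPLE_XML), "Package"):
    archive_id = record.find(".//ArchiveID")
    if archive_id is not None:
        accessions.append(archive_id.get("accession"))
```

For the real download, `source` could be the path to the decompressed xml file instead of the in-memory sample.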

In principle, a few fields from the tsv summary file could solve this. Some of them should overlap with fields (and hopefully values) in the Biosample xml:
https://ftp.ncbi.nlm.nih.gov/bioproject/summary.txt
Organism Name, TaxID, Project Accession, Project ID, Project Type, Project Data Type

(fields in bold are new contributions from Bioproject xml)
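
As a rough sketch of how the summary file could be used, the snippet below reads a tab-separated table into a lookup keyed by project accession. The column names are copied from the list above, the sample row is made up, and the helper name `load_project_summary` is hypothetical. The real file's header may differ, so treat this as a starting point rather than a tested loader.

```python
import csv
import io

# Column name taken from the field list in this issue; assumed to match the file.
ACCESSION_FIELD = "Project Accession"


def load_project_summary(handle):
    """Return {project accession: row dict} from a tab-separated summary table."""
    reader = csv.DictReader(handle, delimiter="\t")
    projects = {}
    for row in reader:
        accession = (row.get(ACCESSION_FIELD) or "").strip()
        if accession:  # skip rows without an accession
            projects[accession] = row
    return projects


# Made-up example data, not real NCBI records.
SAMPLE_TSV = (
    "Organism Name\tTaxID\tProject Accession\tProject ID\t"
    "Project Type\tProject Data Type\n"
    "Example organism\t12345\tPRJNA000001\t1\t"
    "Primary submission\tGenome sequencing\n"
)

projects = load_project_summary(io.StringIO(SAMPLE_TSV))
```

A Biosample record that carries a Bioproject accession could then be annotated by looking that accession up in `projects`.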

They also have .xsd schemas for the Bioproject data, not sure if that's useful:
https://ftp.ncbi.nlm.nih.gov/bioproject/Schema.v.1.2/
