berkeley-schema-fy24: make squeaky-clean all output begins with error message about missing file local/gold-study-ids.yaml #2177
Comments
@turbomam @eecavanna this hasn't been touched in 2 weeks
I would like to get some buy-in from @mbthornton-lbl, because I designed this buggy feature to support automating the validation of data that can be retrieved with methods he wrote. I'm not using it, and removing that whole workflow would really slim down nmdc-schema's Makefile.
Update: I intended for this to help with bulk
get-study-related-records = "src.scripts.nmdc_database_tools:cli" # todo recheck
We could also remove the targets that interact with fuseki as part of this.
Some paths forward I see:
|
.PHONY: pre-build
pre-build: local/gold-study-ids.yaml create-nmdc-tdb2-from-app

## getting a report of GOLD study identifiers, which might have been used as Study ids in legacy (pre-Napa) data
local/gold-study-ids.json:
	curl -X 'GET' \
		--output $@ \
		'https://api-napa.microbiomedata.org/nmdcschema/study_set?max_page_size=999&projection=id%2Cgold_study_identifiers' \
		-H 'accept: application/json'

local/gold-study-ids.yaml: local/gold-study-ids.json
	yq -p json -o yaml $< | cat > $@

# can't ever be used without generating local/gold-study-ids.yaml first
STUDY_IDS := $(shell yq '.resources.[].id' local/gold-study-ids.yaml | awk '{printf "%s ", $$0} END {print ""}')

# can't ever be used without generating local/gold-study-ids.yaml first
print-discovered-study-ids:
	@echo $(STUDY_IDS)

# Replace colons with hyphens in study IDs
# can't ever be used without generating local/gold-study-ids.yaml first
STUDY_YAML_FILES := $(addsuffix .yaml,$(addprefix local/study-files/,$(subst :,-,$(STUDY_IDS))))

# can't ever be used without generating local/gold-study-ids.yaml first
create-study-yaml-files-from-study-ids-list: $(STUDY_YAML_FILES)

# can't ever be used without generating local/gold-study-ids.yaml first
print-intended-yaml-files: local/gold-study-ids.yaml
	@echo $(STUDY_YAML_FILES)
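For reference, the ID-to-filename mapping that the Makefile builds with $(subst), $(addprefix) and $(addsuffix) can be sketched in plain shell. This is only an illustration: the function names list_study_ids and study_id_to_path are hypothetical and not part of the repo, and the file-existence check is just one assumed way to avoid calling yq on a missing local/gold-study-ids.yaml. Because STUDY_IDS is assigned with := and $(shell ...), make runs the yq command while it reads the Makefile, whatever target was requested, which would explain an error at the very start of the output.

```shell
# Hypothetical helpers, for illustration only (not part of the nmdc-schema Makefile).

# Print study IDs one per line, or print nothing if the YAML report
# has not been generated yet.
list_study_ids() {
  ids_file="${1:-local/gold-study-ids.yaml}"
  if [ ! -f "$ids_file" ]; then
    # Skip quietly instead of letting yq fail on a missing file.
    return 0
  fi
  # Same yq expression as in the Makefile's STUDY_IDS assignment.
  yq '.resources.[].id' "$ids_file"
}

# Map a study ID such as nmdc:sty-11-8fb6t785 to its per-study YAML path
# by replacing colons with hyphens and adding the prefix and suffix.
study_id_to_path() {
  printf 'local/study-files/%s.yaml\n' "$(printf '%s' "$1" | tr ':' '-')"
}
```

For example, study_id_to_path 'nmdc:sty-11-8fb6t785' prints local/study-files/nmdc-sty-11-8fb6t785.yaml, the same file name that appears in the dry-run output later in this thread.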
|
PS: API calls with arbitrary, high |
wc -l local/gold-study-ids.yaml
head local/gold-study-ids.yaml
|
make --dry-run create-study-yaml-files-from-study-ids-list
mkdir -p local/study-files
study_file_name=`echo local/study-files/nmdc-sty-11-8fb6t785.yaml` ; \
echo $study_file_name ; \
study_id=`poetry run get-study-id-from-filename $study_file_name` ; \
echo $study_id ; \
date ; \
time poetry run get-study-related-records \
--api-base-url https://api-berkeley.microbiomedata.org \
extract-study \
--study-id $study_id \
--output-file local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml
sed -i.bak 's/gold:/GOLD:/' local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml # kludge modify data to match (old!) schema
rm -rf local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.bak
poetry run linkml-validate --schema nmdc_schema/nmdc_materialized_patterns.yaml local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml > local/study-files/nmdc-sty-11-8fb6t785.yaml.validation.log.txt
time poetry run migration-recursion \
--schema-path nmdc_schema/nmdc_materialized_patterns.yaml \
--input-path local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml \
--output-path local/study-files/nmdc-sty-11-8fb6t785.yaml # kludge masks ids that contain whitespace
rm -rf local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml.bak
mkdir -p local/study-files
study_file_name=`echo local/study-files/nmdc-sty-11-33fbta56.yaml` ; \
echo $study_file_name ; \
study_id=`poetry run get-study-id-from-filename $study_file_name` ; \
echo $study_id ; \
date ; \
time poetry run get-study-related-records \
--api-base-url https://api-berkeley.microbiomedata.org \
extract-study \
--study-id $study_id \
--output-file local/study-files/nmdc-sty-11-33fbta56.yaml.tmp.yaml
etc.
study_file_name=`echo local/study-files/nmdc-sty-11-8fb6t785.yaml` ; \
echo $study_file_name ; \
study_id=`poetry run get-study-id-from-filename $study_file_name` ; \
echo $study_id ; \
date ; \
time poetry run get-study-related-records \
--api-base-url https://api-berkeley.microbiomedata.org \
extract-study \
--study-id $study_id \
--output-file local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml
|
nmdc:sty-11-8fb6t785 appears to be a real study: https://api-berkeley.microbiomedata.org/nmdcschema/ids/nmdc%3Asty-11-8fb6t785
but the command above is trying to find
Also maybe there really are no
In fact, maybe
https://api-berkeley.microbiomedata.org/nmdcschema/data_generation_set?max_page_size=1
{
"resources": [
{
"id": "nmdc:omprc-11-0003fm52",
"name": "1000S_WLUP_FTMS_SPE_BTM_1_run2_Fir_22Apr22_300SA_p01_149_1_3506",
"description": "High resolution MS spectra only",
"has_input": [
"nmdc:bsm-11-jht0ty76"
],
"has_output": [
"nmdc:dobj-11-cp4p5602"
],
"processing_institution": "EMSL",
"type": "nmdc:MassSpectrometry",
"analyte_category": "nom",
"associated_studies": [
"nmdc:sty-11-28tm5d36"
],
"instrument_used": [
"nmdc:inst-14-mwrrj632"
]
}
],
"next_page_token": "nmdc:sys0qphf9j29"
}
see also https://microbiomedata.github.io/berkeley-schema-fy24/MassSpectrometry/
so if that's still hard-coded into https://github.com/microbiomedata/berkeley-schema-fy24/blob/cd6acbee87b627b439d068b6bfeb8cb002f05d99/src/scripts/nmdc_database_tools.py#L64-L83 then maybe that script should be considered unmaintained?
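A quick way to repeat the study ID check from this comment on the command line might look like the sketch below. The helper name study_lookup_url is hypothetical. The base URL and the /nmdcschema/ids/ path are copied from the link earlier in this comment, and encoding only the colon is an assumption that happens to cover IDs shaped like nmdc:sty-11-8fb6t785.

```shell
# Hypothetical helper, for illustration only.
# Build the lookup URL for an ID by percent-encoding its colon(s).
study_lookup_url() {
  encoded_id=$(printf '%s' "$1" | sed 's/:/%3A/g')
  printf 'https://api-berkeley.microbiomedata.org/nmdcschema/ids/%s\n' "$encoded_id"
}

# Example call (needs network access, so it is left commented out):
# curl -sS -H 'accept: application/json' "$(study_lookup_url 'nmdc:sty-11-8fb6t785')"
```

Keeping the URL construction separate from the curl call makes it easy to eyeball the request before sending it.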
In my local clone of the berkeley-schema-fy24 repo, when I run $ make squeaky-clean all, the console output begins with an error message:
Screenshot: