`berkeley-schema-fy24`: `make squeaky-clean all` output begins with error message about missing file `local/gold-study-ids.yaml` #2177

eecavanna · 2024-09-06T07:04:26Z

In my local clone of the `berkeley-schema-fy24` repo, when I run `make squeaky-clean all`, the console output begins with an error message:

$ make squeaky-clean all
Error: open local/gold-study-ids.yaml: no such file or directory
rm -rf project
rm -rf tmp
# ...

ssarrafan · 2024-09-20T20:36:01Z

@turbomam @eecavanna this hasn't been touched in 2 weeks
Removing from sprint, adding backlog label

turbomam · 2024-09-24T16:24:50Z

I would like to get some buy-in from @mbthornton-lbl because I designed this buggy feature in support of automating the validation of data that can be retrieved with methods he wrote. I'm not using it, and removing that whole workflow would really slim down the nmdc-schema's Makefile.

Update: I intended for this to help with bulk get-study-related-records operations

get-study-related-records = "src.scripts.nmdc_database_tools:cli" # todo recheck

We could also remove the targets that interact with fuseki as part of this.

eecavanna · 2024-09-25T21:00:52Z

Some paths forward I see:

If the file has moved, update the reference and close issue
If the file is gone, remove the reference and close issue
If the file is obsolete, remove the reference and close issue

turbomam · 2024-09-26T14:53:29Z

.PHONY: pre-build
pre-build: local/gold-study-ids.yaml create-nmdc-tdb2-from-app

## getting a report of GOLD study identifiers, which might have been used as Study ids in legacy (pre-Napa) data
local/gold-study-ids.json:
	curl -X 'GET' \
		--output $@ \
		'https://api-napa.microbiomedata.org/nmdcschema/study_set?max_page_size=999&projection=id%2Cgold_study_identifiers' \
		-H 'accept: application/json'

local/gold-study-ids.yaml: local/gold-study-ids.json
	yq -p json -o yaml $< | cat > $@

# can't ever be used without generating local/gold-study-ids.yaml first
STUDY_IDS := $(shell yq '.resources.[].id' local/gold-study-ids.yaml  | awk '{printf "%s ", $$0} END {print ""}')

# can't ever be used without generating local/gold-study-ids.yaml first
print-discovered-study-ids:
	@echo $(STUDY_IDS)

# Replace colons with hyphens in study IDs
# can't ever be used without generating local/gold-study-ids.yaml first
STUDY_YAML_FILES := $(addsuffix .yaml,$(addprefix local/study-files/,$(subst :,-,$(STUDY_IDS))))

# can't ever be used without generating local/gold-study-ids.yaml first
create-study-yaml-files-from-study-ids-list: $(STUDY_YAML_FILES)

# can't ever be used without generating local/gold-study-ids.yaml first
print-intended-yaml-files: local/gold-study-ids.yaml
	@echo $(STUDY_YAML_FILES)
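
Worth noting about the excerpt above: `STUDY_IDS := $(shell yq ...)` is a simply expanded assignment, so Make runs `yq` while it is still reading the Makefile, before any target (including `squeaky-clean`) is built. That is consistent with the error appearing at the top of the output. For anyone who does not read Make functions fluently, here is a rough Python sketch of the same ID-to-filename logic with a guard for the missing file. All function names are hypothetical, the line-based parsing only handles the simple layout `yq` writes here, and the reverse mapping is a guess at what `get-study-id-from-filename` does, not its actual implementation.

```python
from pathlib import Path, PurePosixPath

STUDY_DIR = "local/study-files"  # same prefix the Makefile passes to addprefix


def discover_study_ids(yaml_path: str) -> list[str]:
    """Return the study IDs listed in the yq-generated file, or [] if it does not exist yet."""
    path = Path(yaml_path)
    if not path.exists():  # the guard that the parse-time $(shell yq ...) call lacks
        return []
    ids = []
    for line in path.read_text().splitlines():
        entry = line.strip().removeprefix("- ")
        if entry.startswith("id: "):
            ids.append(entry.removeprefix("id: "))
    return ids


def study_id_to_yaml_path(study_id: str) -> str:
    """Mirror $(addsuffix .yaml,$(addprefix local/study-files/,$(subst :,-,ID)))."""
    return f"{STUDY_DIR}/{study_id.replace(':', '-')}.yaml"


def yaml_path_to_study_id(path: str) -> str:
    """Hypothetical inverse: drop the directory and .yaml, turn the first hyphen back into a colon."""
    stem = PurePosixPath(path).stem
    prefix, _, rest = stem.partition("-")
    return f"{prefix}:{rest}"
```

With the guard in place, a missing `local/gold-study-ids.yaml` simply yields an empty list of study files instead of an error on every invocation.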

turbomam · 2024-09-26T15:04:00Z

PS: API calls with arbitrary, high max_page_size are risky
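
One way to avoid a single oversized request would be to page through the collection with a modest `max_page_size` and follow the `next_page_token` field that the API returns. The sketch below is only an illustration: the helper names are hypothetical, and the `page_token` query parameter name is an assumption that should be checked against the API docs.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def iter_resources(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record, requesting pages until no next_page_token comes back."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("resources", [])
        token = page.get("next_page_token")
        if not token:
            break


def http_fetcher(collection_url: str, page_size: int = 100) -> Callable[[Optional[str]], dict]:
    """Build a fetch_page function for one collection endpoint (hypothetical helper)."""
    def fetch_page(token: Optional[str]) -> dict:
        params = {"max_page_size": page_size}
        if token:
            params["page_token"] = token  # ASSUMPTION: parameter name not confirmed in this thread
        with urlopen(f"{collection_url}?{urlencode(params)}") as resp:
            return json.load(resp)
    return fetch_page
```

Usage would look like `iter_resources(http_fetcher("https://api-napa.microbiomedata.org/nmdcschema/study_set"))`. Keeping the HTTP call behind `fetch_page` also lets the loop be exercised with canned pages, without touching the network.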

turbomam · 2024-09-26T15:04:48Z

wc -l local/gold-study-ids.yaml

63 local/gold-study-ids.yaml

head local/gold-study-ids.yaml

resources:
  - id: nmdc:sty-11-8fb6t785
    gold_study_identifiers:
      - gold:Gs0114675
  - id: nmdc:sty-11-33fbta56
    gold_study_identifiers:
      - gold:Gs0110138
  - id: nmdc:sty-11-aygzgv51
    gold_study_identifiers:
      - gold:Gs0114663

turbomam · 2024-09-26T15:11:29Z

make --dry-run create-study-yaml-files-from-study-ids-list

mkdir -p local/study-files
study_file_name=`echo local/study-files/nmdc-sty-11-8fb6t785.yaml` ; \
        echo $study_file_name ; \
        study_id=`poetry run get-study-id-from-filename $study_file_name` ; \
        echo $study_id ; \
        date ; \
        time poetry run get-study-related-records \
                --api-base-url https://api-berkeley.microbiomedata.org \
                extract-study \
                --study-id $study_id \
                --output-file local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml
sed -i.bak 's/gold:/GOLD:/' local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml # kludge modify data to match (old!) schema
rm -rf local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.bak
poetry run linkml-validate --schema nmdc_schema/nmdc_materialized_patterns.yaml local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml > local/study-files/nmdc-sty-11-8fb6t785.yaml.validation.log.txt
time poetry run migration-recursion \
        --schema-path nmdc_schema/nmdc_materialized_patterns.yaml \
        --input-path local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml \
        --output-path local/study-files/nmdc-sty-11-8fb6t785.yaml # kludge masks ids that contain whitespace
rm -rf local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml.bak

mkdir -p local/study-files
study_file_name=`echo local/study-files/nmdc-sty-11-33fbta56.yaml` ; \
        echo $study_file_name ; \
        study_id=`poetry run get-study-id-from-filename $study_file_name` ; \
        echo $study_id ; \
        date ; \
        time poetry run get-study-related-records \
                --api-base-url https://api-berkeley.microbiomedata.org \
                extract-study \
                --study-id $study_id \
                --output-file local/study-files/nmdc-sty-11-33fbta56.yaml.tmp.yaml

etc.

turbomam · 2024-09-26T15:11:46Z

study_file_name=`echo local/study-files/nmdc-sty-11-8fb6t785.yaml` ; \
        echo $study_file_name ; \
        study_id=`poetry run get-study-id-from-filename $study_file_name` ; \
        echo $study_id ; \
        date ; \
        time poetry run get-study-related-records \
                --api-base-url https://api-berkeley.microbiomedata.org \
                extract-study \
                --study-id $study_id \
                --output-file local/study-files/nmdc-sty-11-8fb6t785.yaml.tmp.yaml

local/study-files/nmdc-sty-11-8fb6t785.yaml
nmdc:sty-11-8fb6t785
Thu Sep 26 11:10:57 AM EDT 2024
STUDY-ID: nmdc:sty-11-8fb6t785
SCHEMA-VERSION: 11.0.0rc22
Got study nmdc:sty-11-8fb6t785 from the NMDC database.
Got 0 biosamples part_of nmdc:sty-11-8fb6t785.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/mark/gitrepos/berkeley-schema-fy24/src/scripts/nmdc_database_tools.py", line 261, in extract_study
    raise e
  File "/home/mark/gitrepos/berkeley-schema-fy24/src/scripts/nmdc_database_tools.py", line 253, in extract_study
    omics_processing_records = api_client.get_omics_processing_records_part_of_study(study_id)
  File "/home/mark/gitrepos/berkeley-schema-fy24/src/scripts/nmdc_database_tools.py", line 75, in get_omics_processing_records_part_of_study
    response.raise_for_status()
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api-berkeley.microbiomedata.org/nmdcschema/omics_processing_set?filter=%7B%22part_of%22%3A+%22nmdc%3Asty-11-8fb6t785%22%7D&max_page_size=1000

real 0m3.892s
user 0m1.407s
sys 0m0.124s

turbomam · 2024-09-26T15:30:44Z

nmdc:sty-11-8fb6t785 appears to be a real study: https://api-berkeley.microbiomedata.org/nmdcschema/ids/nmdc%3Asty-11-8fb6t785

but the command above is trying to find OmicsProcessings that are part of nmdc:sty-11-8fb6t785, and OmicsProcessing has been replaced with DataGeneration subclasses as of berkeley-schema-fy24

Also maybe there really are no DataGeneration subclass instances that are part of that Study?

https://api-berkeley.microbiomedata.org/nmdcschema/data_generation_set?filter=%7B%22part_of%22%3A%22nmdc%3Asty-11-8fb6t785%22%7D&max_page_size=20

In fact, maybe DataGeneration subclass instances can't be part_of anything any more?

https://api-berkeley.microbiomedata.org/nmdcschema/data_generation_set?max_page_size=1

{
  "resources": [
    {
      "id": "nmdc:omprc-11-0003fm52",
      "name": "1000S_WLUP_FTMS_SPE_BTM_1_run2_Fir_22Apr22_300SA_p01_149_1_3506",
      "description": "High resolution MS spectra only",
      "has_input": [
        "nmdc:bsm-11-jht0ty76"
      ],
      "has_output": [
        "nmdc:dobj-11-cp4p5602"
      ],
      "processing_institution": "EMSL",
      "type": "nmdc:MassSpectrometry",
      "analyte_category": "nom",
      "associated_studies": [
        "nmdc:sty-11-28tm5d36"
      ],
      "instrument_used": [
        "nmdc:inst-14-mwrrj632"
      ]
    }
  ],
  "next_page_token": "nmdc:sys0qphf9j29"
}
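
The sample record above carries `associated_studies` rather than `part_of`, which suggests the study lookup could query `data_generation_set` on that field instead. Below is a minimal sketch of building such a request URL the same way the failing one was built. The function name is hypothetical, and it is an untested assumption that the API matches a single study id against the `associated_studies` list.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api-berkeley.microbiomedata.org"


def collection_query_url(collection: str, field: str, study_id: str, max_page_size: int = 20) -> str:
    """Build a /nmdcschema/<collection> URL with a JSON filter on one field."""
    query = urlencode({
        "filter": json.dumps({field: study_id}),  # e.g. {"associated_studies": "nmdc:sty-..."}
        "max_page_size": max_page_size,
    })
    return f"{API_BASE}/nmdcschema/{collection}?{query}"


# The request that currently fails (collection and field from the old schema):
old_url = collection_query_url("omics_processing_set", "part_of", "nmdc:sty-11-8fb6t785", 1000)

# Candidate replacement; whether the filter matches list members is an ASSUMPTION:
new_url = collection_query_url("data_generation_set", "associated_studies", "nmdc:sty-11-8fb6t785")
```

If that assumption holds, `get_omics_processing_records_part_of_study` would only need its collection name and filter field changed.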
