post berkeley-schema-fy24 merge issue: review `FileTypeEnum` composition and correlation with other `DataObject` slots/relationships #2186

turbomam · 2024-09-24T17:28:57Z

Can a DataObject that is the output of any process use any FileTypeEnum value in its data_object_type slot?
Do the different permissible values (PVs) come from different axes of differentiation?
Should we use an is_a hierarchy within the PVs?
Should we re-normalize all of the permissible values to lower_snake_case? (That would require a corresponding data migration and changes to the code that describes future DataObjects.)
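To make the cost of the lower_snake_case question concrete, here is a minimal sketch of what the renaming and the matching data migration might look like. It is not the schema's migration tooling: the function names (`to_lower_snake_case`, `build_migration_map`, `migrate_record`) are hypothetical, records are assumed to be plain dicts with a `data_object_type` key, and apart from "Reference Calibration File" the sample strings are only illustrative.

```python
import re


def to_lower_snake_case(text: str) -> str:
    """Convert a free-text permissible value to lower_snake_case."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    collapsed = re.sub(r"[^0-9A-Za-z]+", "_", text)
    return collapsed.strip("_").lower()


def build_migration_map(values):
    """Map each old permissible value to its normalized form.

    Raises ValueError if two distinct old values would collapse
    into the same normalized value.
    """
    mapping = {}
    first_seen = {}  # normalized value -> old value that produced it
    for old in values:
        new = to_lower_snake_case(old)
        if new in first_seen and first_seen[new] != old:
            raise ValueError(
                f"{old!r} and {first_seen[new]!r} both normalize to {new!r}"
            )
        first_seen[new] = old
        mapping[old] = new
    return mapping


def migrate_record(record, mapping):
    """Return a copy of a DataObject-like dict with its type renamed."""
    migrated = dict(record)
    old = migrated.get("data_object_type")
    if old in mapping:
        migrated["data_object_type"] = mapping[old]
    return migrated


# Illustrative values only; the real list would come from the schema.
sample_values = ["Reference Calibration File", "FT ICR-MS Analysis Results"]
sample_map = build_migration_map(sample_values)
```

The collision check matters because normalization is lossy: two PVs that differ only in punctuation or case would become indistinguishable after the migration.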

As one example: what are the advantages and disadvantages of generality or specificity in

Add Reference Calibration File permissible value to FileTypeEnum berkeley-schema-fy24#256

The same question might apply to other PVs in this enumeration.

low priority for now (in my opinion)

cc @mslarae13 @brynnz22

see also the following label (although we might want to remove it at some point)

https://github.com/microbiomedata/nmdc-schema/issues?q=+label%3AFileTypeEnum

for example, we could use a link like this instead of a label (berkeley-schema-fy24 in this case)

https://github.com/microbiomedata/berkeley-schema-fy24/pulls?q=is%3Apr+filetypeenum

turbomam · 2024-09-24T17:30:10Z

Claude finds these different axes of differentiation or concerns in FileTypeEnum:

Data Type / Analysis Method:

- Metagenome data
- Metabolomics data (FT ICR-MS, GC-MS)
- Metaproteomics data
- Assembly data
- Annotation data (various types)
- Read-based analysis
- Taxonomic classification (GOTTCHA2, Kraken2, Centrifuge)

Processing Stage:

- Raw data
- Filtered data
- Error-corrected data
- Assembled data
- Annotated data

File Format:

- FASTQ
- BAM
- FASTA
- GFF
- JSON
- TSV
- PDF
- HTML

Sequencing Read Type:

- Raw Read 1 (forward)
- Raw Read 2 (reverse)
- Interleaved paired-end

Quality Control Stage:

- QC Statistics
- QC non-rRNA reads

Biological Entity Focus:

- Protein-related
- Peptide-related
- RNA-related (rRNA, tRNA, etc.)
- Gene-related

Output Type:

- Report files
- Statistical files
- Plot files (heatmap, barplot, Krona plot)
- Binning results

Annotation Type:

- Structural annotation
- Functional annotation
- Various specific annotation types (e.g., TIGRFam, CRT, Genemark, etc.)

Compression Status:

- Compressed files (e.g., zip files for bins)
- Uncompressed files

Workflow Stage:

- Intermediate files
- Final output files
- Workflow statistics
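
One way to picture what "different axes" would mean in practice is to describe each permissible value by independent facets instead of a single flat label. The sketch below is only an illustration of that idea, not a proposal for the schema: the `FileTypeFacets` class, the `select` helper, and every label and facet assignment in `FACETS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileTypeFacets:
    """One hypothetical permissible value, split into independent axes."""
    data_type: Optional[str] = None
    processing_stage: Optional[str] = None
    file_format: Optional[str] = None
    compressed: Optional[bool] = None


# Hypothetical labels and facet assignments, for illustration only.
FACETS = {
    "example raw reads": FileTypeFacets("metagenome", "raw", "FASTQ"),
    "example filtered reads": FileTypeFacets("metagenome", "filtered", "FASTQ"),
    "example assembly contigs": FileTypeFacets("assembly", "assembled", "FASTA"),
    "example bins archive": FileTypeFacets("binning", None, None, True),
}


def select(**criteria):
    """Return the sorted labels whose facets equal every given criterion."""
    return sorted(
        label
        for label, facets in FACETS.items()
        if all(getattr(facets, axis) == wanted for axis, wanted in criteria.items())
    )


fastq_labels = select(file_format="FASTQ")
raw_metagenome_labels = select(data_type="metagenome", processing_stage="raw")
```

With facets separated like this, questions such as "which file types can a given process emit?" become queries over one axis, whereas a flat enumeration has to encode every combination in the label itself.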

mslarae13 · 2024-11-01T18:48:42Z

@turbomam is this accomplished with the merged PR?

turbomam self-assigned this Sep 24, 2024

turbomam mentioned this issue Sep 24, 2024

Add Reference Calibration File permissible value to FileTypeEnum microbiomedata/berkeley-schema-fy24#256

Merged

17 tasks

turbomam added the FileTypeEnum label (composition or usage of FileTypeEnum, which fills a DataObject's data_object_type slot) Sep 24, 2024
