curated column subsets? #52

7yl4r · 2022-08-04T21:34:50Z

When working with occurrence data, I feel overwhelmed by the number of columns.

Would it be a good idea to allow for easy subsetting of columns?

Here is some code in which I have done that:

# === select a subset of columns for improved table legibility :
shortened_df = df[[
    # time-related columns:
    "date_year", "endDayOfYear", "verbatimEventDate", "startDayOfYear", "dateIdentified", "eventTime", "date_mid",
    "eventDate", "month", "date_start", "date_end", "day", "year",
    # row identifier columns:
    "recordNumber", "ownerInstitutionCode", "parentEventID", "identifiedBy", "eventID", "collectionID", "organismID", "recordedBy", "datasetID", "category", "datasetName",
    "institutionCode", "occurrenceID", "collectionCode", "dataset_id", "id", "modified", "catalogNumber", "fieldNumber",
    "institutionID",
    # additional remarks:
    "occurrenceRemarks", "taxonRemarks", "eventRemarks", "samplingProtocol",
    "typeStatus", "preparations", "establishmentMeans", "dynamicProperties", "type",
    # occurrence specifics:
    "individualCount", "occurrenceStatus", "originalScientificName", "absence",
    "terrestrial", "basisOfRecord", "dropped",
]]

We could include arrays of curated column lists so that a user could do something like:

df = df[pyobis.column_subset.taxonomic + pyobis.column_subset.temporal]

This would drop everything but the curated lists of "taxonomic" and "temporal" columns. Thoughts?
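As a rough illustration of the proposal, here is a minimal sketch of how such curated lists might be defined and combined. Nothing here exists in pyobis today: `TEMPORAL`, `IDENTIFIERS`, and `present_columns` are hypothetical names, and the column names are taken from the snippet above. The helper keeps only the columns that are actually present, since indexing a pandas DataFrame with a list that contains a missing column raises a `KeyError`, and not every occurrence response carries every column.

```python
# Hypothetical sketch: none of these names exist in pyobis.
# Column names are copied from the subset posted above.
TEMPORAL = ["eventDate", "year", "month", "day", "date_start", "date_mid", "date_end"]
IDENTIFIERS = ["id", "occurrenceID", "catalogNumber", "datasetID", "institutionCode"]


def present_columns(available, *subsets):
    """Return the columns from `subsets` that appear in `available`.

    Order follows the curated lists; duplicates are dropped.
    """
    available = set(available)
    seen = set()
    selected = []
    for subset in subsets:
        for name in subset:
            if name in available and name not in seen:
                seen.add(name)
                selected.append(name)
    return selected


# Usage with a pandas DataFrame `df` (not constructed here):
#   df = df[present_columns(df.columns, TEMPORAL, IDENTIFIERS)]
```

Plain lists keep the `a + b` syntax suggested above working; the helper is only needed when some curated columns may be absent from a given result.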

ayushanand18 · 2022-08-05T04:10:17Z

I have some doubts about this:

- How would we decide which columns to keep in these lists?
- I do not know which columns researchers query the most. If we included those, it would help them a lot.
- A researcher who is new to pyobis might not know about this feature, since they may only glance at the documentation rather than read it in detail. (When I try something new, I glance at the docs, experiment to understand it myself, and then go back to the docs.)
  - They would need to find out which columns are in which curated list, and that might add to their effort.
- That said, this idea is really great. It might reduce effort and make pyobis easier and friendlier for people who are less comfortable with code.
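One possible way to address the discoverability concern is to keep the curated lists in a single mapping that users can inspect. This is only a sketch under assumptions: `COLUMN_SUBSETS`, `subsets_containing`, and `describe_subsets` are hypothetical names, not part of pyobis, and the groupings reuse column names from the snippet at the top of this issue.

```python
# Hypothetical sketch: these names are not part of pyobis.
COLUMN_SUBSETS = {
    "temporal": ["eventDate", "year", "month", "day"],
    "identifiers": ["id", "occurrenceID", "catalogNumber", "datasetID"],
    "remarks": ["occurrenceRemarks", "taxonRemarks", "eventRemarks"],
}


def subsets_containing(column):
    """Return the names of the curated subsets that include `column`."""
    return sorted(name for name, cols in COLUMN_SUBSETS.items() if column in cols)


def describe_subsets():
    """Return a one-line-per-subset summary suitable for printing."""
    return "\n".join(
        f"{name}: {', '.join(cols)}" for name, cols in COLUMN_SUBSETS.items()
    )


if __name__ == "__main__":
    print(describe_subsets())
    print(subsets_containing("eventDate"))
```

A newcomer could then print the summary once instead of searching the documentation for each list.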

7yl4r added enhancement discussion needed open discussion. your input is needed! labels Aug 4, 2022
