curated column subsets? #52

Open
7yl4r opened this issue Aug 4, 2022 · 1 comment
Labels: enhancement, discussion needed (open discussion; your input is needed!)

Comments

@7yl4r
Collaborator

7yl4r commented Aug 4, 2022

When working with occurrence data, I feel overwhelmed by the number of columns.

Would it be a good idea to make it easy to subset columns?

Here is some code in which I have done that:

```python
# Select a subset of columns to make the table more legible.
# `df` is a pandas DataFrame of OBIS occurrence records.
shortened_df = df[[
    # time-related columns:
    "date_year", "endDayOfYear", "verbatimEventDate", "startDayOfYear", "dateIdentified", "eventTime", "date_mid",
    "eventDate", "month", "date_start", "date_end", "day", "year",
    # row identifier columns:
    "recordNumber", "ownerInstitutionCode", "parentEventID", "identifiedBy", "eventID", "collectionID", "organismID", "recordedBy", "datasetID", "category", "datasetName",
    "institutionCode", "occurrenceID", "collectionCode", "dataset_id", "id", "modified", "catalogNumber", "fieldNumber",
    "institutionID",
    # additional remarks:
    "occurrenceRemarks", "taxonRemarks", "eventRemarks", "samplingProtocol",
    "typeStatus", "preparations", "establishmentMeans", "dynamicProperties", "type",
    # occurrence specifics:
    "individualCount", "occurrenceStatus", "originalScientificName", "absence",
    "terrestrial", "basisOfRecord", "dropped",
]]
```

We could include curated lists of column names so that a user could do something like:

```python
df = df[pyobis.column_subset.taxonomic + pyobis.column_subset.temporal]
```

This would drop every column except those in the curated "taxonomic" and "temporal" lists. Thoughts?
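A minimal sketch of how such lists might be defined and combined, assuming they are plain Python lists. The `column_subset` namespace, its members, and the `combine`/`keep_subsets` helpers are hypothetical (pyobis is not assumed to provide them), and the taxonomic column names are only illustrative. Two details seem worth handling: plain `+` concatenation repeats a column that appears in both lists, and indexing with a column that a particular result set lacks raises a `KeyError`, since the columns returned may vary by query.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical curated lists; names and contents are illustrative only.
column_subset = SimpleNamespace(
    taxonomic=["scientificName", "kingdom", "phylum", "class", "order",
               "family", "genus", "taxonRank"],
    temporal=["eventDate", "year", "month", "day",
              "date_start", "date_mid", "date_end", "date_year"],
)


def combine(*subsets: list[str]) -> list[str]:
    """Concatenate curated lists, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(col for subset in subsets for col in subset))


def keep_subsets(df: pd.DataFrame, *subsets: list[str]) -> pd.DataFrame:
    """Drop every column not in the given subsets, skipping names df lacks."""
    wanted = combine(*subsets)
    return df[[c for c in wanted if c in df.columns]]


# Toy frame standing in for an occurrence search result.
df = pd.DataFrame({
    "scientificName": ["Tursiops truncatus"],
    "eventDate": ["2021-06-01"],
    "year": [2021],
    "decimalLatitude": [27.5],
})
df = keep_subsets(df, column_subset.taxonomic, column_subset.temporal)
print(list(df.columns))  # ['scientificName', 'eventDate', 'year']
```

Skipping absent columns silently keeps the call from failing on sparse results; a stricter variant could warn about which curated columns were missing instead.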

7yl4r added the enhancement and discussion needed labels on Aug 4, 2022
@ayushanand18
Collaborator

I have a few doubts about this:

  • How would we decide which columns to include in each list?
  • I don't know which columns researchers query most often; including those would help them the most.
  • A researcher who is new to pyobis might not know this feature exists, especially if they skim the documentation rather than read it in detail. (When I start using something, I glance at the docs, experiment on my own to understand it, and then go back to the docs.)
    • They would also need to find out which columns are in which curated list, which adds effort.
  • That said, I think this is a great idea: it could reduce effort and make pyobis easier to use for people who are less comfortable with code.
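The concern about finding which columns are in which curated list might be eased by making the lists introspectable. A minimal sketch, assuming the lists live in a plain dict; the `CURATED_SUBSETS` name, its contents, and the `subsets_containing`/`describe_subsets` helpers are all hypothetical.

```python
# Hypothetical registry of curated column lists; contents are illustrative only.
CURATED_SUBSETS: dict[str, list[str]] = {
    "taxonomic": ["scientificName", "kingdom", "phylum", "family", "genus"],
    "temporal": ["eventDate", "year", "month", "day"],
    "occurrence": ["individualCount", "occurrenceStatus", "basisOfRecord"],
}


def subsets_containing(column: str) -> list[str]:
    """Return the names of every curated subset that includes `column`."""
    return [name for name, cols in CURATED_SUBSETS.items() if column in cols]


def describe_subsets() -> str:
    """Render one line per subset listing its columns, e.g. for a docstring or help()."""
    return "\n".join(
        f"{name}: {', '.join(cols)}" for name, cols in CURATED_SUBSETS.items()
    )


print(subsets_containing("year"))   # ['temporal']
print(subsets_containing("depth"))  # []
print(describe_subsets())
```

Putting the output of something like `describe_subsets()` in the module docstring would let a user who only skims the docs see at a glance what each list covers.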

2 participants