Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add capability to process DNADerivedData to occurrences module #98

Open
ayushanand18 opened this issue Sep 30, 2022 · 6 comments
Open

Comments

@ayushanand18
Copy link
Collaborator

Overview

The pyobis package in its current state does not support fetching DNADerivedData using the occurrences module.

I believe there can be two ways to implement it:

  • add it as a parameter in the search function like MeasurementOrFacts
  • create a new function for it say dna( self, **args, **kwargs ) and enable necessary processing and utility methods since search function is already heavy with so many features.
@7yl4r
Copy link
Collaborator

7yl4r commented Sep 30, 2022

DNADerivedData is a Darwin Core extension in the same way MeasurementOrFacts is, so I think they should be treated the same. I agree that occurrences.search is already too heavy.


Let's come at the problem from the other direction and think about the ideal user interface without technical considerations. This is what felt intuitive to me:

occurrences = pyobis.search.occurrences(
    scientificname='Mola mola',
    required_extensions=["MoF","DNA"]
)

However, we have already decided to separate the query-building from the query-executing, and I think that was good choice. Maybe we could aim something like this instead:

occurrences = pyobis.OccQuery(
    scientificname='Mola mola',
    required_extensions=["MoF","DNA"]
).execute()

This is nice because it positions the code to potentially generalize to handle other DwC providers as well:

occurrences = pyobis.OccQuery(
    scientificname='Mola mola',
    required_extensions=["MoF","DNA"]
).execute(
    data_providers=["OBIS", "GBIF"]
)

So, with this in mind, I am thinking we begin adding new features to the OccQuery class rather than into the search function. Assuming we implement DNA in this way and save the other changes for later this will look like:

occurrences = pyobis.OccQuery(
    required_extensions=["DNA"]
).search(
    scientificname='Mola mola',
    MeasurementOrFact=True
)

Thoughts?

@ayushanand18
Copy link
Collaborator Author

I agree that the ways you listed are much intuitive and user-friendly than the current project.

But I had two questions to ask:

  • If we start adding features to the OccQuery class, then there are some methods (almost all except search only) that don't need extensions as parameter.
  • We could not pass parameters directly to OccQuery class since all methods had different parameters. For example, while search and getpoints, etc need scientificname, taxonid, etc. but get function requires occurrence UUID and nothing else. I couldn't find an efficient way to solve this.

@7yl4r
Copy link
Collaborator

7yl4r commented Oct 5, 2022

I think the object-oriented response to this is that we have multiple subclasses of OccQuery:

  • OccGetQuery
  • OccSearchQuery

Both have shared attributes and methods under the parent class OccQuery

  • query execution returns Occurrences
  • shared basic API parameters like baseURL

The object-oriented approach can be a bit pedantic, but it would work here. However, our focus should remain on keeping the intuitiveness of the user-interface first.. Is the object-oriented approach starting to over-complicate the UI?

Maybe a good compromise is to use OO programming under the hood and provide convenience methods to wrap that complexity for casual users. By this I mean little functions like this example:

# this function available as pyobis.occurrences.search()
define search(scientific_name):
    return pyobis.OccSearchQuery(
        scientificname=scientific_name
    ).execute()

@ayushanand18
Copy link
Collaborator Author

ayushanand18 commented Oct 6, 2022

This is interesting, building on your suggestion I think we might need to refactor how we handle functions and classes in each module.


I ran an experiment with pyobis.taxa and found that a better way to implement a functional plus OOPS approach would be to mix them, by letting the user play with functions on abstract level and play with classes and objects in the background.

Sample Usage

# intuitive way of doing things
from pyobis import taxa
query1 = taxa.search(scientificname="mola mola")
query1.execute()
# set query1.data to the fetched data from source, before this we only built the query not the response
query1.data
# show the data
query1.api_url
# show the api url
query1.mapper_url
# show the mapper url
query1.to_pandas()
# convert the data to a pandas dataframe

Under the hood (inside the module)

from ..obisutils import build_api_url, handle_arrstr, obis_baseurl, obis_GET
import pandas as pd

def search(scientificname=None, **kwargs):
    """
    Get taxon records.

    :param scientificname: [String,Array] One or more scientific names from the
        OBIS backbone. All included and synonym taxa are included in the search

    :return: A TaxaResponse object

    """

    scientificname = handle_arrstr(scientificname)
    url = obis_baseurl + "taxon/" + scientificname
    args = {"scientificname": scientificname}
    # return a taxa response class
    
    return TaxaResponse(url, args)

class TaxaResponse():
    """
    Taxa Response Class
    """
    def __init__(self, url, args):
        # public members
        self.data = None
        self.api_url = build_api_url(url, args)
        self.mapper_url = None
        
        # private members
        self.__args = args
        self.__url = url
    
    def execute(self, **kwargs):
        out = obis_GET(
            self.__url,
            self.__args,
            "application/json; charset=utf-8",
            **kwargs
            )
        self.data = out
    
    def to_pandas(self):
        return pd.DataFrame(self.data["results"])

in the obisutils

def build_api_url(url, args):
    return (
        url
        + "?"
        + urlencode({k: v for k, v in args.items() if v is not None})
    )

how does this sound?

@7yl4r
Copy link
Collaborator

7yl4r commented Oct 9, 2022

I think this looks good, but I have a tendency to overengineer things. @ocefpaf (and|or) @MathewBiddle : what do you think? The object-oriented programming approach is familiar to me, and wrapping things with convenience methods for casual users seems reasonable to me too. But maybe there is a more elegant approach?

@ocefpaf
Copy link
Collaborator

ocefpaf commented Oct 11, 2022

I think this looks good, but I have a tendency to overengineer things. @ocefpaf (and|or) @MathewBiddle : what do you think?

Looks good to me. The user interface is not too complex and the internals are easy to follow. +1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants