OccResponse in occurrences #171

Open
Ei3-kw opened this issue Dec 9, 2024 · 5 comments · May be fixed by #172

Comments

@Ei3-kw

Ei3-kw commented Dec 9, 2024

Hi,

It seems to me that execute() in OccResponse returns the results data directly instead of {'total': int, 'results': dict}, which breaks the to_pandas() function.
Screenshot 2024-12-10 at 10 27 01

This is inconsistent with the provided documentation as well as with other classes.

Also, is there a way to get all the nodeids?

Cheers, Ella

@ayushanand18
Collaborator

Hey, thanks for raising this. For ease of use, the execute() method returns a pandas DataFrame and also stores it in the 'data' attribute, so you can use object.data to access it later on. The to_pandas method exists on the OccResponse object, not on a DataFrame, which is why the error popped up.

@ayushanand18
Collaborator

On another note, you should ideally create an object using occurrences.search() and then run the query with object.execute(). This lets you work with the data and apply any transformations without having to refetch after each modification (the fetched result is also stored as object.data). Every time you call .search() it creates a new object, so you might end up refetching data frequently.
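The pattern described above (build the query object once, execute it once, then reuse the stored data) can be sketched with a small stand-in class. This is only an illustration: `FakeQuery`, its `fetch_count` counter, and the local `search()` function are hypothetical and are not pyobis code; they only mimic the "new object per search, result cached on the object" behaviour mentioned in the comment.

```python
class FakeQuery:
    """Hypothetical stand-in for a query object; not part of pyobis."""

    fetch_count = 0  # counts simulated network calls across all instances

    def __init__(self, **params):
        self.params = params
        self.data = None  # populated by execute()

    def execute(self):
        # Simulate one network request and cache the result on the object.
        FakeQuery.fetch_count += 1
        self.data = [{"id": i, **self.params} for i in range(3)]
        return self.data


def search(**params):
    # Every call builds a brand-new query object, as described above.
    return FakeQuery(**params)


# Build once, execute once, then reuse query.data as often as needed.
query = search(size=3)
query.execute()
first = query.data
second = query.data  # same cached object, no refetch
reused_fetches = FakeQuery.fetch_count

# Chaining search().execute() each time triggers a fresh fetch per call.
search(size=3).execute()
search(size=3).execute()
chained_fetches = FakeQuery.fetch_count - reused_fetches
```

In this sketch the reused object costs a single simulated fetch no matter how often `query.data` is read, while each chained call costs one more.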

@7yl4r
Collaborator

7yl4r commented Dec 12, 2024

Hi @Ei3-kw ,

Thank you for opening the issue. Can you provide some full code with comments indicating what you are trying to do?

@Ei3-kw
Author

Ei3-kw commented Dec 13, 2024

I am calling to_pandas() on the OccResponse object:

from pyobis import occurrences

# look up the OBIS JP node (note: `df` is an OccResponse object, not a DataFrame)
df = occurrences.search(nodeid='0d07a0ea-9c75-48e8-b3fd-c28d653f4270', size=69)
fetched = df.execute()  # fetch the data; returns a DataFrame
print(f"Columns in the fetched Dataframe: {fetched.columns}")
df.to_pandas()  # this call fails

But the fetched data, which is stored as self._data, is already the value of results. That causes to_pandas() to fail, because it expects {'total': int, 'results': dict}.

The columns of fetched data

Columns in the fetched Dataframe: Index(['associatedMedia', 'basisOfRecord', 'catalogNumber', 'class',
       'collectionCode', 'datasetName', 'day', 'decimalLatitude',
       'decimalLongitude', 'endDayOfYear',
       ...
       'county', 'parvphylum', 'gigaclass', 'parvphylumid', 'gigaclassid',
       'maximumDistanceAboveSurfaceInMeters',
       'minimumDistanceAboveSurfaceInMeters', 'subgenus', 'subgenusid',
       'georeferenceRemarks'],
      dtype='object', length=150)

I've worked around it by using the fetched data directly and calling len on it to get the total. But this is inconsistent with the provided documentation and with how other classes behave.

Cheers, Ella
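The mismatch described in this comment is between a wrapped payload of the form {'total': ..., 'results': ...} and a bare collection of records. A minimal sketch of handling both shapes is shown below, using plain Python lists in place of DataFrames. `normalize_response` is a hypothetical helper written for illustration, not a pyobis function, and the sample payloads are invented.

```python
def normalize_response(payload):
    """Return (total, records) for either response shape.

    Hypothetical helper for illustration; not part of pyobis.
    Assumes records are held in a plain list, not a DataFrame.
    """
    if isinstance(payload, dict) and "results" in payload:
        # Wrapped shape: {'total': int, 'results': [...]}
        records = payload["results"]
        total = payload.get("total", len(records))
        return total, records
    # Bare shape: the records themselves, so only their count is known.
    records = list(payload)
    return len(records), records


wrapped = {"total": 120, "results": [{"id": 1}, {"id": 2}]}
bare = [{"id": 1}, {"id": 2}]

wrapped_total, wrapped_records = normalize_response(wrapped)
bare_total, bare_records = normalize_response(bare)
```

Note that with the bare shape the `len`-based total only counts the records actually fetched, which can be smaller than the number of matching records that a wrapped `total` field would report.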

@ayushanand18
Collaborator

Thanks for the code snippet. I was able to reproduce this.

Sorry for the issue. We probably assumed people would use the query.data object directly to get the DataFrame for search queries, but you make a fair point: all library methods should be consistent.

ayushanand18 added a commit to ayushanand18/pyobis that referenced this issue Dec 14, 2024
ayushanand18 added a commit to ayushanand18/pyobis that referenced this issue Dec 14, 2024

3 participants