Pandera GeoPandas validation fails when using gpd.array.GeometryDtype #1862

Open
2 of 3 tasks
Jaapel opened this issue Nov 22, 2024 · 0 comments
Labels
bug Something isn't working


Describe the bug
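Pandera raises a TypeError when gpd.array.GeometryDtype is passed as the dtype of the "geometry" column. The error occurs while the schema is being constructed, before validate() is called.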

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.


Stack trace:

Traceback (most recent call last):
  File "xxx/test.py", line 7, in <module>
    "geometry": pa.Column(gpd.array.GeometryDtype),
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx/.pixi/envs/default/lib/python3.12/site-packages/pandera/api/pandas/components.py", line 84, in __init__
    super().__init__(
  File "xxx/.pixi/envs/default/lib/python3.12/site-packages/pandera/api/dataframe/components.py", line 69, in __init__
    super().__init__(
  File "xxx/.pixi/envs/default/lib/python3.12/site-packages/pandera/api/base/schema.py", line 41, in __init__
    self.dtype = dtype
    ^^^^^^^^^^
  File "xxx/.pixi/envs/default/lib/python3.12/site-packages/pandera/api/pandas/array.py", line 43, in dtype
    self._dtype = pandas_engine.Engine.dtype(value) if value else None
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx/.pixi/envs/default/lib/python3.12/site-packages/pandera/engines/pandas_engine.py", line 272, in dtype
    return engine.Engine.dtype(cls, np_or_pd_dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx/.pixi/envs/default/lib/python3.12/site-packages/pandera/engines/engine.py", line 271, in dtype
    raise TypeError(
TypeError: Data type 'geometry' not understood by Engine.

Code Sample, a copy-pastable example

import geopandas as gpd
import pandas as pd
import pandera as pa
from shapely.geometry import Polygon

geo_schema = pa.DataFrameSchema({
    "geometry": pa.Column(gpd.array.GeometryDtype),  # fails
    # "geometry": pa.Column("geometry"),  # works
    "region": pa.Column(str),
})

geo_df = gpd.GeoDataFrame({
    "geometry": [
        Polygon(((0, 0), (0, 1), (1, 1), (1, 0))),
        Polygon(((0, 0), (0, -1), (-1, -1), (-1, 0)))
    ],
    "region": ["NA", "SA"]
})

geo_schema.validate(geo_df)

Expected behavior

Validation succeeds.

Desktop (please complete the following information):

  • OS: macOS
  • Browser: firefox
  • Version: 15.1

Additional context

The environment was installed with pixi; pandera, pandera-dask, and pandera-geopandas come from conda-forge.
