You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
validates the input data using sklearn's check_array. This check function does not only check, but also modifies the data. If the input is a pandas DataFrame, the output is a numpy array. This may causes problems in _data_setter, an abstract function implemented by the user, as the user expects a DataFrame back.
Is this done intentionally? I would like to suggest to return data of the same type as the input data.
The text was updated successfully, but these errors were encountered:
It is intentional, but may not be the best way forward. When I worked on ModelBuilder, I was focused on getting it to be able to integrate into a scikit-learn-based Pipeline. It's been a while, but if I remember right, scikit-learn transformers and estimators accept a variety of input types and then "normalize" them to numpy arrays for the computations. So that was the idea of _validate_data. All the base _validate_data does is call check_array or check_X_y from sklearn.utils.validation. I suppose it would be just as easy to call the validation function directly rather than through _validate_data. In my own fork of this code, I actually don't use _validate_data or check_array or check_X_y at all. In my use, the input is always pandas objects rather than numpy arrays. I have to call set_output(transform='pandas') on the transformers in the pipeline, but it works for me.
From comments in some other issues (#196, #246), it seems there is a desire to refactor ModelBuilder to suit a more general use case, and then specialize it for scikit-learn as an additional class/sub-class. Seems like a good idea. There are other API's besides scikit-learn, for example sktime, which also might be useful to work with ModelBuilder-based PYMC models.
When calling
predict_posterior
the linevalidates the input data using sklearn's
check_array
. This check function does not only check, but also modifies the data. If the input is a pandas DataFrame, the output is a numpy array. This may causes problems in_data_setter
, an abstract function implemented by the user, as the user expects a DataFrame back.Is this done intentionally? I would like to suggest to return data of the same type as the input data.
The text was updated successfully, but these errors were encountered: