There is already decent synergy between pandas, scikit-learn, and most other popular machine learning libraries, in that a pandas DataFrame is almost always accepted as an input data structure.
However, the output of the scikit-learn transformers is a plain NumPy array, so the column names of the input data are lost. Preserving the column names through the ML pipeline would help data scientists optimize, understand, and debug their data science pipelines.
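To make the problem concrete, here is a minimal sketch of what happens today and of a manual workaround. The column names, index labels, and values are made up for illustration, and the sketch assumes scikit-learn's default output configuration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical input data; names and values are illustrative only.
df = pd.DataFrame(
    {"height_cm": [150.0, 160.0, 170.0], "weight_kg": [50.0, 60.0, 70.0]},
    index=["a", "b", "c"],
)

scaler = StandardScaler()
scaled = scaler.fit_transform(df)

# With the default configuration the result is a NumPy array,
# so the column names and the index of `df` are no longer attached.
print(type(scaled))

# Manual workaround: re-attach the labels. This is only valid for
# transformers that keep the number and order of columns unchanged.
scaled_df = pd.DataFrame(np.asarray(scaled), columns=df.columns, index=df.index)
print(scaled_df)
```

This re-labelling only works for one-to-one transformers such as scalers. Transformers that add, drop, or reorder columns would need their own mapping from input to output names, which is why built-in support would be more useful than this kind of wrapper.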
I agree. It would also be nice if, when passed Series and DataFrames, scikit-learn ML methods returned Series (or DataFrames when there are multiple labels) with corresponding indexes when predicting.
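A rough sketch of the behavior I have in mind, done by hand for now. `predict_as_series` is a hypothetical helper, not part of scikit-learn, and the data is made up for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data with a meaningful index.
X = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]}, index=["r0", "r1", "r2", "r3"])
y = pd.Series([1.0, 3.0, 5.0, 7.0], index=X.index, name="target")

model = LinearRegression().fit(X, y)


def predict_as_series(model, X, name=None):
    """Hypothetical helper: wrap single-label predictions in a Series
    that reuses the index of the input DataFrame."""
    return pd.Series(model.predict(X), index=X.index, name=name)


pred = predict_as_series(model, X, name=y.name)
print(pred)
```

For multiple labels the same idea would return a DataFrame whose columns are taken from the label DataFrame used at fit time.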