ENH: Add option for minimum number of confirmed features #96

Open
MarkPundurs opened this issue May 7, 2021 · 0 comments

MarkPundurs commented May 7, 2021

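While running sklearn.model_selection.GridSearchCV on a BorutaPy-based estimator (code below), I got the non-fatal error ValueError: Found array with 0 feature(s) (shape=(694, 0)) while a minimum of 1 is required. (full error message below). The fit fails because, for some parameter settings, BorutaPy confirms no features, so transform() returns an array with zero columns and the wrapped estimator cannot be trained on it. GridSearchCV then sets the score for that train-test partition to nan and continues. In this context, it would be useful to have an option that makes BorutaPy() select at least some nonzero minimum number of features.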
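
To make the request concrete, here is a minimal sketch of what such an option might do. It is not part of BorutaPy: the function name ensure_min_features and its min_features parameter are hypothetical. It assumes a boolean mask like BorutaPy's support_ and a rank array like ranking_ (where 1 is the best rank), and tops up the mask with the best-ranked unselected features until the minimum is reached.

```python
def ensure_min_features(support, ranking, min_features=1):
    """Return a boolean mask that keeps at least `min_features` features.

    support: per-feature booleans (for example, BorutaPy's support_)
    ranking: per-feature integer ranks, 1 = best (for example, ranking_)
    min_features: hypothetical option, not an existing BorutaPy parameter
    """
    mask = [bool(s) for s in support]
    if len(mask) != len(ranking):
        raise ValueError("support and ranking must have the same length")
    if not 0 <= min_features <= len(mask):
        raise ValueError("min_features must be between 0 and the number of features")

    missing = min_features - sum(mask)
    if missing > 0:
        # Unselected features, best (lowest) rank first.
        # sorted() is stable, so ties keep their original column order.
        candidates = sorted(
            (i for i, keep in enumerate(mask) if not keep),
            key=lambda i: ranking[i],
        )
        for i in candidates[:missing]:
            mask[i] = True
    return mask


# Made-up output of a run that confirmed no features.
support = [False, False, False, False]
ranking = [3, 2, 5, 2]
print(ensure_min_features(support, ranking, min_features=2))
# [False, True, False, True]
```

Falling back on rank order means tentative features (rank 2 in BorutaPy) would be added before rejected ones. Whether that is the right tie-breaking rule is a design question for the maintainers.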

Full error message

Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\pundumx\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "<ipython-input-315-84e5437b8711>", line 19, in fit
    self.estimator_.fit(X_filt, y)
  File "C:\Users\pundumx\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 304, in fit
    accept_sparse="csc", dtype=DTYPE)
  File "C:\Users\pundumx\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "C:\Users\pundumx\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "C:\Users\pundumx\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 802, in check_X_y
    estimator=estimator)
  File "C:\Users\pundumx\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "C:\Users\pundumx\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 661, in check_array
    context))
ValueError: Found array with 0 feature(s) (shape=(694, 0)) while a minimum of 1 is required.

Code for grid search

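from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.base import BaseEstimator, clone
# Note: if_delegate_has_method was deprecated in scikit-learn 1.1 in favor of available_if
from sklearn.utils.metaestimators import if_delegate_has_method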

# estimator class based on example at https://scikit-learn.org/0.23/auto_examples/cluster/plot_inductive_clustering.html
class BorutaPy_Estimator(BaseEstimator):
    def __init__(self, estimator, n_estimators=1000, perc=0):
        self.estimator = estimator
        self.n_estimators = n_estimators
        self.perc = perc

    def fit(self, X, y):
        self.estimator_ = clone(self.estimator)
        self.feat_selector = BorutaPy(self.estimator_, n_estimators=self.n_estimators, random_state=1, perc=self.perc)
        self.feat_selector.fit(X, y)
        X_filt = self.feat_selector.transform(X)
        self.estimator_.fit(X_filt, y)
        return self
    
    @if_delegate_has_method(delegate='estimator_')
    def predict_proba(self, X):
        return self.estimator_.predict_proba(self.feat_selector.transform(X))

    @if_delegate_has_method(delegate='estimator_')
    def predict(self, X):
        return self.estimator_.predict(self.feat_selector.transform(X))

rf = RandomForestClassifier(n_jobs=-1, ccp_alpha=0.000005, max_features=0.05)
feat_selector = BorutaPy_Estimator(rf, n_estimators=1000)
param_grid = {'perc': [100, 90, 80, 70, 60]}
gs = GridSearchCV(feat_selector, param_grid, scoring='accuracy')
# X_train and y_train are the training data (not shown here)
gs.fit(X_train, y_train)
gs.cv_results_['mean_test_score'], gs.best_score_, gs.best_params_
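
Because the failure only sets scores to nan, it can be easy to miss which perc values were affected. The sketch below lists the parameter settings whose mean test score is nan. The helper name params_with_failed_fits is hypothetical, and the example dictionary is made-up data shaped like gs.cv_results_ (it uses the 'params' and 'mean_test_score' keys), not output from the run above.

```python
import math


def params_with_failed_fits(cv_results):
    """Return the parameter settings whose mean test score is NaN.

    A NaN mean suggests that at least one train-test partition failed
    to fit (or produced a non-finite score) for that setting.
    """
    return [
        params
        for params, score in zip(cv_results["params"], cv_results["mean_test_score"])
        if math.isnan(score)
    ]


# Made-up results for illustration only.
example_results = {
    "params": [{"perc": 100}, {"perc": 90}, {"perc": 80}],
    "mean_test_score": [float("nan"), 0.81, 0.83],
}
print(params_with_failed_fits(example_results))
# [{'perc': 100}]
```

With a fitted search, the same call would be params_with_failed_fits(gs.cv_results_).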