
RandomSearchTuner with automatic search space ignores MaxDepth #98

Open
TonyCongqianWang opened this issue Jun 7, 2024 · 4 comments
@TonyCongqianWang

I have a problem. When I use the RandomSearchTuner and fix the learner's num_trees, there is no problem: as expected, every trial uses that num_trees. If I fix max_depth, however, it just gets ignored. When I use tuner.choice("max_depth", [1, 2, 3]) instead, max_depth is respected during the trials, but I get the following error at the end:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../ydf/learner/specialized_learners.py", line 1548, in train
    return super().train(ds, valid)
  File "/.../ydf/learner/generic_learner.py", line 190, in train
    return self._train_from_dataset(ds, valid)
  File "/.../ydf/learner/generic_learner.py", line 241, in _train_from_dataset
    cc_model = learner.Train(**train_args)
ValueError: INVALID_ARGUMENT: The param "max_depth" is defined multiple times.
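
A minimal sketch of the failing setup (the dataset, label name, and trial count below are hypothetical placeholders, not my real code):

import pandas as pd
import ydf

# Hypothetical toy dataset; substitute the real training data.
ds = pd.DataFrame({"f1": [0.1, 0.2, 0.3, 0.4], "label": [0, 1, 0, 1]})

tuner = ydf.RandomSearchTuner(num_trials=20, automatic_search_space=True)
tuner.choice("max_depth", [1, 2, 3])  # manual choice on top of the automatic space

learner = ydf.GradientBoostedTreesLearner(label="label", tuner=tuner)
model = learner.train(ds)  # fails with: The param "max_depth" is defined multiple times.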
@achoum
Collaborator

achoum commented Jun 12, 2024

Hi Tony,

Sorry to hear about your issue.
Can you share a snippet of the training code to help me figure out the failing setup?

Also, make sure you don't have automatic_search_space=True, which will already define some of the hyper-parameter search space.
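
For example (just a sketch; the trial count and parameter values are placeholders), defining the search space entirely by hand avoids the duplicate definition:

tuner = ydf.RandomSearchTuner(num_trials=20)  # automatic_search_space left at its default (False)
tuner.choice("max_depth", [1, 2, 3])
tuner.choice("shrinkage", [0.05, 0.1, 0.2])  # any other hand-picked parameters

learner = ydf.GradientBoostedTreesLearner(label="label", tuner=tuner)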

@TonyCongqianWang
Author

Hi,

I did use automatic_search_space=True, so that's probably the cause. It would be nice, though, if you could manually override some of the choices.

Alternatively, do you have any ideas how I can improve the results with max_depth=1 and high-dimensional categorical features? So far I only get constant models. I am trying to use your library to replicate the Viola-Jones face detection algorithm.

@achoum achoum added the enhancement New feature or request label Jun 19, 2024
@achoum
Collaborator

achoum commented Jun 19, 2024

YDF does not count max_depth the same way as other libraries. It is something I regret, but it is too late to change :)
In other words, if you want stumps, you need to set max_depth=2 instead of max_depth=1.
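
A small sketch of the difference (the learner arguments are placeholders):

# YDF counts the root as depth 1, so a stump (a single split) needs max_depth=2 ...
stumps = ydf.GradientBoostedTreesLearner(label="label", max_depth=2)

# ... while max_depth=1 only allows the root node, i.e. a constant tree.
constant = ydf.GradientBoostedTreesLearner(label="label", max_depth=1)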

To train a Viola-Jones-like model with fixed thresholds, make sure to feed BOOLEAN features instead of CATEGORICAL ones.
BOOLEAN features are essentially CATEGORICAL features with only two possible values, and they are faster to train.

You could also feed NUMERICAL features and let YDF figure out the thresholds.
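
A sketch of both options, assuming the Haar-like feature responses live in a pandas DataFrame (the column names and thresholds are hypothetical):

import pandas as pd
import ydf

# Hypothetical raw Haar-like feature responses.
ds_raw = pd.DataFrame({
    "haar_1": [0.2, 0.7, 0.9, 0.1],
    "haar_2": [0.8, 0.3, 0.6, 0.4],
    "label": [0, 1, 1, 0],
})

# Option 1: apply fixed thresholds yourself and feed bool columns;
# YDF should infer BOOLEAN semantics from the bool dtype.
ds_bool = ds_raw.copy()
for col in ("haar_1", "haar_2"):
    ds_bool[col] = ds_raw[col] > 0.5

# Option 2: feed the raw NUMERICAL values and let YDF find the thresholds.
learner = ydf.GradientBoostedTreesLearner(label="label", max_depth=2)
model = learner.train(ds_bool)  # or learner.train(ds_raw)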

Note also that the Viola-Jones algorithm is a boosting algorithm (like AdaBoost), which is a bit different from a gradient boosting algorithm. However, I would expect gradient boosting to give similar results.
If you get results (good or bad), don't hesitate to share. I would find that very interesting.

In this kind of situation, where there are many correlated numerical features, oblique splits sometimes give excellent results. It would also be interesting to try. For example, this could be:

GradientBoostedTreesLearner(split_axis="SPARSE_OBLIQUE", sparse_oblique_num_projections_exponent=1.5, sparse_oblique_max_num_projections=500, ...)

@TonyCongqianWang
Author

Thanks for your reply! That explains a lot. I was confused why my models all turned out to be constant, but it turns out max_depth=1 forces them to be constant! Maybe there should be a warning issued when people set max_depth=1? Also, there should be some hint in the documentation about this different interpretation.
