Histogram estimator correction #53
Conversation
Ciao @Mattehub. Can you please describe your pull request?
Sorry for not describing it @EtienneCmb, I see now that a few tests are failing. I found that a possible way to estimate the entropy of a continuous variable by binning is the histogram estimator. The way we were doing it before has a bias; this estimator is designed to avoid that bias. So I implemented the possibility of estimating the entropy using this estimator. It is now possible to do that automatically by passing the bin_size as input to the function entropy_bin. So concretely, what changed is:
Important point: do you want me to add one more estimator that computes the entropy directly from the continuous data with the histogram estimator? In that case, "binning" would give the entropy of a discrete variable, while the "histogram estimator" would give the entropy of a continuous variable, using the estimator described above. PS. On frama I sent some messages about that.
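For reference, a minimal numpy sketch of what the corrected estimator computes, as I read this thread (name and signature are illustrative; the actual entropy_hist in this PR may differ): the discrete entropy of the binned data differs from the differential entropy by roughly -log(bin_size), so adding log(bin_size) corrects it (Cover & Thomas, Thm. 8.3.1).

```python
import numpy as np

def entropy_hist(x, n_bins=30):
    # hypothetical sketch, not hoi's implementation: discrete entropy of the
    # binned data plus log(bin_size), which for small bins recovers the
    # differential entropy of the continuous variable
    counts, edges = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()  # drop empty bins to avoid log(0)
    bin_size = edges[1] - edges[0]         # equidistant bins
    return -np.sum(p * np.log(p)) + np.log(bin_size)
```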
Ok. I propose a few modifications:
Can you please provide the resources you used?
Thanks for the answer! I changed the test so that it passes. The thing is that the histogram estimator, without changing the functions as I did, cannot be implemented easily by users. The aim of my pull request is that users, by combining the functions we provide, can perform histogram estimation (continuous data -> binning, bin size -> entropy with correction). Another possibility would be to provide another entropy method in which all three steps are done automatically; in that case we could call it the "histogram" estimator. Among the possible methods we would then have "binning" for discrete data and "histogram" for continuous data, which discretizes it internally. What do you think about it? Here are two references from which I took the information: document, and wikipedia. For bin_size, I completely agree! I am changing it.
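To make the bias concrete, here is a quick self-contained check (an illustration added for this writeup, not code from the PR): for a standard Gaussian the differential entropy is 0.5 * log(2 * pi * e) ≈ 1.419; the naive binned entropy drifts with the number of bins, while the log(bin_size) correction brings it back.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

for n_bins in (10, 50, 250):
    counts, edges = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    naive = -np.sum(p * np.log(p))                    # grows as bins shrink
    corrected = naive + np.log(edges[1] - edges[0])   # stays near 1.419
    print(n_bins, round(naive, 3), round(corrected, 3))
```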
Hi @EtienneCmb, sorry to bother you here. I have a little issue with the changes to the functions. To use digitize within entropy_hist I had to substitute the functions within digitize from numpy to jax.numpy. I think this is one of the things making the tests fail. Before changing more lines, I wanted to make sure that it is ok to have jnp instead of np in the functions within digitize. You can see the details in the changes to the file utils.stats. Do you think it is ok?
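For what it's worth, a minimal illustration of the kind of swap being discussed (the body of digitize_1d here is a guess, not the actual hoi code): written with jnp, the function stays traceable under jax.jit, which plain numpy calls on traced arrays would break.

```python
import jax
import jax.numpy as jnp

def digitize_1d(x, n_bins):
    # hypothetical body: equidistant binning with jnp instead of np,
    # so the function can sit under jax.jit / jax.vmap
    edges = jnp.linspace(x.min(), x.max(), n_bins + 1)
    return jnp.digitize(x, edges[1:-1])  # bin indices in [0, n_bins - 1]

x = jnp.asarray([0.1, 0.4, 0.35, 0.8])
print(jax.jit(digitize_1d, static_argnums=1)(x, 3))
```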
hoi/utils/stats.py (outdated)
```python
if not use_sklearn:
    return np.apply_along_axis(digitize_1d, axis, x, n_bins)
return np.apply_along_axis(digitize_1d, axis, x, n_bins), b_siz
```
It's a bit weird to return two arguments in one case and only one in the other. It's confusing, could we improve it?
@Mattehub, a few points about this function:
- Currently, `digitize_1d` is applied differently to each dimension of `x`. Maybe the binning should be applied to the entire matrix. Can you test?
- One way to have `digitize` return two outputs is to define an output dict that we can fill with both methods. Something like:
```python
# prepare parameter dict
parameters = dict()
if ...:
    # sklearn stuff
else:
    # manual stuff
    parameters["bin_size"] = bin_size
return x, parameters
```
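A possible fleshed-out version of that suggestion (a sketch under assumed internals, with the manual branch binning the entire matrix as suggested in the first point; not the PR's actual code):

```python
import numpy as np

def digitize(x, n_bins=8, use_sklearn=False):
    # both branches return the same (x_binned, parameters) pair
    parameters = dict()
    if use_sklearn:
        # the sklearn-based binning would go here and fill `parameters`
        # with its own fields
        raise NotImplementedError("sklearn branch omitted in this sketch")
    # manual equidistant binning over the entire matrix
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    x_binned = np.digitize(x, edges[1:-1])
    parameters["bin_size"] = edges[1] - edges[0]
    return x_binned, parameters
```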
"KNN-3": get_entropy("knn", k=3), | ||
"Gaussian": get_entropy(method="gauss"), | ||
"Kernel": get_entropy("kernel"), |
Can you add an example using the binning in the `plot_entropies` example? Something like in the `plot_mi` example:
```python
def ent_binning(x):
    x, bin_size = digitize(x)
    fcn_binning = get_entropy("binning", bin_size=bin_size)
    return fcn_binning(x)
```
And then, can you paste the plot comparing the estimators into this conversation please?
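If it helps, wiring that wrapper into the estimator dict excerpted above might look like this (the import path is guessed from the file paths in this thread, and `entropies` is a hypothetical name for the dict):

```python
from hoi.core.entropies import get_entropy  # path assumed from this thread

entropies = {
    "KNN-3": get_entropy("knn", k=3),
    "Gaussian": get_entropy(method="gauss"),
    "Kernel": get_entropy("kernel"),
    "Binning": ent_binning,  # the wrapper defined just above
}
```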
hoi/core/entropies.py (outdated)
```python
n_features, n_samples = x.shape
print(n_features, n_samples)
```
remove print
Thanks for contributing a pull request!
Please be aware that we are a loose team of volunteers so patience is
necessary. Assistance handling other issues is very welcome. We value
all user contributions, no matter how minor they are. If we are slow to
review, either the pull request needs some benchmarking, tinkering,
convincing, etc. or more likely the reviewers are simply busy. In either
case, we ask for your understanding during the review process.
Again, thanks for contributing!
Reference issue
Example: Fixes #1234.
What does this implement/fix?
Explain your changes.
Additional information
Any additional information you think is important.