
Histogram estimator correction #53

Merged · 19 commits · Sep 3, 2024
Conversation

@Mattehub (Collaborator) commented Jul 8, 2024

Thanks for contributing a pull request!

Please be aware that we are a loose team of volunteers so patience is
necessary. Assistance handling other issues is very welcome. We value
all user contributions, no matter how minor they are. If we are slow to
review, either the pull request needs some benchmarking, tinkering,
convincing, etc. or more likely the reviewers are simply busy. In either
case, we ask for your understanding during the review process.

Again, thanks for contributing!

Reference issue

Example: Fixes #1234.

What does this implement/fix?

Explain your changes.

Additional information

Any additional information you think is important.

@EtienneCmb (Collaborator)

Ciao @Mattehub, can you please describe your pull request?

@Mattehub (Collaborator, Author) commented Jul 8, 2024

Sorry for not describing it, @EtienneCmb. I see now that a few tests are failing. I found that a possible way to estimate the entropy of a continuous variable by binning it is the histogram estimator. The way we were doing it before has a bias, and this estimator is designed to avoid that bias. So I implemented the possibility of estimating the entropy with this estimator. It is now possible to do that automatically by giving the bin_siz as input to the entropy_bin function. Concretely, what changed is:

  • the entropy_bin function allows one more input, bin_siz
  • the digitize function, when use_sklearn = False, gives one more output, bin_siz

Important point: do you want me to add one more estimator that computes the entropy with the histogram estimator directly from the continuous data? In that case, "binning" would give the entropy of a discrete variable, while the "histogram estimator" would give the entropy of a continuous variable, using the estimator described above.

PS: I sent some messages about this on Frama.
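
For readers following the thread, here is a minimal numpy sketch of the correction being discussed (illustrative only; the function name and details are not the PR's actual code). Binning gives the entropy of the discretized variable; adding the log of the bin size turns it into an estimate of the differential entropy of the continuous variable:

import numpy as np

def entropy_histogram(x, n_bins=8):
    # Illustrative histogram estimator of differential entropy.
    counts, edges = np.histogram(x, bins=n_bins)
    bin_size = edges[1] - edges[0]
    p = counts[counts > 0] / counts.sum()
    # Plug-in entropy of the binned (discrete) variable, in nats...
    h_binned = -np.sum(p * np.log(p))
    # ...plus log(bin_size), which removes the dependence on the
    # discretization and approximates the differential entropy.
    return h_binned + np.log(bin_size)

The plug-in term alone changes with the bin width; that dependence is the bias the log(bin_size) correction removes, to first order.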

@EtienneCmb (Collaborator)

Ok, I propose a few modifications:

  • the tests are failing because the digitize function returns two outputs. If you never use the second one, then I think it's not needed. We can leave the function as it was, and the same for the tests
  • can you rename it to bin_size? It's more explicit

Can you please provide the resources you used?

@Mattehub (Collaborator, Author) commented Jul 8, 2024

Thanks for the answer! I changed the test so that it passes. The thing is that, without changing the functions as I did, the histogram estimator cannot easily be implemented by users. The aim of my pull request is that users, by combining the functions we provide, can perform histogram estimation (continuous data -> binning, bin size -> entropy with correction). Another possibility would be to provide another entropy method in which all three steps are done automatically; in that case we could call it the "histogram" estimator. We would then have, among the possible methods: "binning" for discrete data, and "histogram" for continuous data, which discretizes it first. What do you think about it?

Here are two references from which I took the information: document, and Wikipedia.

For bin_size, I completely agree! I am changing it.
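
To make the proposed pipeline concrete, here is a hypothetical usage sketch (the import paths and signatures below are assumed from this conversation, not taken from the final API):

import numpy as np
from hoi.core import get_entropy   # assumed import path
from hoi.utils import digitize     # assumed import path

x = np.random.randn(1, 1000)  # one continuous feature

# Manual three-step pipeline: bin the data, keep the bin size,
# then compute the bias-corrected binning entropy.
x_binned, bin_size = digitize(x, n_bins=8, use_sklearn=False)
h_manual = get_entropy("binning", bin_size=bin_size)(x_binned)

# Proposed shortcut: a single "histogram" method that performs the
# three steps automatically on continuous data.
h_hist = get_entropy("histogram", n_bins=8)(x)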

@Mattehub (Collaborator, Author) commented Jul 17, 2024

Hi @EtienneCmb, sorry to bother you here. I have a little issue with the changes to the functions. To use digitize within entropy_hist, I had to substitute the functions within digitize from numpy to jax.numpy. I think this is one of the things making the tests fail. Before changing more lines, I wanted to make sure it is OK to have jnp instead of np in the functions within digitize. You can see the details in the changes to the file utils.stats.

Do you think it is ok?
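
For context, a minimal sketch of what a jax.numpy based digitize_1d could look like (purely illustrative; this is not the actual code in utils.stats), showing that jnp is essentially a drop-in replacement for np here:

import jax.numpy as jnp

def digitize_1d(x_1d, n_bins):
    # Uniform binning of a 1D array; jnp mirrors the numpy API.
    edges = jnp.linspace(x_1d.min(), x_1d.max(), n_bins + 1)
    # Bin index of each sample, in 0 .. n_bins - 1.
    return jnp.clip(jnp.digitize(x_1d, edges[1:-1]), 0, n_bins - 1)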

  if not use_sklearn:
-     return np.apply_along_axis(digitize_1d, axis, x, n_bins)
+     return np.apply_along_axis(digitize_1d, axis, x, n_bins), b_siz
Collaborator:

It's a bit weird to return two arguments in one case and only one in the other. It's confusing; could we improve it?

Collaborator:

@Mattehub, a few points about this function:

  1. Currently, digitize_1d is applied separately to each dimension of x. Maybe the binning should be applied to the entire matrix. Can you test?
  2. One way to have digitize return two outputs is to define an output dict that we can fill in both branches. Something like:
# prepare parameter dict
parameters = dict()

if ...:
    # sklearn stuff
else:
    # manual stuff
    parameters["bin_size"] = bin_size

return x, parameters
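
A slightly more complete sketch of that suggestion (the sklearn branch, the digitize_1d helper, and the parameter names are assumptions, not the final implementation):

import numpy as np

def digitize_1d(x_1d, n_bins):
    # Uniform bins between the min and max of this 1D slice.
    edges = np.linspace(x_1d.min(), x_1d.max(), n_bins + 1)
    return np.clip(np.digitize(x_1d, edges[1:-1]), 0, n_bins - 1)

def digitize(x, n_bins=8, axis=0, use_sklearn=False):
    # Always return the same (x_binned, parameters) pair.
    parameters = dict()
    if use_sklearn:
        from sklearn.preprocessing import KBinsDiscretizer
        # KBinsDiscretizer expects (n_samples, n_features), hence the transposes.
        est = KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="uniform")
        x_binned = est.fit_transform(x.T).T
    else:
        # Manual binning; keep the bin width for the entropy correction.
        parameters["bin_size"] = np.ptp(x, axis=axis) / n_bins
        x_binned = np.apply_along_axis(digitize_1d, axis, x, n_bins)
    return x_binned, parameters

This way callers always unpack two values, and the bin size is simply absent when sklearn did the binning.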

"KNN-3": get_entropy("knn", k=3),
"Gaussian": get_entropy(method="gauss"),
"Kernel": get_entropy("kernel"),
Collaborator:

Can you add an example using binning in the plot_entropies example? Something like in the plot_mi example:

def ent_binning(x):
    x, bin_size = digitize(x)
    fcn_binning = get_entropy("binning", bin_size=bin_size)
    return fcn_binning(x)

And then, can you please paste the plot comparing the estimators in this conversation?

n_features, n_samples = x.shape
print(n_features, n_samples)
Collaborator:

remove print

@EtienneCmb merged commit 06e0a15 into brainets:main on Sep 3, 2024. 8 checks passed.