Increase speed for KNN, added MI and Entropy examples for multidimensional variables #54

chrisferreyra13 · 2024-07-18T17:23:15Z

Reference issue

None, new examples.

What does this implement/fix?

This PR adds two example scripts that compare MI and entropy estimators on multidimensional Gaussian variables.
I also changed minsize to 1 in the InfoTot estimator so that it can estimate the pairwise mutual information between each x feature and the target.
I increased the computational speed of entropy_knn and mi_knn by almost 5x.
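
Gaussian variables are a convenient test case because their entropy and mutual information have closed forms, so an estimator's output can be compared against a known value. A minimal sketch of that ground truth, assuming NumPy only (the function names are illustrative and are not part of HOI):

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    # H = 0.5 * log((2 * pi * e)^d * det(cov))
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


def gaussian_mi(cov, n_x):
    """MI (in nats) between the first `n_x` dimensions and the remaining ones."""
    cov = np.asarray(cov, dtype=float)
    h_x = gaussian_entropy(cov[:n_x, :n_x])
    h_y = gaussian_entropy(cov[n_x:, n_x:])
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return h_x + h_y - gaussian_entropy(cov)


# Two unit-variance variables with correlation rho
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
print(gaussian_entropy(cov))
print(gaussian_mi(cov, n_x=1))  # equals -0.5 * log(1 - rho**2)
```

Dividing either value by np.log(2) converts it from nats to bits.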

Additional information

The kernel estimator should be added to the MI example after the following fix.

…f mutual information with 1 dim

chrisferreyra13 · 2024-07-31T10:00:55Z

Hey, sorry, GitHub automatically included my new changes to the entropy/MI KNN estimators in this PR. Let me know if I should remove them.

EtienneCmb

Hey @chrisferreyra13,

Thanks for the PR. I made several comments; most of them involve minor changes.

Best,

EtienneCmb · 2024-07-31T11:52:23Z

.vscode/launch.json

Please remove this file. You can add .vscode/* to the .gitignore.

chrisferreyra13 · 2024-08-04

Sorry! I didn't realize it was included.

EtienneCmb · 2024-07-31T11:55:02Z

hoi/core/entropies.py

@chrisferreyra13 I guess it's faster because you use broadcasting rules. Can you check that it doesn't consume too much RAM when using a larger number of nodes? Same question for the MI.

Also, have you checked that both implementations give exactly the same results?
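
To make the trade-off in question concrete: a KNN estimator needs the distances between all pairs of samples, and broadcasting computes them in one vectorized step at the cost of a temporary array of shape (n_samples, n_samples, n_features). The sketch below uses plain NumPy and hypothetical function names; it is not the code from hoi/core/entropies.py.

```python
import numpy as np


def pairwise_sq_dists_loop(x):
    """Squared Euclidean distances, one row at a time (small temporaries)."""
    n = x.shape[0]
    out = np.empty((n, n))
    for i in range(n):
        out[i] = np.sum((x - x[i]) ** 2, axis=1)
    return out


def pairwise_sq_dists_broadcast(x):
    """Same distances computed in a single broadcast expression."""
    # x[:, None, :] has shape (n, 1, d) and x[None, :, :] has shape (1, n, d);
    # their difference is a temporary of shape (n, n, d).
    diff = x[:, None, :] - x[None, :, :]
    return np.sum(diff ** 2, axis=-1)


rng = np.random.default_rng(0)
x = rng.random((200, 3))
d_loop = pairwise_sq_dists_loop(x)
d_fast = pairwise_sq_dists_broadcast(x)

# Size of the broadcast temporary in MiB for float64 data (8 bytes per value)
n, d = x.shape
temp_mib = n * n * d * 8 / 1024**2
print(f"temporary for n={n}, d={d}: {temp_mib:.2f} MiB")
```

With the same formula, 5000 samples and 3 features would need a temporary of about 572 MiB, which is why the memory question matters once the sample count grows.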

chrisferreyra13 · 2024-08-04

Yes, exactly. The motivation was that, for more than 5k samples, the method became really slow.

I tried to measure peak memory consumption with the tracemalloc package, using its get_traced_memory function. I don't trust my results hahaha 😅 because I would expect the new method to consume more memory for sure! Do you know another way to measure memory? I attach the corresponding plots.
I am following this approach (and similarly for total info):

    import tracemalloc

    import numpy as np
    from hoi.metrics import TC

    for nf in n_features:
        x = np.random.rand(n_samples, nf)
        # inject redundancy between nodes (0, 1, 2)
        x[:, 1] += x[:, 0]
        x[:, 2] += x[:, 0]
        model = TC(x)
        tracemalloc.start()
        model.fit(method="knn_old", minsize=3, maxsize=nf)
        # get_traced_memory() returns (current, peak) in bytes; store the peak in MiB
        mem_usg_knn_old.append(tracemalloc.get_traced_memory()[1] / 1024**2)
        tracemalloc.stop()
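
On the question of another way to measure memory: tracemalloc only counts allocations that go through Python's memory allocators, so buffers allocated by native code (which may include the arrays of a compiled backend) can be missing from its peak. A process-level figure such as the peak resident set size is a common cross-check. The sketch below uses only the standard library; measure_peak and allocate are hypothetical helpers, and the resource module is available on Unix only.

```python
import resource
import sys
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Run func; return (result, tracemalloc peak in MiB, process peak RSS in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    # ru_maxrss is the peak RSS over the whole process lifetime, so it never
    # decreases between calls. Units: kilobytes on Linux, bytes on macOS.
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    rss_mib = max_rss / 1024**2 if sys.platform == "darwin" else max_rss / 1024
    return result, peak_bytes / 1024**2, rss_mib


def allocate(n_mib):
    """Toy workload: allocate a buffer of n_mib mebibytes and return its length."""
    buf = bytearray(n_mib * 1024**2)
    return len(buf)


size, traced_mib, rss_mib = measure_peak(allocate, 16)
print(f"tracemalloc peak: {traced_mib:.1f} MiB, process peak RSS: {rss_mib:.1f} MiB")
```

Because the process-level peak only ever grows, comparing two methods this way needs one fresh process per method.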

I also checked whether they give the same results. They do, although when I compare the values by eye, the last decimals differ. I guess this is only a matter of the numerical precision of the functions used. I attach plots.
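
Differences in the last decimals are expected whenever two implementations perform the same arithmetic in a different order, so a tolerance-based comparison is usually more informative than checking by eye. As a sketch with NumPy (the two functions are illustrative stand-ins, not the old and new KNN estimators):

```python
import numpy as np


def sq_dists_direct(x):
    """Squared distances from explicit differences."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def sq_dists_expanded(x):
    """Algebraically identical form: |a|^2 + |b|^2 - 2 * a.b"""
    sq_norms = np.sum(x ** 2, axis=1)
    return sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)


rng = np.random.default_rng(0)
x = rng.random((100, 5))
a = sq_dists_direct(x)
b = sq_dists_expanded(x)

# Report the worst-case discrepancy and test it against explicit tolerances
max_abs_diff = np.max(np.abs(a - b))
same_within_tol = np.allclose(a, b, rtol=1e-7, atol=1e-10)
print(f"largest absolute difference: {max_abs_diff:.2e}")
print(f"equal within tolerance: {same_within_tol}")
```

The tolerances here are arbitrary choices for float64 data; lower-precision arithmetic such as float32 would need looser ones.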

If the method turns out to be memory-heavy in practice, we can keep an option to choose between the slow and the fast method. That said, I expect people to run HOI on a cluster for real analyses, where RAM should be OK.

Let me know!

EtienneCmb · 2024-08-30T11:52:47Z

Thank you @chrisferreyra13. Indeed, it's surprising that the new method doesn't use more memory. I merged your modifications. In case we/you/someone finds memory issues in the future, we can go back to this PR. Best,

chrisferreyra13 added 9 commits April 8, 2024 15:30

changed minsize to 1 in InfoTot for pairwise MI

11b88dc

fixing np.log(2) to jnp

582d5a8

Merge branch 'main' of github.com:brainets/hoi

81e3d19

fixed and improved mi example

1a99861

added example of high-dimensional mutual information, fixed example of mutual information with 1 dim

fb2bda2

added example of estimating entropy for high dimensional variables

862c512

Merge branch 'main' of github.com:brainets/hoi

b9febcf

added examples for mi and entropy of multidim vars

0b8ae61

improved entropy_knn and mi_knn

d597834

chrisferreyra13 changed the title ~~MI and Entropy examples for multidimensional variables~~ Increase speed for KNN, added MI and Entropy examples for multidimensional variables Jul 31, 2024

EtienneCmb requested changes Jul 31, 2024

chrisferreyra13 added 4 commits July 31, 2024 15:07

removed vscode folder

17ed050

added vscode to gitignore

4a810f0

improving new ent knn

dd05885

improving new mi knn

3e03382

EtienneCmb merged commit 4a4aab3 into brainets:main Aug 30, 2024
8 checks passed
