-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase speed for KNN, added MI and Entropy examples for multidimensional variables #54
Conversation
…f mutual information with 1 dim
Hey sorry, github automatically included my new change for entropy/mi knn in this PR. Let me know if I remove it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @chrisferreyra13,
Thanks for the PR. I made several comments, most of them involve minor changes.
Best,
.vscode/launch.json
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this file. You can add it to the .gitignore .vscode/*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry! I didn't realize it was included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisferreyra13 I guess it's faster because you use broadcasting rules. Can you check that it doesn't consume to much RAM when using a larger number of nodes? Same question for the mi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you've checked that both implementations give the exact same results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly. The motivation originated because for >5k samples, the method started to become really slow.
I try to measure peak memory consumption with tracemalloc pkg using get_traced_memory function. I don't trust my results hahaha 😅 because for me the new method should consume more memory for sure! Do you know another way to measure memory? I attach the corresponding plots.
I am following this approach (similarly for total info)
for nf in n_features:
x = np.random.rand(n_samples, nf)
# inject redundancy between nodes (0, 1, 2)
x[:, 1] += x[:, 0]
x[:, 2] += x[:, 0]
model = TC(x)
tracemalloc.start()
model.fit(method="knn_old", minsize=3, maxsize=nf)
mem_usg_knn_old.append(tracemalloc.get_traced_memory()[1]/1024**2)
tracemalloc.stop()
I also measure if they give the same results. They do, but when I check by eye the values the last decimals are not the same. I guess this is only a matter of numerical precision of the functions used. I attach plots.
If the method is heavy in practice, we can leave an option for slow/fast method. Although to do I expect people using HOI in a cluster for real analysis, where RAM should be ok.
Let me know!
Thank you @chrisferreyra13, indeed, it's surprising that the new method doesn't use more memory. I merged your modifications. In case we/you/someone find memory issues in the future, we can go back to this PR. Best, |
Thanks for contributing a pull request!
Please be aware that we are a loose team of volunteers so patience is
necessary. Assistance handling other issues is very welcome. We value
all user contributions, no matter how minor they are. If we are slow to
review, either the pull request needs some benchmarking, tinkering,
convincing, etc. or more likely the reviewers are simply busy. In either
case, we ask for your understanding during the review process.
Again, thanks for contributing!
Reference issue
None, new examples.
What does this implement/fix?
Additional information
Kernel estimator should be added to MI example after following fix.