Avoid excessive heap utilisation due to in-memory creation of md5s #222

Open: wants to merge 2 commits into master


DanielThomas

We noticed in an application with more than 100K files that we ran into heap problems while generating the checksums. This PR writes the checksums to a temporary file and then streams from that file to the output stream, avoiding heap utilisation during that phase.
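A minimal sketch of the temp-file approach described above, assuming each md5sums line is the hex digest, two spaces, then the package-relative path. The class `Md5SumsSpool` and its methods are hypothetical, not the PR's actual code. Lines are written to a temporary file as they are produced, and the file is later copied into the archive's output stream, so heap use stays flat however many files there are.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: spools md5sums lines to a temp file instead of a StringBuilder.
public class Md5SumsSpool implements Closeable {
    private final Path tempFile;
    private final Writer writer;

    public Md5SumsSpool() throws IOException {
        tempFile = Files.createTempFile("md5sums", ".tmp");
        writer = Files.newBufferedWriter(tempFile, StandardCharsets.UTF_8);
    }

    // Append one line: hex digest, two spaces, path, newline.
    public void add(String md5Hex, String path) throws IOException {
        writer.write(md5Hex);
        writer.write("  ");
        writer.write(path);
        writer.write('\n');
    }

    // Flush buffered lines and report the spooled length in bytes.
    public long size() throws IOException {
        writer.flush();
        return Files.size(tempFile);
    }

    // Copy the spooled file into the target stream without loading it into memory.
    public long writeTo(OutputStream out) throws IOException {
        writer.flush();
        return Files.copy(tempFile, out);
    }

    @Override
    public void close() throws IOException {
        writer.close();
        Files.deleteIfExists(tempFile);
    }
}
```

If the archive format needs the entry size before the data (tar headers do), `size()` lets the header be written first and the body streamed afterwards.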

@tcurdt
Owner

tcurdt commented Dec 4, 2015

Thanks for the contribution! I am a little puzzled, though, about why this was a problem even with 100k files. I assume 100k files times (random guess) 100 chars per line: that's 10,000,000 chars, which is probably around 20MB of RAM. Is that already what you meant by excessive? How much memory usage did you see? I am just wondering if this really was the problem.
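The back-of-the-envelope estimate above, spelled out. The 100 chars per line is the comment's own guess, and 2 bytes per char assumes UTF-16 storage in `String`/`StringBuilder` (as in Java 8); object headers and unused builder capacity are ignored. `Md5SumsEstimate` is a hypothetical name.

```java
public class Md5SumsEstimate {
    // Bytes needed to hold the md5sums text as UTF-16 chars (2 bytes each),
    // ignoring object headers and any spare StringBuilder capacity.
    static long utf16Bytes(long files, long charsPerLine) {
        return files * charsPerLine * Character.BYTES;
    }

    public static void main(String[] args) {
        long bytes = utf16Bytes(100_000, 100);
        System.out.println(bytes); // 20000000, i.e. about 20 MB
    }
}
```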

@DanielThomas
Author

I'm going to set a breakpoint to catch the length and take a heap dump, then tell you exactly what the utilisation is. It's certainly in the hundreds of megabytes, due to the length of the paths.

@tcurdt
Owner

tcurdt commented Dec 4, 2015

Awesome - thanks!

Hundreds of megabytes? That sounds quite fishy.
Actually, maybe you could print out the size of the temp file?
Or, even better, provide the file itself, obfuscated if need be (e.g. with a simple tr)?

@DanielThomas
Author

The final md5sums file is 33MB. The StringBuilder retains double that, of course, thanks to Java's 2-byte (UTF-16) representation of string characters:

java.lang.StringBuilder [JNI Global, Stack Local ← checksums, md5s] 75497512

And there are two more copies of the same content:

  • checksums.toString() @ ControlBuilder:147
  • pContent.getBytes("UTF-8") @ ControlBuilder:212

So a little over 220MB, I guess. Incidentally, the background is here (we've had our fair share of heap issues in our Gradle plugin!):

nebula-plugins/gradle-ospackage-plugin#142
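A rough per-buffer tally of the three copies listed above, offered as a hedged cross-check. It takes the 75,497,512-byte retained size from the heap dump as is and assumes about 33,000,000 ASCII characters of content (the 33MB md5sums file) with Java 8-style 2-byte chars. Because `toString()` and `getBytes("UTF-8")` copy only the used length, not the builder's spare capacity, this tighter accounting lands nearer 175MB than 220MB; either way it is far above the 20MB first guess. `CopyTally` is a hypothetical name.

```java
public class CopyTally {
    // Sum of the three buffers: the builder (as retained in the dump), the String copy,
    // and the UTF-8 byte[]. The builder's retained size likely exceeds 2 x content
    // because its backing array grows by roughly doubling and is rarely full.
    static long tally(long retainedBuilderBytes, long contentChars) {
        long stringCopy = contentChars * Character.BYTES; // toString(): UTF-16 copy of the used length only
        long utf8Copy = contentChars;                     // getBytes("UTF-8"): 1 byte per ASCII char
        return retainedBuilderBytes + stringCopy + utf8Copy;
    }

    public static void main(String[] args) {
        // 75,497,512 comes from the heap dump; 33,000,000 chars is an assumed size for the 33MB file.
        long total = tally(75_497_512L, 33_000_000L);
        System.out.println(total); // 174497512, i.e. roughly 175 MB
    }
}
```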

@tcurdt
Owner

tcurdt commented Dec 4, 2015

Thanks for digging into this. I guess the two extra copies are where the problem turns excessive. I am wondering if we could dial back the craziness by getting rid of those copies. On the other hand, keeping everything in memory will always hurt scalability.
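One way to get rid of those copies while staying in memory, sketched under assumptions: encode each line straight to UTF-8 into a `ByteArrayOutputStream`, take the length from `size()`, and stream it out with `writeTo()`. This avoids the `toString()` and `getBytes()` copies, as well as the extra copy that `toByteArray()` would make. `Md5SumsBuffer` is a hypothetical name, not jdeb's API. Only one UTF-8 buffer (plus growth slack) is held, but it still grows with the number of files, so the scalability concern stands.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory variant: one UTF-8 byte buffer instead of builder + String + byte[].
public class Md5SumsBuffer {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8);

    // Append one line: hex digest, two spaces, path, newline (encoded as UTF-8 on the fly).
    public void add(String md5Hex, String path) throws IOException {
        writer.write(md5Hex);
        writer.write("  ");
        writer.write(path);
        writer.write('\n');
    }

    // Flush the encoder's internal buffer, then report the encoded length.
    public int size() throws IOException {
        writer.flush();
        return bytes.size();
    }

    // Stream the encoded bytes out without an intermediate toByteArray() copy.
    public void writeTo(OutputStream out) throws IOException {
        writer.flush();
        bytes.writeTo(out);
    }
}
```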

I am not so eager to use temp files, but at first glance the PR looks reasonable. I need to poke around a bit more, but I am inclined to accept it. Thanks for your work!

(I so need to get started on jdeb2)

tcurdt added this to the 1.9 milestone Aug 20, 2019
tcurdt removed this from the 1.9 milestone Jun 5, 2021
Repository owner deleted a comment from MarkChristensen1 Jul 24, 2024
2 participants