formula: add make_deduplication_links_in
#18478
base: master
Particularly for Java dependents that commonly duplicate JARs, e.g.

* `prestodb` can be reduced from 2GB to 600MB
* `joern` can be reduced from 1.3GB to 500MB

Also can be used for PostgreSQL dependents that have the same SQL files installed to support multiple `postgresql@X` formulae.
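For readers unfamiliar with the idea, here is a minimal sketch of content-based deduplication. It is not the implementation in this PR: `dedupe_with_links` is a hypothetical name, and hashing with SHA-256 and keeping the first copy in sorted order are assumptions made for illustration. Only the `hardlink:` keyword mirrors the option mentioned in the diff.

```ruby
require "digest"
require "pathname"

# Hypothetical helper, NOT the method added by this PR.
# Replaces byte-identical regular files under `root` with links to the
# first copy found. Returns the number of files that were replaced.
def dedupe_with_links(root, hardlink: false)
  root = Pathname(root)
  first_seen = {} # SHA-256 hex digest => Pathname of the copy that is kept
  replaced = 0

  # Skip symlinks and sort so the kept copy is chosen deterministically.
  files = root.find.select { |path| path.file? && !path.symlink? }.sort

  files.each do |path|
    digest = Digest::SHA256.file(path.to_s).hexdigest
    original = first_seen[digest]
    if original.nil?
      first_seen[digest] = path
      next
    end
    # Already hard-linked to the kept copy: nothing to do.
    next if File.identical?(original.to_s, path.to_s)

    path.delete
    if hardlink
      File.link(original.to_s, path.to_s)
    else
      # A relative symlink keeps the tree relocatable.
      File.symlink(original.relative_path_from(path.dirname).to_s, path.to_s)
    end
    replaced += 1
  end

  replaced
end
```

Calling `dedupe_with_links("libexec")` on a tree containing the same JAR in two plugin directories would leave one real file and one symlink pointing at it. A real implementation would likely also need to consider file modes and ownership, which this sketch ignores.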
Nice idea, looking good so far! Any thoughts on what would be required to do this automatically?
# FIXME: Hardlinks are not fully supported so using `hardlink: true` will only
# reduce the bottle size or source build but will be duplicated on bottle pour.
What makes them not fully supported, out of interest?
My old incomplete PR #13154 is related, as BSD `cp` duplicates hardlinks. Trying to switch this brings its own set of problems due to bugs/limitations in other commands.

An example of bottle pour behavior is `trino`, which has a bottle/estimated-unpack size of <1GB but will pour to >2GB.

There also needs to be special handling when crossing filesystem boundaries (the most extreme case being someone running on a USB formatted as (ex)FAT, which doesn't support hardlinks). Off the top of my head, some that can handle this are `rsync -H` and GNU `cp --preserve=links`.
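The filesystem-boundary concern can be sketched as a "hard-link if possible, otherwise copy" fallback. This is only an illustration under assumptions: `link_or_copy` is a hypothetical helper, not something in Homebrew, and the list of rescued errors is a guess at what an unsupported or cross-device link might raise.

```ruby
require "fileutils"

# Hypothetical helper, NOT part of Homebrew.
# Tries to hard-link `source` to `dest`; falls back to a plain copy when the
# two paths are on different devices or the filesystem rejects hard links.
# Returns :hardlink or :copy so the caller can tell which path was taken.
def link_or_copy(source, dest)
  # Hard links cannot span devices, so compare device IDs up front.
  same_device = File.stat(source).dev == File.stat(File.dirname(dest)).dev

  if same_device
    begin
      File.link(source, dest)
      return :hardlink
    rescue Errno::EXDEV, Errno::EPERM, Errno::ENOTSUP, Errno::EMLINK, NotImplementedError
      # The filesystem (e.g. FAT-style) refused the link: fall through to copy.
    end
  end

  FileUtils.cp(source, dest, preserve: true)
  :copy
end
```

The copy fallback is what produces the duplication described above: the result is correct but takes the full size on disk.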
Thanks for context!
I think an opt-in approach like this makes sense. Ideally this would be handled with automatic hardlinks in future (given they are supported in default filesystems on both macOS and Linux), accepting that you'll get duplication in the non-hardlink cases.
Symlinks are a bit risky to do automatically since they can be processed in different ways (e.g. if the symlink is resolved then it can break functionality; this is one reason for switching bin symlinks to exec scripts). Hardlinks can be safer as they don't behave differently under readlink/realpath, but
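The readlink/realpath difference can be seen in a small, self-contained sketch. The file names and the `resolved_basenames` helper are made up for illustration.

```ruby
require "tmpdir"

# Hypothetical demo helper: creates one file plus a symlink and a hard link
# to it inside `dir`, then reports what each link resolves to via realpath.
def resolved_basenames(dir)
  original = File.join(dir, "original.jar")
  symlink  = File.join(dir, "symlinked.jar")
  hardlink = File.join(dir, "hardlinked.jar")

  File.write(original, "data")
  File.symlink("original.jar", symlink) # relative symlink to the original
  File.link(original, hardlink)         # second name for the same inode

  {
    symlink:  File.basename(File.realpath(symlink)),
    hardlink: File.basename(File.realpath(hardlink)),
  }
end

Dir.mktmpdir do |dir|
  names = resolved_basenames(dir)
  puts "symlink resolves to:  #{names[:symlink]}"
  puts "hardlink resolves to: #{names[:hardlink]}"
end
```

The symlink resolves to `original.jar`, so a program that locates resources relative to its resolved path ends up looking next to the original. The hard link keeps its own name and location, which is why it is less likely to change behavior.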
- Have you successfully run `brew style` with your changes locally?
- Have you successfully run `brew typecheck` with your changes locally?
- Have you successfully run `brew tests` with your changes locally?

Need to write some tests and experiment locally.