-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Painter: Memory optimizations #50
Conversation
Currently draft: I need to address the fact that the tunability of the FBA isn't what it says on the tin. Trying to figure out a good way to supply the comptime value, looks like struct fields that are comptime are functionally equivalent to constants, and can't be changed that way (i.e., we can't shuttle one from the context that way). |
ba3ab00
to
905463d
Compare
Also TODO: compositing on to the mask surface for non-fully-opaque alpha (I’ve planned this too, just wanted this up to start). |
905463d
to
caaae94
Compare
Edge cache size paramterization is fixed, but unfortunately the way you can tweak it is a bit obscure (calling painter methods directly off of a new permutation of the type function). However, this is in line with what we are eventually working on in #44; although I have not shared full details, the intention is to have the context be a managed API over a number of unmanaged components. As such, if you needed a custom value for the edge cache, you would just opt into unmanaged behavior. Some things needed to be shuffled around to preserve some of the error checking, and will be refactored soon (really just a result of moving the path checking fully down into the painter). Also, the edge x-coordinate limit has been updated to be i30, this allows us to pack that and the direction flag into a u32, ensuring that our default of 4096 bytes equates to 1k items versus 512. This limit is fine, and I might end up decreasing it anyway if we go to fixed point as described in #26 (in that case, with 24.8, we can just make it i24). |
caaae94
to
b2e255c
Compare
This adds a number of memory optimizations to the painter: * Downsampling is now done in-place when using AA. Memory is freed after the downsample. This avoids an extra allocation for the downsampled surface. * Additionally in AA, if our pixel source is alpha8, we fast-path to just using the mask as the foreground, doing a composition with the pixel source if it's not opaque. This avoids another allocation under this scenario! * Finally, we have moved the edge calculations to a FixedBufferAllocator. The default allocation size (which can be tuned if you're okay calling the painter functions directly from the type function) is 1024 items, or 4096 bytes under our new Polygon.Edge packed struct. This has allowed us to drop the use of external allocators completely from non-AA paint. As part of this, we've dropped the X-coordinate in an edge to an i30 (this might be dropped lower as well, depending on the result of our internal numerics work). * We've also dropped the ArenaAllocator from AA paint as it's not bringing value anymore, and probably just getting in the way of our resize. All of these are in advance of fully addressing #44 - lowering the allocation footprint of the painter will make the path to infallibility much easier.
b2e255c
to
4f869d4
Compare
Just appending some quick benchmarks using poop below to this issue. This tested The slight increase in wall time can likely be accounted for with the addition of the extra tests (2 more were added for this PR). Test suite (main, no update)
Test suite (pre-memory fixes, no update)
|
Also did some optimized tests of the individual alpha channel tests. The improvement in peak RSS is still only around 18%, far off from what I would have expected based on the savings in allocations (1 less 1x alpha8 surface and then a full-size RGBA surface, which would be basically 4x too; this is where the 55% estimate comes from since ultimately we end up only needing need the initial 4x RAM for the initial AA raster of the mask, which is then downsampled in place). This is, of course, a very opaque test; I feel like we'd need instrumentation to test further. Might update the CHANGELOG entry to adjust expectations. Note that this is still also a test binary,
|
Just to summarize what our estimates would be based on our allocation savings (for sake of the CHANGELOG): Before this change, we were allocating the following surfaces:
So estimated before, given a 100x100 polygon would be:
Now, when our source is non-alpha8 (so RGB or RGBA), we downsample in place, removing the 1x mask allocation:
Additionally, we just composite to our mask itself if our source is alpha8, removing the relatively costly RGBA allocation altogether.
So this is where the estimates are ultimately derived from. As seen from the very opaque benchmarks above though, YMMV. Note that in addition to the test code, there's other stuff going on here that could skew things like the fact that these tests still do both AA and non-AA tests, and file comparisons (so files are loaded to diff against existing buffers). Eventually it would be nice to get some instrumentation into the library to test this stuff further, but the change in allocation profile is pretty straightforward. |
This adds a number of memory optimizations to the painter:
Downsampling is now done in-place when using AA. Memory is freed after the downsample. This avoids an extra allocation for the downsampled surface.
Additionally in AA, if our pixel source is alpha8, we fast-path to just using the mask as the foreground, doing a composition with the pixel source if it's not opaque. This avoids another allocation under this scenario!
Finally, we have moved the edge calculations to a FixedBufferAllocator. The default allocation size (which can be tuned if you're okay calling the painter functions directly from the type function) is 1024 items, or 4096 bytes under our new Polygon.Edge packed struct. This has allowed us to drop the use of external allocators completely from non-AA paint. As part of this, we've dropped the X-coordinate in an edge to an i30 (this might be dropped lower as well, depending on the result of our internal numerics work)
We've also dropped the ArenaAllocator from AA paint as it's not bringing value anymore, and probably just getting in the way of our resize.
All of these are in advance of fully addressing #44 - lowering the allocation footprint of the painter will make the path to infallibility much easier.