#2 resulted in (expected) performance losses for normal draw operations (e.g., Path.fill() and Path.stroke()), due to the extra composition operations now involved in performing the draw. Rather than painting pixel-by-pixel (and only those pixels), we now have to paint the mask (functionally equivalent to the old, aforementioned, non-composited paint), downsample the mask, and then composite that mask twice: once to a temporary source surface, and then to the canvas/destination surface.
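For reference, here is a rough sketch of that AA draw path. It is a C illustration, not z2d's Zig code. The following are all assumptions made for the example:

- the supersampling factor SCALE
- the helper names
- the premultiplied RGBA8 pixel format
- using IN and then OVER as the two composite operators

Rasterizing the path into the high-resolution mask is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical supersampling factor; not necessarily z2d's value. */
#define SCALE 4

/* Premultiplied RGBA, 8 bits per channel. */
typedef struct { uint8_t r, g, b, a; } Pixel;

/* Multiply two 8-bit values as fractions of 255, with rounding. */
static uint8_t mul8(uint8_t a, uint8_t b) {
    unsigned t = (unsigned)a * b + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Downsample: box-filter each SCALE x SCALE block of the high-resolution
 * coverage mask (samples are 0 or 255) into one 8-bit alpha value. */
static void downsample_mask(const uint8_t *hi, int w, int h, uint8_t *out) {
    assert(w > 0 && h > 0);
    int hi_w = w * SCALE;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            unsigned sum = 0;
            for (int sy = 0; sy < SCALE; sy++)
                for (int sx = 0; sx < SCALE; sx++)
                    sum += hi[(y * SCALE + sy) * hi_w + (x * SCALE + sx)];
            out[y * w + x] = (uint8_t)(sum / (SCALE * SCALE));
        }
    }
}

/* First composite: source color IN mask, written to a temporary surface. */
static void composite_in_mask(Pixel src, const uint8_t *mask, Pixel *tmp, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint8_t m = mask[i];
        tmp[i] = (Pixel){ mul8(src.r, m), mul8(src.g, m), mul8(src.b, m), mul8(src.a, m) };
    }
}

/* Second composite: temporary surface OVER the destination surface. */
static void composite_over(const Pixel *tmp, Pixel *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint8_t inv = (uint8_t)(255 - tmp[i].a);
        dst[i].r = (uint8_t)(tmp[i].r + mul8(dst[i].r, inv));
        dst[i].g = (uint8_t)(tmp[i].g + mul8(dst[i].g, inv));
        dst[i].b = (uint8_t)(tmp[i].b + mul8(dst[i].b, inv));
        dst[i].a = (uint8_t)(tmp[i].a + mul8(dst[i].a, inv));
    }
}
```

In this sketch, each output pixel is visited three more times after rasterization (downsample, IN, OVER), and downsampling also reads SCALE × SCALE samples per pixel. That is the kind of overhead the old direct paint avoided.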
I have some ideas I want to explore to try and help improve performance here:
SIMD Processing
This feels like "low-hanging fruit", but I think there's some nuance in how the implementation needs to be done. Rudimentary updates to allow the composition operations to use vectors (see 18df15d) did not yield any performance increase (measured very naively with time while running the spec tests). I'm guessing this is due to the number of conversions needed to make this work with our current surface memory layout. We don't want to lose our type system here, so I'm wondering if MultiArrayList could help. Hopefully, it would let us turn entire strides of pixel memory into SIMD-able operations, at the very least at the channel level.
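To make the layout question concrete, here is a hypothetical C sketch. It contrasts an interleaved (array-of-structs) surface with a channel-planar (struct-of-arrays) one. The planar form is roughly the layout MultiArrayList would give a pixel struct, since it stores each field in its own array. The type and function names are invented for illustration. Whether the planar layout actually speeds up z2d's compositor is exactly the open question here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Array-of-structs (AoS): the channels of one pixel are interleaved. */
typedef struct { uint8_t r, g, b, a; } PixelAoS;

/* Struct-of-arrays (SoA): one contiguous plane per channel, roughly what
 * Zig's std.MultiArrayList produces for a struct element type. */
typedef struct {
    uint8_t *r, *g, *b, *a;
    size_t len;
} SurfaceSoA;

/* Multiply two 8-bit values as fractions of 255, with rounding. */
static uint8_t mul8(uint8_t a, uint8_t b) {
    unsigned t = (unsigned)a * b + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* AoS: each iteration reads and writes four interleaved channel bytes. */
static void apply_mask_aos(PixelAoS *px, const uint8_t *mask, size_t n) {
    for (size_t i = 0; i < n; i++) {
        px[i].r = mul8(px[i].r, mask[i]);
        px[i].g = mul8(px[i].g, mask[i]);
        px[i].b = mul8(px[i].b, mask[i]);
        px[i].a = mul8(px[i].a, mask[i]);
    }
}

/* SoA: a channel plane and the mask are contiguous arrays of the same
 * element type, so consecutive iterations map onto vector lanes without
 * shuffling channels apart first. */
static void apply_mask_plane(uint8_t *restrict plane, const uint8_t *restrict mask, size_t n) {
    for (size_t i = 0; i < n; i++)
        plane[i] = mul8(plane[i], mask[i]);
}

static void apply_mask_soa(SurfaceSoA *s, const uint8_t *mask) {
    assert(s->r && s->g && s->b && s->a);
    apply_mask_plane(s->r, mask, s->len);
    apply_mask_plane(s->g, mask, s->len);
    apply_mask_plane(s->b, mask, s->len);
    apply_mask_plane(s->a, mask, s->len);
}
```

Both functions compute the same result. The trade-off is that anything expecting interleaved pixels (e.g., exporting an RGBA buffer) would then need a conversion pass. Any gain would therefore depend on how much of the pipeline can stay planar.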
Other Ideas (Memory, etc.)
These are just off-the-cuff ideas for other possible improvements:
- Tweaks to where we allocate/deallocate memory?
- Improvements to our super-sampling algorithm?
One thing I did notice: adding an ArenaAllocator to paintComposite immediately got us a nearly 40% performance increase. The real figure is possibly higher, since my rudimentary benchmark included all tests, not just the ones using AA. I will be submitting this fix shortly.
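An arena plausibly helps because it replaces many individual alloc/free pairs with pointer bumps and a single release. Below is a minimal fixed-capacity sketch of the idea in C. Zig's std.heap.ArenaAllocator grows by requesting more buffers from a child allocator, which this sketch omits. paint_composite_scratch is a hypothetical stand-in for the scratch buffers a composited paint needs, not the actual paintComposite code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A single-buffer bump ("arena") allocator. */
typedef struct {
    uint8_t *base;
    size_t cap;
    size_t used;
} Arena;

static int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Bump-allocate n bytes at an offset aligned to `align` (a power of two).
 * malloc's result is suitably aligned for any basic type, so aligning
 * the offset aligns the returned address for such types. */
static void *arena_alloc(Arena *a, size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || n > a->cap - start)
        return NULL; /* out of space: this sketch does not grow */
    a->used = start + n;
    return a->base + start;
}

/* Release every allocation at once; the buffer is kept for reuse. */
static void arena_reset(Arena *a) { a->used = 0; }

static void arena_deinit(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}

/* Hypothetical per-draw scratch: the supersampled mask, the downsampled
 * mask, and a temporary RGBA surface all come from the arena and are
 * dropped together instead of being freed one by one. */
static int paint_composite_scratch(Arena *a, size_t w, size_t h, size_t scale) {
    uint8_t *hi_mask = arena_alloc(a, w * scale * h * scale, 1);
    uint8_t *mask = arena_alloc(a, w * h, 1);
    uint32_t *tmp = arena_alloc(a, w * h * sizeof(uint32_t), _Alignof(uint32_t));
    if (!hi_mask || !mask || !tmp)
        return 0;
    /* ... paint, downsample, and composite would use these buffers ... */
    return 1;
}
```

A caller would reset (or deinit) the arena once per draw. All of that draw's buffers are then released in one step, regardless of how many were allocated.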
vancluever changed the title from "Compositor-related performance increases" to "Compositor-related performance improvements" on Mar 29, 2024.
Pasting some v0.1.0 benchmark results for rendering 017_stroke_star_round.zig. We are definitely rendering slower than Cairo, but that's to be expected, as there is a lot of room for optimization in the compositor. Cairo also uses pixman under the hood, which is purpose-built for this kind of work (and is additionally SIMD-optimized).
That the performance lag is compositor-related is pretty evident from the good performance on the non-AA z2d test. As far as I know, Cairo builds a mask even when not using AA, whereas we don't (for the time being, anyway).
Removing the milestone in favor of a longer-term burn on this one. There are other things I want to prioritize first (e.g., missing features, particularly those needed for something approaching a complete SVG feature set).