Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal: internal numerics (double, single, fixed?) #26

Open
vancluever opened this issue Apr 12, 2024 · 3 comments
Open

internal: internal numerics (double, single, fixed?) #26

vancluever opened this issue Apr 12, 2024 · 3 comments
Labels
enhancement New feature or request
Milestone

Comments

@vancluever
Copy link
Owner

vancluever commented Apr 12, 2024

NOTE you'll want to go down to the bottom if you're curious about the current status of this work.

From #11

This tracks any efforts towards getting the internal plumbing to fixed-point arithmetic, spun off from the other units discussion.

Currently our main floating-point unit is f64 (the value that we accept on Path for co-ordinates). The case for moving internal calculations to fixed-point would be accuracy and speed (more stable rounding, possibly optimized calculations).

https://github.com/ziglibs/zigfp would probably be what I'd want to look at first before implementing our own stack.

@vancluever vancluever added this to the v0.2.0 milestone Apr 12, 2024
@vancluever vancluever removed this from the v0.2.0 milestone Aug 29, 2024
@vancluever vancluever changed the title internal: fixed-point tracking issue internal: internal numerics (double, single, fixed?) Aug 29, 2024
@vancluever
Copy link
Owner Author

vancluever commented Aug 29, 2024

Removing milestone as I think this is a longer-term question.

As the library matures I think we are becoming entrenched in not necessarily using fixed-point. I'm left wondering if we need them at all, ultimately. Other tools (e.g., tiny-skia in Rust land, and Skia itself, (i.e., the C++ version) use f32, for example. Cairo uses double for all of its API calls and a number of internal arithmetic (converting from fixed-point in a number of cases). TinyVG's SDK renderer uses f32.

Currently, as it stands, any strong cases for any particular format are still hypothetical. I still think speed and/or accuracy are the major drivers, but I think we could do with some more concrete data.

@vancluever vancluever added the enhancement New feature or request label Aug 29, 2024
@vancluever
Copy link
Owner Author

Just wanted to add some stuff back-referenced from #49: if we do end up doing fixed-point, we likely want to use 24.8, this is consistent with Cairo, and they have a well-documented reason for this change in their changelog (link). TL;DR: 16.16 was consistent for X (e.g., X11) drawables historically, but the needs of Cairo to scale past the 16-bit integer representation became pretty obvious (not surprising, given the relatively small value this represents). Meanwhile, having 16 fractional bits proved to be excessive.

The 24.8 representation has existed in Cairo since this point (so around 2008), so it seems to have worked out well.

@vancluever
Copy link
Owner Author

Just logging some notes and testing here:

Main takeaway: likely abandoning fixed point as a possibility, favoring f32, want more data before we migrate. However, I would like to have this concluded soon (so milestone is being added again).

  • An experiment in refactoring the codebase to use fixed point was... painful. Lots of math operations needing to be replaced with function/method calls, awkward conversions between fixed and floating point were necessary (Cairo does this too, for the record), etc. In the end, I abandoned the effort after the acceptance tests completely broke - it's likely that with further debugging I could have gotten things working, but intuitively, it wasn't great. There are also additional unanswered questions with regards to whether or not it would actually help performance. Another thing I noticed in the Cairo codebase that gave me pause was the use of fixed-point magic number conversion; a neat trick, but just another "hack" that we'd need to be cognizant of for implementation and I honestly don't really want the library to be full of hacks.

  • f32 on the other hand was a much easier migration, literally just a s/f64/f32/ in the entire codebase. No build issues, just update unit and acceptance tests (due to some slight changes in rounding) and go. My main concern in doing this immediately is just due to the lack of real good performance data (see benchmarks).

The main case for f32 right now as such is more memory optimization as we move forward on efforts like #44 and #50; I plan on migrating more of the rendering processes to FixedBufferAllocators and using f32 means we can store twice as many points in the same amount of RAM. This was the main motivator for this round of prototyping/benchmarking. I was hoping that fixed point would allow us to keep external API as f64; given that's likely not the case anymore, I'd like this done sooner rather than later to help stabilize API expectations. BUT before this work starts I want better benchmarking so I can actually see the apparent benefits which we're not seeing right now by just running the test suite binaries wholesale.

Benchmarks (f32/f64)

I ran the acceptance test suite using both the default Debug optimization target and ReleaseFast as well. Unfortunately, the results are... inconclusive. In fact, under Debug, f64 is actually faster! Rest of the results are split, or maybe within the margin of error, f32 is favored on peak RSS (slightly) and cache (mainly misses), f64 is slightly favored on cycle count. All in all it seems like a wash right now; we really need instrumentation in order to get better data here.

Debug

f32

Benchmark 1 (3 runs): .zig-cache/o/f8b8ed9afd9ea249b7134d4e6abf6c03/spec
  measurement          mean ± σ            min … max           outliers
  wall_time          5.36s  ± 10.6ms    5.35s  … 5.37s           0 ( 0%)
  peak_rss           17.2MB ±  173KB    17.1MB … 17.4MB          0 ( 0%)
  cpu_cycles         24.1G  ± 56.9M     24.0G  … 24.1G           0 ( 0%)
  instructions       58.6G  ± 8.01K     58.6G  … 58.6G           0 ( 0%)
  cache_references   78.9M  ± 1.24M     77.5M  … 80.0M           0 ( 0%)
  cache_misses       5.77M  ±  218K     5.52M  … 5.92M           0 ( 0%)
  branch_misses      7.70M  ± 89.6K     7.60M  … 7.75M           0 ( 0%)

f64

Benchmark 1 (3 runs): .zig-cache/o/46511a723d08b58f37a9430c8d91dc98/spec
  measurement          mean ± σ            min … max           outliers
  wall_time          4.99s  ± 29.1ms    4.96s  … 5.02s           0 ( 0%)
  peak_rss           17.5MB ±  179KB    17.3MB … 17.6MB          0 ( 0%)
  cpu_cycles         22.4G  ±  122M     22.2G  … 22.5G           0 ( 0%)
  instructions       58.6G  ± 34.7K     58.6G  … 58.6G           0 ( 0%)
  cache_references   81.2M  ±  753K     80.6M  … 82.1M           0 ( 0%)
  cache_misses       6.62M  ±  305K     6.29M  … 6.89M           0 ( 0%)
  branch_misses      7.84M  ± 46.2K     7.79M  … 7.89M           0 ( 0%)

ReleaseFast

f32

Benchmark 1 (19 runs): .zig-cache/o/203f90727e660a1bb5c7d822faffe623/spec
  measurement          mean ± σ            min … max           outliers
  wall_time           544ms ± 6.61ms     533ms …  553ms          0 ( 0%)
  peak_rss           11.7MB ±  157KB    11.3MB … 11.8MB          0 ( 0%)
  cpu_cycles         2.09G  ± 24.2M     2.05G  … 2.12G           0 ( 0%)
  instructions       6.24G  ± 2.63K     6.24G  … 6.24G           0 ( 0%)
  cache_references   28.7M  ±  421K     27.8M  … 29.3M           0 ( 0%)
  cache_misses       2.83M  ±  138K     2.57M  … 3.10M           0 ( 0%)
  branch_misses      5.13M  ± 14.2K     5.11M  … 5.16M           0 ( 0%)

f64

Benchmark 1 (19 runs): .zig-cache/o/446ade7ef86a99f9f0a8d13693da8fde/spec
  measurement          mean ± σ            min … max           outliers
  wall_time           550ms ± 12.5ms     531ms …  579ms          0 ( 0%)
  peak_rss           11.8MB ±  154KB    11.6MB … 12.2MB          0 ( 0%)
  cpu_cycles         2.08G  ± 16.5M     2.06G  … 2.12G           0 ( 0%)
  instructions       6.24G  ± 3.12K     6.24G  … 6.24G           0 ( 0%)
  cache_references   29.7M  ±  486K     28.8M  … 30.5M           0 ( 0%)
  cache_misses       3.54M  ±  210K     3.26M  … 4.11M           1 ( 5%)
  branch_misses      5.14M  ± 19.3K     5.12M  … 5.18M           0 ( 0%)

@vancluever vancluever added this to the v0.5.0 milestone Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant