Use error bound filter in isInCircle predicate instead of long double. #1162
Conversation
I don't have a Voronoi test yet, but my performance suite shows no differences, with one exception: the prepared point-in-poly test seems to be consistently about 10% faster.
@pramsey Thanks for the feedback. Interesting, point-in-poly may benefit from a faster orientation test, but I would have expected that to be negligible: if it's implemented with some form of ray counting, I'd expect runtime to be dominated by walking over segments that don't intersect the ray and need no orientation check (at least for polygons with many segments). I will try to look into that when I find time. I would expect the change in the orientation test to be measurable in, e.g., convex hull or triangulation without the Delaunay requirement, and in synthetic benchmarks (I ran perf_orientation from the benchmark directory and compared the number of iterations, since the ns timing seems too coarse for such a micro benchmark; running it inside a larger algorithm is a more meaningful benchmark). I would expect the change in the incircle test to be measurable in benchmarks like Delaunay triangulation or Voronoi diagrams (I ran perf_voronoi from the benchmark directory in the geos repository with a release config for the numbers in the opening post). The incircle test may benefit very significantly on non-x86 architectures where no hardware-accelerated 80-bit floats are available and long double compiles to calls into a soft-float implementation, e.g. https://godbolt.org/z/EosY5ehjq .
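For illustration only, here is a minimal sketch of the kind of ray-crossing loop meant above (this is not GEOS's point-in-poly code; the names and structure are hypothetical): a cheap vertical-range check filters out most edges, so the orientation-style cross product runs only for edges that actually straddle the ray.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not GEOS code: ray-crossing point-in-polygon where the
// orientation-style test runs only for edges straddling the horizontal ray.
struct Pt { double x, y; };

bool pointInRing(const Pt& p, const std::vector<Pt>& ring) {
    bool inside = false;
    for (std::size_t i = 0, j = ring.size() - 1; i < ring.size(); j = i++) {
        const Pt& a = ring[j];
        const Pt& b = ring[i];
        // Cheap filter: most edges do not straddle p.y and are skipped here.
        if ((a.y > p.y) != (b.y > p.y)) {
            // Cross product of (b - a) x (p - a): the orientation-like part.
            const double cross = (b.x - a.x) * (p.y - a.y) - (p.x - a.x) * (b.y - a.y);
            // The ray towards +x crosses this edge iff the crossing lies right of p.
            if ((cross > 0) == (b.y > a.y)) {
                inside = !inside;
            }
        }
    }
    return inside;
}
```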
I hooked a Delaunay build into my test harness but couldn't measure any difference between this PR and 3.13.
I tested as follows using the Delaunay/Voronoi benchmarks in the benchmark folder from the main repo:
I honestly wish I knew where the outliers with turbo disabled come from; maybe the P-core/E-core split (it's a mobile i7-12800H) can affect it despite pinning the CPU core with taskset. The same steps with CXX=/usr/bin/clang++ before the cmake command yield for me:
Similar results for the other workloads. On an EPYC Milan I don't see these outlier timings, so I'll post just the first sample for each workload:
Thanks for the nice writeup of your perf_* results. I did the same tests on my MacBook Pro and I'm afraid I see no consistent advantage for this particular optimization.
Would it make sense to gate this behind something like
@pramsey Thanks for getting back to me on this. The test results are interesting. I don't have a Mac to test on, but using clang's target option I find that it seems to default to compiling long double to 8 bytes, i.e. the same as double:

echo "int s = sizeof(long double);" | clang++ -x c++ -S - -O3 -target aarch64-apple-darwin -o - | grep '.long'
.long 8 ; 0x8

So my patch would not benefit the circle predicate performance there from switching down to double. Instead it will cost a little performance due to the slightly more expensive error bound computation (visible in your benchmark; it seems to be ~1-2%). On the other hand, for architectures that already compile the current code to 64-bit precision/double, this also means that the current code is a lot less robust. If I generate 100k degenerate cases, the current code produces an opposing result (interior when the truth is exterior, or the other way around) in 8 cases with 80-bit precision, but in ~14k cases with 64-bit precision. All code for the test is in this gist: https://gist.github.com/tinko92/86126244ed04b65aeb09c9223ec4c37c#file-test-sh (note that this test code exclusively produces degenerate cases where the test point is near the circle boundary, up to roughly machine precision). The patched code will never confuse interior and exterior, regardless of the availability of 80-bit long doubles; it will always produce either the correct sign or BOUNDARY. If you run that gist code on your MacBook, I would expect you to see many results in which the 2nd and 3rd columns show opposite signs.

So overall my opinion would be to not gate it. The architectures that lack 80-bit long doubles take a ~2% performance hit but benefit a lot robustness-wise (with an understanding of robustness where "boundary" is a safe result while confusing exterior and interior is bad), and the architectures that have 80-bit long doubles should benefit somewhat performance-wise and still a little robustness-wise (guarantee of not confusing exterior and interior). This understanding of robustness is based on the idea that confusing interior and exterior could lead to infinite loops in triangle flipping for Delaunay triangulation and should be avoided for that reason. But I don't know the specifics of geos' Delaunay triangulation algorithm, or the general motivation behind having an incircle predicate with increased robustness in geos in the first place.

Edit: Discussion of orientation, now in #1184, removed.
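As an aside on the gating question above, here is a minimal sketch (assumed name, not part of this PR) of how one could detect at compile time whether long double actually carries extended precision on the target:

```cpp
#include <limits>

// Hypothetical detection, not part of the PR: true where long double has a wider
// mantissa than double (64 vs. 53 bits on x86), false where it is the same 8-byte
// format as double (e.g. aarch64-apple-darwin, as the clang output above shows).
constexpr bool kLongDoubleIsExtended =
    std::numeric_limits<long double>::digits > std::numeric_limits<double>::digits;
```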
It would be better if this were split into two PRs, one for each algorithm. This would separate the different discussions, tests, and changes.
That makes sense. Since most of the discussion is around the incircle predicate, I'll split out the orientation predicate into a separate PR. Edit: If there is interest in making this method robust, as the name suggests, I can add a commit with a slightly transformed version of Shewchuk's public domain implementation in which all defines are changed to constexpr functions: https://gist.github.com/tinko92/8229b4d90ed0dddc410ec22ce5d75b2f . This would guarantee correct predicate results for every input that does not cause overflow or underflow (so anything with coordinates whose absolute values are roughly in the [1e-75, 1e75] range, plus {-0., 0.}). It would only be executed in case of filter failure and therefore would not contribute to runtime for non-degenerate inputs. Edit2: After reading more of the docs, I understand that I should probably propose that to JTS first; this may take some time, though, to port to Java.
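To make the "defines changed to constexpr functions" point concrete, here is a minimal sketch (illustrative names, not the gist's actual code) of how Shewchuk-style expansion-arithmetic macros translate into constexpr helpers:

```cpp
// Illustrative only, not the gist's code: Shewchuk's Fast_Two_Sum / Two_Sum macros
// rewritten as constexpr functions returning the rounded sum plus the exact roundoff.
struct TwoDouble { double hi; double lo; };   // hi + lo represents the exact sum

constexpr TwoDouble fastTwoSum(double a, double b) {  // precondition: |a| >= |b|
    const double x = a + b;
    return { x, b - (x - a) };
}

constexpr TwoDouble twoSum(double a, double b) {      // no magnitude precondition
    const double x  = a + b;
    const double bv = x - a;
    const double av = x - bv;
    return { x, (a - av) + (b - bv) };
}
```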
This PR has been edited to separate out the change to the orientation filter, which is now in #1184.
For the incircle predicate, this PR proposes not to generally use long double (which increases precision a little but is not robust, as the code notes) but to use a similar error bound filter instead (bound computation based on https://doi.org/10.1007/s10543-023-00975-x ; I don't know the source of the alternative expression of the incircle determinant, I first saw it in CGAL). The rationale for this is higher performance (roughly 4-10% speedup on my machine for perf_voronoi), and that the more robust computation here is probably motivated by preventing infinite loops in Delaunay triangulation due to false cases of INTERIOR, which should be impossible with this version (except possibly for inputs near the lower end of the representable magnitude of double that produce denormalized results/underflow).
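For readers unfamiliar with the technique, here is a minimal sketch of what such a filtered incircle predicate can look like; the structure follows the common Shewchuk-style filter, the function name is hypothetical, and the error-bound constant is only a placeholder of roughly the right magnitude, not the bound this PR derives from the cited paper:

```cpp
#include <cmath>

// Hypothetical sketch, not the PR's code. Sign convention: for counter-clockwise
// (a, b, c), returns +1 if p is inside their circumcircle, -1 if outside, and 0
// when the filter cannot decide (reported as boundary instead of guessing a sign).
int isInCircleFiltered(double ax, double ay, double bx, double by,
                       double cx, double cy, double px, double py)
{
    // Translate so the query point is the origin.
    const double adx = ax - px, ady = ay - py;
    const double bdx = bx - px, bdy = by - py;
    const double cdx = cx - px, cdy = cy - py;

    const double alift = adx * adx + ady * ady;
    const double blift = bdx * bdx + bdy * bdy;
    const double clift = cdx * cdx + cdy * cdy;

    // In-circle determinant evaluated in plain double.
    const double det = alift * (bdx * cdy - bdy * cdx)
                     + blift * (cdx * ady - cdy * adx)
                     + clift * (adx * bdy - ady * bdx);

    // Bound on the rounding error of det, proportional to the sum of the
    // absolute values of the products involved. The constant is a placeholder;
    // the actual bound comes from the derivation cited above.
    const double mag = alift * (std::fabs(bdx * cdy) + std::fabs(bdy * cdx))
                     + blift * (std::fabs(cdx * ady) + std::fabs(cdy * adx))
                     + clift * (std::fabs(adx * bdy) + std::fabs(ady * bdx));
    const double errBound = 1.2e-15 * mag;

    if (det >  errBound) return  1;   // certainly interior
    if (det < -errBound) return -1;   // certainly exterior
    return 0;                          // within the bound: report boundary
}
```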
PS: I also have an implementation of exact predicates that I could port to geos, to be run after failures of the fast filters, if the maintainers would consider it beneficial to the library. I don't know enough about its use cases to make a guess about that. Exact predicates guarantee consistency with respect to predicate calls. They would not guarantee consistency between predicates and computed geometries (e.g. a predicate might say some polygons overlap, but the computed intersection may then be empty after rounding to double coordinates in the output geometry, even if all intermediate computations were exact).