JSON vs CBOR performance for ASCII text #519

Open
sugmanue opened this issue Nov 15, 2024 · 5 comments

Comments

@sugmanue

sugmanue commented Nov 15, 2024

Hi there,

We're testing the performance of CBOR vs plain JSON, and it looks like JSON is noticeably faster, at least for ASCII text. This speaks volumes about Jackson's JSON performance, but it also looks like CBOR still has room for improvement.

The performance tests can be found on this repository. For a simple class with five String fields and ASCII (non-escaped) strings, JSON is almost twice as fast as CBOR for larger strings (between 193 and 231 chars):

Benchmark                (flavor)  (size)  Mode  Cnt    Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   SMALL  avgt    5  266.316 ±  9.938  ns/op
MyBenchmark.json  ASCII_PRINTABLE   SMALL  avgt    5  243.984 ± 13.422  ns/op

Benchmark                (flavor)  (size)  Mode  Cnt    Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE  MEDIUM  avgt    5  725.376 ±  8.700  ns/op
MyBenchmark.json  ASCII_PRINTABLE  MEDIUM  avgt    5  464.803 ± 20.404  ns/op

Benchmark                (flavor)  (size)  Mode  Cnt     Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5  1126.297 ±  7.843  ns/op
MyBenchmark.json  ASCII_PRINTABLE   LARGE  avgt    5   664.466 ± 23.541  ns/op

As expected, this is not the case for multi-byte chars (for instance, chars from the CJK block or emojis) or for full ASCII (some of which requires escaping in plain JSON). See below:

Benchmark         (flavor)  (size)  Mode  Cnt     Score    Error  Units
MyBenchmark.cbor       CJK   LARGE  avgt    5  2076.014 ± 67.569  ns/op
MyBenchmark.json       CJK   LARGE  avgt    5  2939.622 ± 16.501  ns/op

Benchmark         (flavor)  (size)  Mode  Cnt     Score     Error  Units
MyBenchmark.cbor     EMOJI   LARGE  avgt    5  2400.312 ±  11.203  ns/op
MyBenchmark.json     EMOJI   LARGE  avgt    5  8467.852 ± 243.559  ns/op

Benchmark           (flavor)  (size)  Mode  Cnt     Score     Error  Units
MyBenchmark.cbor  FULL_ASCII   LARGE  avgt    5  1106.835 ±  33.094  ns/op
MyBenchmark.json  FULL_ASCII   LARGE  avgt    5  2084.745 ± 104.819  ns/op

Given the prevalence of ASCII text, it would be great if CBOR performance were at least as good as JSON, and I feel that it should be better.

I played a bit with the tight loop inside _finishShortText (see here) and I see some improvements, but they are not consistent across architectures (better on my M1 laptop, not so much on x86).

I also played a bit with creating the String directly from the input buffer and letting Java take care of UTF-8 (see here; it is missing some other spots and is possibly flawed, as I was just playing). That approach looks promising (see below) but has a drawback: it won't be able to detect malformed UTF-8 as Jackson does now. I think it could be added behind a feature flag for when we know and trust the source of the data.

Benchmark                (flavor)  (size)  Mode  Cnt    Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5  286.758 ± 11.447  ns/op
MyBenchmark.json  ASCII_PRINTABLE   LARGE  avgt    5  664.253 ± 20.294  ns/op
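
To make the trade-off described above concrete, here is a minimal sketch of the two decoding styles using only JDK classes. It is not the code from the linked experiment and not Jackson's implementation. The names `Utf8DecodeSketch`, `decodeLenient`, `decodeStrict` and the `trustInput` parameter are hypothetical, invented for illustration; a real option would presumably be a parser feature rather than a boolean argument.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8DecodeSketch {
    // Lenient path: let the JDK decode straight from the input buffer.
    // Malformed sequences are silently replaced with U+FFFD, never reported.
    static String decodeLenient(byte[] buf, int offset, int len) {
        return new String(buf, offset, len, StandardCharsets.UTF_8);
    }

    // Strict path: a CharsetDecoder set to REPORT throws on malformed input.
    static String decodeStrict(byte[] buf, int offset, int len)
            throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(buf, offset, len))
                .toString();
    }

    // Hypothetical switch standing in for a "trusted source" feature flag.
    static String decode(byte[] buf, int offset, int len, boolean trustInput)
            throws CharacterCodingException {
        return trustInput ? decodeLenient(buf, offset, len)
                          : decodeStrict(buf, offset, len);
    }

    public static void main(String[] args) throws Exception {
        byte[] bad = { 'h', 'i', (byte) 0xC3 }; // truncated two-byte sequence
        // Lenient decoding hides the problem behind a replacement character.
        System.out.println(decode(bad, 0, bad.length, true).endsWith("\uFFFD"));
        try {
            decode(bad, 0, bad.length, false);
        } catch (CharacterCodingException e) {
            System.out.println("strict decoding rejected the input");
        }
    }
}
```

The lenient path is the one that loses malformed-input detection, which is why gating it behind an opt-in setting seems reasonable.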

Any thoughts, better ideas?

@cowtowncoder
Member

Ok, first, I am a bit surprised by your findings; my benchmarks with https://github.com/FasterXML/jackson-benchmarks/ have found the CBOR codec a bit faster than the JSON one.
Its payload may not be the most representative though (quite small, originally from https://github.com/eishay/jvm-serializers).

Second: I think Java 17 and later screwed up performance of String construction: given a char[] JDK tries to "optimize" storage (of ASCII/Latin-1 only content) by re-constructing a byte[] -- first scanning to see if this is possible and so on. As a result, JDK 8 is still typically faster for Jackson JSON, CBOR and other codecs. I don't know of a good work-around for this problem.

As to constructing Strings directly from byte[], my first instinct was to say no, but come to think of it... maybe? As long as it'd be behind a CBORParser.Feature flag, disabled by default, that could work.
So if you wanted to propose a PR, I'd be open to that -- assuming a measurable performance benefit exists.

@cowtowncoder
Member

Oh, also, given these are quite small documents as well, I wonder what a profiler (like Java async-profiler) might find.

I assume tests were run with JDK 17, as per pom.xml.

@sugmanue
Author

Hi Tatu, thanks for the quick response.

  1. Yes, I used Java 17 (OpenJDK Runtime Environment Corretto-17.0.12.7.1 (build 17.0.12+7-LTS)) on an Apple M1 Pro for running these tests. I also tried Java 17 on x86 with similar results.
  2. Yes, Java 17 has been highly optimized towards latin1. That can be seen in the results below, which compare reading the String directly vs using the loop. With ASCII the speed is great, but for multibyte Strings the loop is ~2x faster.
  3. Yes, I have done some profiling; that's how I came to work on _finishShortText and left _finishLongText untouched. As a side note, Java 17 profiling shows java/lang/String.decodeUTF8_UTF16 prominently in the traces for multibyte strings. For ASCII it gets replaced by java/util/Arrays.copyOfRange.
  4. Yes, those are small documents. I just created them to show what we have seen using the more realistic datasets that we benchmark with internally.
  5. Using Java 8, JSON still outperforms current CBOR. Reading the String directly from the input buffer is also faster there, but not by as much. Likewise, the loop performs better for multibyte Strings, but not by as much as with Java 17.
  6. I still think the speedup is worth doing: latin1 is so prevalent and, I reckon, Java has more engineering muscle to optimize for the multibyte cases.
  7. I will work on this and publish a proposal PR for it as discussed. Thank you again for the response and feedback.

OpenJDK Runtime Environment Corretto-17.0.12.7.1 (build 17.0.12+7-LTS)

======================================== Building string in loop (current)
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5  1283.499 ± 1158.138  ns/op
MyBenchmark.cbor              CJK   LARGE  avgt    5  2101.181 ±  143.699  ns/op
MyBenchmark.cbor            EMOJI   LARGE  avgt    5  2398.432 ±   48.833  ns/op

======================================== Reading string from input buffer
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5   285.527 ±   1.033  ns/op
MyBenchmark.cbor              CJK   LARGE  avgt    5  4420.863 ±  97.255  ns/op
MyBenchmark.cbor            EMOJI   LARGE  avgt    5  3989.647 ±  95.884  ns/op

OpenJDK Runtime Environment Corretto-8.422.05.1 (build 1.8.0_422-b05)

======================================== JSON
MyBenchmark.json  ASCII_PRINTABLE   LARGE  avgt    5    607.711 ±   8.913  ns/op
MyBenchmark.json              CJK   LARGE  avgt    5   3723.571 ± 138.397  ns/op
MyBenchmark.json            EMOJI   LARGE  avgt    5  12197.943 ±  59.335  ns/op

======================================== Building string in loop (current)
Benchmark                (flavor)  (size)  Mode  Cnt      Score     Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5   1030.018 ±  24.343  ns/op
MyBenchmark.cbor              CJK   LARGE  avgt    5   2576.894 ± 100.827  ns/op
MyBenchmark.cbor            EMOJI   LARGE  avgt    5   2985.589 ±  26.489  ns/op

======================================== Reading string from input buffer
Benchmark                (flavor)  (size)  Mode  Cnt      Score     Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5    596.132 ±  24.761  ns/op
MyBenchmark.cbor              CJK   LARGE  avgt    5   3145.590 ±  38.429  ns/op
MyBenchmark.cbor            EMOJI   LARGE  avgt    5   3674.679 ± 114.500  ns/op

@cowtowncoder
Member

First of all: yes, PR would be welcome!

One question/suggestion:

This test case:

MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5  1283.499 ± 1158.138  ns/op

seems to have an error range as big as the value, so the tests probably need to run for many more iterations.
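
As a rough way to quantify this, the Error column can be divided by the Score column to get a relative error. The sketch below does that for two rows quoted in this thread. The class name, method names and the 10% threshold are assumptions made for illustration, not anything JMH itself defines.

```java
import java.util.Locale;

final class BenchmarkNoise {
    // Relative error: the "Error" column divided by the "Score" column.
    static double relativeError(double score, double error) {
        return error / score;
    }

    // Flags a result whose relative error exceeds the given threshold.
    // The threshold is an arbitrary choice, not a JMH convention.
    static boolean isNoisy(double score, double error, double threshold) {
        return relativeError(score, error) > threshold;
    }

    public static void main(String[] args) {
        // CBOR, ASCII_PRINTABLE, LARGE: the noisy run vs. the first run in the issue.
        double[][] rows = { { 1283.499, 1158.138 }, { 1126.297, 7.843 } };
        for (double[] row : rows) {
            String line = String.format(Locale.ROOT, "%.3f ± %.3f -> %.1f%% noisy=%b",
                    row[0], row[1], 100 * relativeError(row[0], row[1]),
                    isNoisy(row[0], row[1], 0.10));
            System.out.println(line);
        }
    }
}
```

The first row comes out at roughly 90% relative error and the second at under 1%, which matches the observation that the first measurement is not usable as-is.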

Another note:

> As a side note, for Java 17 profiling shows that for multibyte string the traces show java/lang/String.decodeUTF8_UTF16 prominently. For ASCII it gets replaced by java/util/Arrays.copyOfRange

Yes -- these are related to going from char[] given to construct String into byte[] encoding (at first I had strong "WTF?!?!?!" reaction) -- which "optimizes" by re-encoding from 16-bit java chars to possibly (ASCII/Latin-1 especially) more compact representation.
And those would get eliminated or replaced with simple scanning if constructing String from byte[].
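
The two construction paths being contrasted can be sketched as follows, assuming ASCII-only input. The class and method names are hypothetical, and the comments restate the behaviour described in this thread rather than anything measured here.

```java
import java.nio.charset.StandardCharsets;

final class StringConstructionPaths {
    // Path 1: widen each byte into a char[] and build the String from it.
    // On JDKs with compact strings, the constructor may then scan the chars
    // and re-encode them into a byte[] internally.
    static String viaCharArray(byte[] buf, int offset, int len) {
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            chars[i] = (char) buf[offset + i]; // valid only for bytes 0x00..0x7F
        }
        return new String(chars, 0, len);
    }

    // Path 2: hand the bytes to the JDK and let it decode them directly,
    // skipping the intermediate char[].
    static String viaByteArray(byte[] buf, int offset, int len) {
        return new String(buf, offset, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] input = "plain ASCII payload".getBytes(StandardCharsets.UTF_8);
        String a = viaCharArray(input, 0, input.length);
        String b = viaByteArray(input, 0, input.length);
        System.out.println(a.equals(b)); // both paths yield the same String
    }
}
```

Both paths produce equal Strings for ASCII input; the difference discussed here is only in how much intermediate work the JDK does to get there.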

@sugmanue
Author

> First of all: yes, PR would be welcome!

Great, 😄

> MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5  1283.499 ± 1158.138  ns/op
>
> seems to have error range as big as value so tests probably need to run for many more iterations.

Ugh, 🤦‍♂️, thanks for pointing it out, let's try again

======================================== Building string in loop (current)
Benchmark                (flavor)  (size)  Mode  Cnt     Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5  1051.440 ± 20.808  ns/op
MyBenchmark.json  ASCII_PRINTABLE   LARGE  avgt    5   622.523 ±  8.798  ns/op

> Yes -- these are related to going from char[] given to construct String into byte[] encoding (at first I had strong "WTF?!?!?!" reaction) -- which "optimizes" by re-encoding from 16-bit java chars to possibly (ASCII/Latin-1 especially) more compact representation. And those would get eliminated or replaced with simple scanning if constructing String from byte[].

Yeah, I bet they are heavily betting on the prominence of latin1, which at least for us is true 😉. Let me see what I can come up with, thanks again!
