potential performance improvement in sun.nio.cs.UTF_8

Mon May 12 11:16:37 UTC 2025

I have a suggestion for a performance improvement in sun.nio.cs.UTF_8, 
the workhorse for stream-based UTF-8 encoding and decoding, but I don't 
know whether this has been discussed before.
I'll explain my idea for the decoding case.
Claes Redestad describes in his blog post 
https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html how he 
used SIMD intrinsics (now JavaLangAccess.decodeASCII) to speed up UTF_8 
decoding when buffers are backed by arrays:

https://github.com/openjdk/jdk/blob/0258d9998ebc523a6463818be00353c6ac8b7c9c/src/java.base/share/classes/sun/nio/cs/UTF_8.java#L231

  * first a call to JLA.decodeASCII harvests all ASCII-characters
    (=1-byte UTF-8 sequence) at the beginning of the input
  * then enters the slow loop of looking at UTF-8 byte sequences in the
    input buffer and writing to the output buffer (this is basically the
    old implementation)

If the input is all ASCII, all decoding work is done in JLA.decodeASCII, 
resulting in an extreme performance boost. But if the input contains a 
non-ASCII byte, decoding falls back to the slow array loop for the rest 
of the input.

Now here is my idea: why not call JLA.decodeASCII again whenever an 
ASCII byte is seen in the slow loop:

while (sp < sl) {
     int b1 = sa[sp];
     if (b1 >= 0) {
         // 1 byte, 7 bits: 0xxxxxxx
         if (dp >= dl)
             return xflow(src, sp, sl, dst, dp, 1);
         // my change
*        int n = JLA.decodeASCII(sa, sp, da, dp, Math.min(sl - sp, dl - dp));
         sp += n;
         dp += n;
*    } else if ((b1 >> 5) == -2 && (b1 & 0x1e) != 0) {
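
To try the control flow outside the JDK, here is a minimal, self-contained 
sketch. JLA.decodeASCII is internal to java.base and cannot be called from 
application code, so decodeAsciiRun below is a hypothetical scalar stand-in 
with the same contract (copy leading ASCII bytes, return the count). The 
sketch handles only 1- and 2-byte sequences and throws exceptions where 
the real decoder returns a CoderResult; it illustrates the loop shape, not 
the performance.

```java
import java.nio.charset.StandardCharsets;

final class AsciiRunDecoderSketch {

    // Hypothetical stand-in for the internal JLA.decodeASCII intrinsic:
    // copies up to len leading ASCII bytes and returns how many were copied.
    static int decodeAsciiRun(byte[] sa, int sp, char[] da, int dp, int len) {
        int n = 0;
        while (n < len && sa[sp + n] >= 0) {
            da[dp + n] = (char) sa[sp + n];
            n++;
        }
        return n;
    }

    // Decodes 1- and 2-byte UTF-8 sequences; returns the number of chars written.
    static int decode(byte[] sa, char[] da) {
        int sp = 0, sl = sa.length, dp = 0, dl = da.length;
        while (sp < sl) {
            int b1 = sa[sp];
            if (b1 >= 0) {
                // 1 byte, 7 bits: 0xxxxxxx
                if (dp >= dl)
                    throw new IllegalStateException("output buffer too small");
                // Proposed change: hand the whole ASCII run to the bulk routine.
                // n is at least 1 here, so the loop always makes progress.
                int n = decodeAsciiRun(sa, sp, da, dp, Math.min(sl - sp, dl - dp));
                sp += n;
                dp += n;
            } else if ((b1 >> 5) == -2 && (b1 & 0x1e) != 0) {
                // 2 bytes, 11 bits: 110xxxxx 10xxxxxx
                if (sl - sp < 2)
                    throw new IllegalArgumentException("truncated input");
                if (dp >= dl)
                    throw new IllegalStateException("output buffer too small");
                int b2 = sa[sp + 1];
                if ((b2 & 0xc0) != 0x80)
                    throw new IllegalArgumentException("malformed input");
                da[dp++] = (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
                sp += 2;
            } else {
                // 3- and 4-byte sequences are left out of this sketch.
                throw new UnsupportedOperationException("sequence not handled");
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        byte[] in = "abc\u00e9def\u00fc".getBytes(StandardCharsets.UTF_8);
        char[] out = new char[in.length];
        int n = decode(in, out);
        System.out.println(new String(out, 0, n));
    }
}
```

Every non-ASCII sequence is now followed by one call into the bulk 
routine, so the fixed cost of that call is paid once per ASCII run.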

I set up a small improvised benchmark to get an idea of the impact:

Benchmark                     (data)   Mode  Cnt        Score  Error  Units
DecoderBenchmark.jdkDecoder  TD_8000  thrpt    2  2045960,037         ops/s
DecoderBenchmark.jdkDecoder  TD_3999  thrpt    2   263744,675         ops/s
DecoderBenchmark.jdkDecoder   TD_999  thrpt    2   154232,940         ops/s
DecoderBenchmark.jdkDecoder   TD_499  thrpt    2   142239,763         ops/s
DecoderBenchmark.jdkDecoder    TD_99  thrpt    2   128678,229         ops/s
DecoderBenchmark.jdkDecoder     TD_9  thrpt    2   127388,649         ops/s
DecoderBenchmark.jdkDecoder     TD_4  thrpt    2   119834,183         ops/s
DecoderBenchmark.jdkDecoder     TD_2  thrpt    2   111733,115         ops/s
DecoderBenchmark.jdkDecoder     TD_1  thrpt    2   102397,455         ops/s
DecoderBenchmark.newDecoder  TD_8000  thrpt    2  2022997,518         ops/s
DecoderBenchmark.newDecoder  TD_3999  thrpt    2  2909450,005         ops/s
DecoderBenchmark.newDecoder   TD_999  thrpt    2  2140307,712         ops/s
DecoderBenchmark.newDecoder   TD_499  thrpt    2  1171970,809         ops/s
DecoderBenchmark.newDecoder    TD_99  thrpt    2   686771,614         ops/s
DecoderBenchmark.newDecoder     TD_9  thrpt    2    95181,541         ops/s
DecoderBenchmark.newDecoder     TD_4  thrpt    2    65656,184         ops/s
DecoderBenchmark.newDecoder     TD_2  thrpt    2    45439,240         ops/s
DecoderBenchmark.newDecoder     TD_1  thrpt    2    36994,738         ops/s

The benchmark uses only in-memory buffers. Each test input is a UTF-8 
encoded byte buffer that decodes to 8000 chars and consists of runs of 
pure ASCII bytes of a given length, each followed by a 2-byte UTF-8 
sequence that produces one non-ASCII char:
TD_8000: 8000 ASCII bytes -> 1 call to JLA.decodeASCII
TD_3999: 3999 ASCII bytes + 2 non-ASCII bytes, repeated 2 times -> 2 
calls to JLA.decodeASCII
...
TD_1: 1 ASCII byte + 2 non-ASCII bytes, repeated 4000 times -> 4000 
calls to JLA.decodeASCII
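
As an illustration of how such inputs could be generated, here is a small 
sketch. The class and method names are hypothetical, the filler characters 
('a' and U+00E4) are arbitrary choices, and the actual benchmark code is 
not shown in this mail. For run lengths that do not divide evenly into 
8000 chars (such as TD_2), this sketch simply truncates at 8000 chars; how 
the original benchmark treats that case is not stated.

```java
import java.nio.charset.StandardCharsets;

final class TestDataSketch {

    // Builds UTF-8 input that decodes to exactly totalChars chars:
    // asciiRun ASCII chars followed by one 2-byte char, repeated as needed.
    static byte[] build(int asciiRun, int totalChars) {
        StringBuilder sb = new StringBuilder(totalChars);
        while (sb.length() < totalChars) {
            for (int i = 0; i < asciiRun && sb.length() < totalChars; i++) {
                sb.append('a');
            }
            if (sb.length() < totalChars) {
                sb.append('\u00e4'); // encodes to 2 bytes in UTF-8
            }
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int[] runs = {8000, 3999, 999, 499, 99, 9, 4, 2, 1};
        for (int run : runs) {
            System.out.println("TD_" + run + ": " + build(run, 8000).length + " bytes");
        }
    }
}
```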

Interpretation:

  * Input all ASCII: same performance as before
  * Input contains pure ASCII sequences of considerable length
    interrupted by non-ASCII bytes: now seeing huge performance
    improvements similar to the pure ASCII case.
  * Input has a lot of short sequences of ASCII bytes interrupted by
    non-ASCII bytes: somewhere between runs of 99 and 9 ASCII bytes,
    performance drops below the current implementation.
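
To make the comparison easier to read, the following sketch computes the 
throughput ratio newDecoder / jdkDecoder for each input. The scores are 
copied from the table above (with the decimal comma written as a point); 
the class name is hypothetical. With Cnt = 2 and no error column, the 
ratios should be treated as rough indications only.

```java
import java.util.Locale;

final class SpeedupSketch {

    static final String[] DATA = {
        "TD_8000", "TD_3999", "TD_999", "TD_499", "TD_99", "TD_9", "TD_4", "TD_2", "TD_1"
    };
    // Scores in ops/s, taken from the benchmark table.
    static final double[] JDK = {
        2045960.037, 263744.675, 154232.940, 142239.763, 128678.229,
        127388.649, 119834.183, 111733.115, 102397.455
    };
    static final double[] NEW = {
        2022997.518, 2909450.005, 2140307.712, 1171970.809, 686771.614,
        95181.541, 65656.184, 45439.240, 36994.738
    };

    // Ratio above 1 means the modified decoder is faster for that input.
    static double ratio(int i) {
        return NEW[i] / JDK[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < DATA.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%-8s %6.2fx", DATA[i], ratio(i)));
        }
    }
}
```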

Thanks for reading and happy to hear your opinions,
Johannes
