<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<div class="elementToProof" style="font-family: &quot;Calibri Light&quot;, &quot;Helvetica Light&quot;, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Hi Johannes,</div>

<div class="elementToProof" style="font-family: &quot;Calibri Light&quot;, &quot;Helvetica Light&quot;, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

I think the 3rd scenario you've mentioned is likely: Swedish and other languages extend ASCII with diacritics, so non-ASCII bytes frequently interrupt the ASCII runs. Even for non-ASCII-heavy languages like Chinese, the text can include spaces or ASCII digits; invoking the intrinsic in that scenario sounds a bit unwise too.</div>
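<div>To make the point about short ASCII runs concrete, here is a minimal sketch that counts the ASCII runs in the UTF-8 encoding of a string and reports the longest one. The class name <code>AsciiRunStats</code> and the two sample strings are my own illustrations, not taken from any benchmark in this thread.</div>

```java
import java.nio.charset.StandardCharsets;

class AsciiRunStats {
    // Length of the longest run of consecutive ASCII bytes (0..127)
    // in the UTF-8 encoding of the given text.
    static int longestAsciiRun(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int longest = 0;
        int current = 0;
        for (byte b : utf8) {
            if (b >= 0) {            // high bit clear: single-byte ASCII
                current++;
                longest = Math.max(longest, current);
            } else {                 // lead or continuation byte of a multi-byte sequence
                current = 0;
            }
        }
        return longest;
    }

    // Number of separate ASCII runs, i.e. how often a decoder that re-enters
    // the ASCII fast path on every run would have to invoke it.
    static int countAsciiRuns(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int runs = 0;
        boolean inRun = false;
        for (byte b : utf8) {
            boolean ascii = b >= 0;
            if (ascii && !inRun) {
                runs++;
            }
            inRun = ascii;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Illustrative samples (assumptions): a Swedish phrase and a Chinese
        // phrase containing ASCII digits and spaces.
        String swedish = "R\u00e4ksm\u00f6rg\u00e5s p\u00e5 \u00f6n";
        String chinese = "\u4e2d\u6587 2025 \u5e74";
        System.out.println(countAsciiRuns(swedish) + " runs, longest " + longestAsciiRun(swedish));
        System.out.println(countAsciiRuns(chinese) + " runs, longest " + longestAsciiRun(chinese));
    }
}
```

<div>For the Swedish sample this gives 6 runs with a longest run of 3 bytes, and for the Chinese sample a single run of 6 bytes, so in both cases the fast path would be entered for only a handful of bytes at a time.</div>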

<div class="elementToProof" style="font-family: &quot;Calibri Light&quot;, &quot;Helvetica Light&quot;, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div class="elementToProof" style="font-family: &quot;Calibri Light&quot;, &quot;Helvetica Light&quot;, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Regards,</div>

<div class="elementToProof" style="font-family: &quot;Calibri Light&quot;, &quot;Helvetica Light&quot;, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Chen Liang</div>

<div id="appendonsend"></div>

<hr style="display:inline-block;width:98%" tabindex="-1">

<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> core-libs-dev &lt;core-libs-dev-retn@openjdk.org&gt; on behalf of Johannes Döbler &lt;jd@civilian-framework.org&gt;<br>

<b>Sent:</b> Monday, May 12, 2025 6:16 AM<br>

<b>To:</b> core-libs-dev@openjdk.org &lt;core-libs-dev@openjdk.org&gt;<br>

<b>Subject:</b> potential performance improvement in sun.nio.cs.UTF_8</font>

<div> </div>

</div>

<div>I have a suggestion for a performance improvement in sun.nio.cs.UTF_8, the workhorse for stream based UTF-8 encoding and decoding, but don't know if this has been discussed before.<br>

I'll explain my idea using the decoding case:<br>

Claes Redestad describes in his blog <a class="x_moz-txt-link-freetext" href="https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html">

https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html</a> how he used SIMD intrinsics (now JavaLangAccess.decodeASCII) to speed up UTF_8 decoding when buffers are backed by arrays:<br>

<br>

<a class="x_moz-txt-link-freetext" href="https://github.com/openjdk/jdk/blob/0258d9998ebc523a6463818be00353c6ac8b7c9c/src/java.base/share/classes/sun/nio/cs/UTF_8.java#L231">https://github.com/openjdk/jdk/blob/0258d9998ebc523a6463818be00353c6ac8b7c9c/src/java.base/share/classes/sun/nio/cs/UTF_8.java#L231</a><br>

<ul>

<li>First, a call to JLA.decodeASCII harvests all ASCII characters (= 1-byte UTF-8 sequences) at the beginning of the input.</li><li>Then the decoder enters the slow loop, which looks at the UTF-8 byte sequences in the input buffer one at a time and writes to the output buffer (this is basically the old implementation).<br>

</li></ul>
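<p>The two steps above can be sketched outside the JDK roughly as follows. This is a simplified illustration, not the actual UTF_8.java code: <code>decodeAsciiRun</code> is a hypothetical scalar stand-in for the JDK-internal JLA.decodeASCII (which is not accessible from user code), and the slow loop only handles 1-byte and 2-byte sequences.</p>

```java
import java.nio.charset.StandardCharsets;

class PrefixFastPathDecoder {
    // Hypothetical scalar stand-in for JLA.decodeASCII: copies bytes to chars
    // until the first non-ASCII byte or until len bytes have been copied,
    // and returns the number of bytes copied.
    static int decodeAsciiRun(byte[] sa, int sp, char[] da, int dp, int len) {
        int n = 0;
        while (n < len && sa[sp + n] >= 0) {
            da[dp + n] = (char) sa[sp + n];
            n++;
        }
        return n;
    }

    // Decodes UTF-8 restricted to 1- and 2-byte sequences (U+0000..U+07FF).
    static String decode(byte[] sa) {
        char[] da = new char[sa.length];   // never more chars than bytes
        int sp = 0;
        int dp = 0;
        int sl = sa.length;

        // Step 1: harvest the leading ASCII run with a single fast-path call.
        int n = decodeAsciiRun(sa, sp, da, dp, sl);
        sp += n;
        dp += n;

        // Step 2: byte-by-byte loop for everything from the first non-ASCII byte on.
        while (sp < sl) {
            int b1 = sa[sp];
            if (b1 >= 0) {
                // 1 byte, 7 bits: 0xxxxxxx
                da[dp++] = (char) b1;
                sp++;
            } else if ((b1 >> 5) == -2 && (b1 & 0x1e) != 0 && sp + 1 < sl) {
                // 2 bytes, 11 bits: 110xxxxx 10xxxxxx
                int b2 = sa[sp + 1];
                if ((b2 & 0xc0) != 0x80) {
                    throw new IllegalArgumentException("malformed input at " + sp);
                }
                da[dp++] = (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
                sp += 2;
            } else {
                throw new IllegalArgumentException("unsupported or malformed input at " + sp);
            }
        }
        return new String(da, 0, dp);
    }
}
```

<p>Calling <code>PrefixFastPathDecoder.decode</code> on the UTF-8 bytes of a mostly ASCII string with one early non-ASCII character shows the limitation: only the bytes before that character go through the fast path.</p>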

<p>If the input is all ASCII, all decoding work is done in JLA.decodeASCII, resulting in an extreme performance boost. But if the input contains a non-ASCII byte, decoding falls back to the slow array loop for the rest of the input.</p>

<p>Now here is my idea: why not call JLA.decodeASCII whenever an ASCII byte is seen in the input:</p>

<p><font face="monospace">while (sp < sl) {<br>

    int b1 = sa[sp];<br>

    if (b1 >= 0) {<br>

        // 1 byte, 7 bits: 0xxxxxxx<br>

        if (dp >= dl)<br>

            return xflow(src, sp, sl, dst, dp, 1);<br>

</font><font face="monospace">        // my change</font><br>

<font face="monospace"><b>        int n = JLA.decodeASCII(sa, sp, da, dp, Math.min(sl - sp, dl - dp));<br>

        sp += n;<br>

        dp += n;<br>

</b>    } else if ((b1 >> 5) == -2 && (b1 & 0x1e) != 0) {<br>

</font><br>

I set up a small improvised benchmark to get an idea of the impact:<br>

</p>

<p><font face="monospace">Benchmark                     (data)   Mode  Cnt        Score   Error  Units<br>

DecoderBenchmark.jdkDecoder  TD_8000  thrpt    2  2045960,037          ops/s<br>

DecoderBenchmark.jdkDecoder  TD_3999  thrpt    2   263744,675          ops/s<br>

DecoderBenchmark.jdkDecoder   TD_999  thrpt    2   154232,940          ops/s<br>

DecoderBenchmark.jdkDecoder   TD_499  thrpt    2   142239,763          ops/s<br>

DecoderBenchmark.jdkDecoder    TD_99  thrpt    2   128678,229          ops/s<br>

DecoderBenchmark.jdkDecoder     TD_9  thrpt    2   127388,649          ops/s<br>

DecoderBenchmark.jdkDecoder     TD_4  thrpt    2   119834,183          ops/s<br>

DecoderBenchmark.jdkDecoder     TD_2  thrpt    2   111733,115          ops/s<br>

DecoderBenchmark.jdkDecoder     TD_1  thrpt    2   102397,455          ops/s<br>

DecoderBenchmark.newDecoder  TD_8000  thrpt    2  2022997,518          ops/s<br>

DecoderBenchmark.newDecoder  TD_3999  thrpt    2  2909450,005          ops/s<br>

DecoderBenchmark.newDecoder   TD_999  thrpt    2  2140307,712          ops/s<br>

DecoderBenchmark.newDecoder   TD_499  thrpt    2  1171970,809          ops/s<br>

DecoderBenchmark.newDecoder    TD_99  thrpt    2   686771,614          ops/s<br>

DecoderBenchmark.newDecoder     TD_9  thrpt    2    95181,541          ops/s<br>

DecoderBenchmark.newDecoder     TD_4  thrpt    2    65656,184          ops/s<br>

DecoderBenchmark.newDecoder     TD_2  thrpt    2    45439,240          ops/s<br>

DecoderBenchmark.newDecoder     TD_1  thrpt    2    36994,738          ops/s</font></p>

<p>(The benchmark uses only memory buffers. Each test input is a UTF-8 encoded byte buffer that decodes to 8000 chars and consists of runs of pure ASCII bytes of a given length, each followed by a 2-byte UTF-8 sequence producing a non-ASCII char:<br>

TD_8000: 8000 ASCII bytes -> 1 call to JLA.decodeASCII<br>

TD_3999: 3999 ASCII bytes + 2 non-ASCII bytes, repeated 2 times -> 2 calls to JLA.decodeASCII<br>

...<br>

TD_1: 1 ASCII byte + 2 non-ASCII bytes, repeated 4000 times -> 4000 calls to JLA.decodeASCII)</p>
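<p>For anyone who wants to reproduce inputs of this shape, a generator could look roughly like the sketch below. The class name <code>TestData</code>, the filler byte 'a' and the non-ASCII char U+00E4 are assumptions made for illustration; the benchmark's actual generator is not shown in this mail.</p>

```java
import java.nio.charset.StandardCharsets;

class TestData {
    // Builds UTF-8 input made of blocks of asciiRun ASCII bytes followed by one
    // 2-byte character, repeated until totalChars chars have been produced.
    // The final block is cut short when (asciiRun + 1) does not divide totalChars.
    static byte[] build(int asciiRun, int totalChars) {
        StringBuilder sb = new StringBuilder(totalChars);
        while (sb.length() < totalChars) {
            for (int i = 0; i < asciiRun && sb.length() < totalChars; i++) {
                sb.append('a');           // assumed ASCII filler
            }
            if (sb.length() < totalChars) {
                sb.append('\u00e4');      // assumed non-ASCII char, encodes as 2 bytes
            }
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

<p>Under these assumptions, <code>TestData.build(3999, 8000)</code> yields 8002 bytes and <code>TestData.build(1, 8000)</code> yields 12000 bytes, both decoding to 8000 chars.</p>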

<p>Interpretation:<br>

</p>

<ul>

<li>Input all ASCII: essentially the same performance as before.</li><li>Input contains pure ASCII sequences of considerable length interrupted by non-ASCII bytes: huge performance improvements, roughly 5x (TD_99) to 14x (TD_999), and for TD_999 and TD_3999 the throughput is similar to the pure ASCII case.<br>

</li><li>Input has a lot of short sequences of ASCII bytes interrupted by non-ASCII bytes: somewhere between runs of 99 and 9 ASCII bytes, performance drops below the current implementation (about 0.75x at TD_9, down to about 0.36x at TD_1).<br>

</li></ul>
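<p>The trade-off in the list above comes down to how often the fast path is entered and how many bytes each call gets to process. A minimal sketch of the proposed loop that counts those calls might look like this; <code>decodeAsciiRun</code> is again a hypothetical scalar stand-in for JLA.decodeASCII, the counter <code>fastPathCalls</code> exists only for illustration, and only 1- and 2-byte sequences are handled.</p>

```java
import java.nio.charset.StandardCharsets;

class ReentrantFastPathDecoder {
    // Number of fast-path invocations made by the most recent decode call.
    static int fastPathCalls;

    // Hypothetical scalar stand-in for JLA.decodeASCII: copies ASCII bytes to
    // chars until the first non-ASCII byte or until len bytes have been copied.
    static int decodeAsciiRun(byte[] sa, int sp, char[] da, int dp, int len) {
        int n = 0;
        while (n < len && sa[sp + n] >= 0) {
            da[dp + n] = (char) sa[sp + n];
            n++;
        }
        return n;
    }

    // Decodes UTF-8 restricted to 1- and 2-byte sequences (U+0000..U+07FF).
    static String decode(byte[] sa) {
        char[] da = new char[sa.length];   // never more chars than bytes
        int sp = 0;
        int dp = 0;
        int sl = sa.length;
        fastPathCalls = 0;
        while (sp < sl) {
            int b1 = sa[sp];
            if (b1 >= 0) {
                // Proposed change: hand the whole ASCII run to the fast path.
                // n is at least 1 here because sa[sp] is ASCII.
                int n = decodeAsciiRun(sa, sp, da, dp, sl - sp);
                fastPathCalls++;
                sp += n;
                dp += n;
            } else if ((b1 >> 5) == -2 && (b1 & 0x1e) != 0 && sp + 1 < sl) {
                // 2 bytes, 11 bits: 110xxxxx 10xxxxxx
                int b2 = sa[sp + 1];
                if ((b2 & 0xc0) != 0x80) {
                    throw new IllegalArgumentException("malformed input at " + sp);
                }
                da[dp++] = (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
                sp += 2;
            } else {
                throw new IllegalArgumentException("unsupported or malformed input at " + sp);
            }
        }
        return new String(da, 0, dp);
    }
}
```

<p>With two long ASCII runs the counter ends at 2, while an input alternating one ASCII byte with one 2-byte char makes one call per ASCII byte, which is where the per-call overhead of the real intrinsic would start to dominate.</p>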

<p>Thanks for reading and happy to hear your opinions,<br>

Johannes<br>

</p>

<br>

</div>

</body>

</html>