<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
I have a suggestion for a performance improvement in
sun.nio.cs.UTF_8, the workhorse for stream-based UTF-8 encoding and
decoding, but I don't know whether this has been discussed before.<br>
I'll explain my idea using the decoding case:<br>
Claes Redestad describes in his blog post
<a class="moz-txt-link-freetext" href="https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html">https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html</a>
how he used SIMD intrinsics (now JavaLangAccess.decodeASCII) to speed up
UTF_8 decoding when buffers are backed by arrays:<br>
<br>
<a class="moz-txt-link-freetext" href="https://github.com/openjdk/jdk/blob/0258d9998ebc523a6463818be00353c6ac8b7c9c/src/java.base/share/classes/sun/nio/cs/UTF_8.java#L231">https://github.com/openjdk/jdk/blob/0258d9998ebc523a6463818be00353c6ac8b7c9c/src/java.base/share/classes/sun/nio/cs/UTF_8.java#L231</a><br>
<ul>
<li>First, a call to JLA.decodeASCII harvests all ASCII characters
(i.e. 1-byte UTF-8 sequences) at the beginning of the input.</li>
<li>Then the decoder enters the slow loop, which examines one UTF-8
byte sequence at a time in the input buffer and writes the result to
the output buffer (this is basically the old implementation).<br>
</li>
</ul>
<p>If the input is all ASCII, all decoding work is done in
JLA.decodeASCII, resulting in an extreme performance boost. But as
soon as the input contains a non-ASCII byte, decoding falls back to
the slow array loop for the rest of the input, including any ASCII
bytes that follow.</p>
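<p>To make the current two-phase structure concrete, here is a minimal,
self-contained sketch. It is not the JDK code: JLA.decodeASCII is
internal, so a hypothetical scalar helper decodeAsciiRun (no SIMD)
stands in for it, and only 1- and 2-byte UTF-8 sequences are handled to
keep it short.</p>
<pre>
```java
import java.nio.charset.StandardCharsets;

// Minimal sketch of the CURRENT two-phase structure (not the JDK code):
// one bulk pass over the leading ASCII run, then a per-sequence slow loop.
// Simplification: only 1- and 2-byte UTF-8 sequences are supported.
final class TwoPhaseDecoder {

    // Hypothetical scalar stand-in for the internal JLA.decodeASCII:
    // copies bytes while they are ASCII (>= 0), returns how many were copied.
    static int decodeAsciiRun(byte[] src, int sp, char[] dst, int dp, int len) {
        int n = 0;
        while (n < len && src[sp + n] >= 0) {
            dst[dp + n] = (char) src[sp + n];
            n++;
        }
        return n;
    }

    // Decodes src into dst and returns the number of chars written.
    static int decode(byte[] src, char[] dst) {
        int sp = 0, sl = src.length, dp = 0, dl = dst.length;
        // Phase 1: harvest the leading ASCII run with a single bulk call.
        int n = decodeAsciiRun(src, sp, dst, dp, Math.min(sl - sp, dl - dp));
        sp += n;
        dp += n;
        // Phase 2: slow loop, one UTF-8 sequence per iteration.
        while (sp < sl && dp < dl) {
            int b1 = src[sp];
            if (b1 >= 0) {
                // 1 byte, 7 bits: 0xxxxxxx (handled one byte at a time)
                dst[dp++] = (char) b1;
                sp++;
            } else if ((b1 >> 5) == -2 && (b1 & 0x1e) != 0 && sp + 1 < sl) {
                // 2 bytes, 11 bits: 110xxxxx 10xxxxxx
                int b2 = src[sp + 1];
                if ((b2 & 0xc0) != 0x80) {
                    throw new IllegalArgumentException("malformed input at " + sp);
                }
                dst[dp++] = (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
                sp += 2;
            } else {
                throw new IllegalArgumentException("unsupported or malformed input at " + sp);
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        byte[] in = "abc\u00e9def".getBytes(StandardCharsets.UTF_8);
        char[] out = new char[in.length];
        System.out.println(decode(in, out)); // prints 7
    }
}
```
</pre>
<p>In this sketch, once the first non-ASCII byte has been seen, every
later ASCII byte goes through the one-byte-at-a-time branch of the slow
loop, which is where mixed input loses the speed-up.</p>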
<p>Now here is my idea: why not call JLA.decodeASCII whenever an
ASCII byte is seen inside the loop?</p>
<pre>while (sp &lt; sl) {
    int b1 = sa[sp];
    if (b1 &gt;= 0) {
        // 1 byte, 7 bits: 0xxxxxxx
        if (dp &gt;= dl)
            return xflow(src, sp, sl, dst, dp, 1);
        // my change (replaces: da[dp++] = (char) b1; sp++;)
        // n &gt;= 1, since sa[sp] is ASCII and both buffers have room
<b>        int n = JLA.decodeASCII(sa, sp, da, dp, Math.min(sl - sp, dl - dp));
        sp += n;
        dp += n;
</b>    } else if ((b1 &gt;&gt; 5) == -2 &amp;&amp; (b1 &amp; 0x1e) != 0) {
        // ... (multi-byte cases unchanged)
</pre>
<p>I set up a small, improvised benchmark to get an idea of the impact:</p>
<pre>Benchmark                     (data)   Mode  Cnt        Score  Error  Units
DecoderBenchmark.jdkDecoder  TD_8000  thrpt    2  2045960.037         ops/s
DecoderBenchmark.jdkDecoder  TD_3999  thrpt    2   263744.675         ops/s
DecoderBenchmark.jdkDecoder   TD_999  thrpt    2   154232.940         ops/s
DecoderBenchmark.jdkDecoder   TD_499  thrpt    2   142239.763         ops/s
DecoderBenchmark.jdkDecoder    TD_99  thrpt    2   128678.229         ops/s
DecoderBenchmark.jdkDecoder     TD_9  thrpt    2   127388.649         ops/s
DecoderBenchmark.jdkDecoder     TD_4  thrpt    2   119834.183         ops/s
DecoderBenchmark.jdkDecoder     TD_2  thrpt    2   111733.115         ops/s
DecoderBenchmark.jdkDecoder     TD_1  thrpt    2   102397.455         ops/s
DecoderBenchmark.newDecoder  TD_8000  thrpt    2  2022997.518         ops/s
DecoderBenchmark.newDecoder  TD_3999  thrpt    2  2909450.005         ops/s
DecoderBenchmark.newDecoder   TD_999  thrpt    2  2140307.712         ops/s
DecoderBenchmark.newDecoder   TD_499  thrpt    2  1171970.809         ops/s
DecoderBenchmark.newDecoder    TD_99  thrpt    2   686771.614         ops/s
DecoderBenchmark.newDecoder     TD_9  thrpt    2    95181.541         ops/s
DecoderBenchmark.newDecoder     TD_4  thrpt    2    65656.184         ops/s
DecoderBenchmark.newDecoder     TD_2  thrpt    2    45439.240         ops/s
DecoderBenchmark.newDecoder     TD_1  thrpt    2    36994.738         ops/s
</pre>
<p>The benchmark uses only in-memory buffers. Each test input is a
UTF-8 encoded byte buffer that decodes to 8000 chars. Except for
TD_8000, which is pure ASCII, it consists of runs of pure ASCII bytes
of a given length, each followed by a 2-byte UTF-8 sequence producing
one non-ASCII char. The call counts below refer to the new decoder:<br>
TD_8000: 8000 ASCII bytes -&gt; 1 call to JLA.decodeASCII<br>
TD_3999: 3999 ASCII bytes + 2 non-ASCII bytes, repeated 2 times
-&gt; 2 calls to JLA.decodeASCII<br>
...<br>
TD_1: 1 ASCII byte + 2 non-ASCII bytes, repeated 4000 times -&gt;
4000 calls to JLA.decodeASCII</p>
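<p>Inputs of this shape can be reproduced along the following lines.
This is only a sketch under assumptions: the benchmark description does
not say which characters were used, so it hypothetically uses 'a' for
every ASCII byte and U+00E9 (encoded as the 2-byte sequence C3 A9) for
the non-ASCII char; the class Utf8TestData and its helper build are
made-up names.</p>
<pre>
```java
import java.nio.charset.StandardCharsets;

// Sketch of how the TD_n inputs could be generated. Assumptions (not from the
// benchmark itself): 'a' for every ASCII byte, U+00E9 for the non-ASCII char.
final class Utf8TestData {

    // Hypothetical helper: asciiRun ASCII bytes followed by one 2-byte UTF-8
    // sequence (U+00E9 encodes as C3 A9), the pattern repeated 'repeats' times.
    static byte[] build(int asciiRun, int repeats) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < repeats; r++) {
            for (int i = 0; i < asciiRun; i++) {
                sb.append('a');
            }
            sb.append('\u00e9');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] td3999 = build(3999, 2);  // 2 ASCII runs
        byte[] td1 = build(1, 4000);     // 4000 ASCII runs
        // Both decode to 8000 chars, as in the benchmark description.
        System.out.println(new String(td3999, StandardCharsets.UTF_8).length()); // prints 8000
        System.out.println(new String(td1, StandardCharsets.UTF_8).length());    // prints 8000
    }
}
```
</pre>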
<p>Interpretation:<br>
</p>
<ul>
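<li>Input is all ASCII (TD_8000): essentially the same performance
as before.</li>
<li>Input contains long runs of pure ASCII interrupted by non-ASCII
bytes: now seeing huge improvements, comparable to the pure-ASCII
case; throughput is roughly 5x (TD_99) to 14x (TD_999) higher than
with the current decoder.<br>
</li>
<li>Input has many short runs of ASCII bytes interrupted by
non-ASCII bytes: somewhere between TD_99 and TD_9, performance drops
below the current implementation (about 25% slower for TD_9, and only
about a third of the current throughput for TD_1).<br>
</li>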
</ul>
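<p>For anyone who wants to experiment with the proposal outside the JDK,
the modified loop can be sketched as a self-contained class. This is a
hedged approximation rather than the real UTF_8 code: a hypothetical
scalar helper decodeAsciiRun stands in for the internal
JLA.decodeASCII (so there is no SIMD), only 1- and 2-byte sequences are
handled, and a hypothetical bulkCalls counter records how often the
bulk path is entered.</p>
<pre>
```java
import java.nio.charset.StandardCharsets;

// Sketch of the PROPOSED loop (not the JDK code): whenever the slow loop meets
// an ASCII byte, the whole ASCII run is decoded with one bulk call.
// Simplification: only 1- and 2-byte UTF-8 sequences are supported.
final class BulkAsciiDecoder {

    // Number of bulk calls made, to relate the work to the TD_n inputs.
    static int bulkCalls = 0;

    // Hypothetical scalar stand-in for the internal JLA.decodeASCII.
    static int decodeAsciiRun(byte[] src, int sp, char[] dst, int dp, int len) {
        bulkCalls++;
        int n = 0;
        while (n < len && src[sp + n] >= 0) {
            dst[dp + n] = (char) src[sp + n];
            n++;
        }
        return n;
    }

    // Decodes src into dst and returns the number of chars written.
    static int decode(byte[] src, char[] dst) {
        int sp = 0, sl = src.length, dp = 0, dl = dst.length;
        while (sp < sl && dp < dl) {
            int b1 = src[sp];
            if (b1 >= 0) {
                // Proposed change: consume the whole ASCII run at once.
                // n >= 1, because src[sp] is ASCII and both buffers have room.
                int n = decodeAsciiRun(src, sp, dst, dp, Math.min(sl - sp, dl - dp));
                sp += n;
                dp += n;
            } else if ((b1 >> 5) == -2 && (b1 & 0x1e) != 0 && sp + 1 < sl) {
                // 2 bytes, 11 bits: 110xxxxx 10xxxxxx
                int b2 = src[sp + 1];
                if ((b2 & 0xc0) != 0x80) {
                    throw new IllegalArgumentException("malformed input at " + sp);
                }
                dst[dp++] = (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
                sp += 2;
            } else {
                throw new IllegalArgumentException("unsupported or malformed input at " + sp);
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        // TD_1-like input (assumed characters): "a" + U+00E9, repeated 4000 times.
        String s = "a\u00e9".repeat(4000);
        char[] out = new char[s.length()];
        int len = decode(s.getBytes(StandardCharsets.UTF_8), out);
        System.out.println(len + " chars, " + bulkCalls + " bulk calls"); // prints 8000 chars, 4000 bulk calls
    }
}
```
</pre>
<p>On a TD_1-like input every ASCII byte triggers its own bulk call that
returns after a single byte. If each real JLA.decodeASCII call carries
some fixed setup cost, that would be consistent with the slowdown seen
for short ASCII runs.</p>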
<p>Thanks for reading; I'd be happy to hear your opinions,<br>
Johannes<br>
</p>
<br>
</body>
</html>