RFR: 8261744: Implement CharsetDecoder ASCII and latin-1 fast-paths
Philippe Marschall
github.com+471021+marschall at openjdk.java.net
Mon Feb 15 19:59:43 UTC 2021
On Mon, 15 Feb 2021 11:30:54 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> This patch exposes a couple of intrinsics used by String to speed up ASCII checking and byte[] -> char[] inflation, which can be used by latin1 and ASCII-compatible CharsetDecoders to speed up decoding operations.
>>
>> - Fast-path implemented for all standard charsets, with up to 10x performance improvements in microbenchmarks reading Strings from ByteArrayInputStream.
>> - Cleanup of StreamDecoder/-Encoder with some minor improvements when interpreting
>> - Reworked creation of JavaLangAccess to be safely published for CharsetDecoders/-Encoders used for setting up System.out/in. As JLA and these encoders are created during System.initPhase1, the current sequence caused the initialization to become unstable, and a few tests were consistently getting an NPE.
>>
>> Testing: tier1-3
>
> On the best-case microbenchmark (the byte stream is all ASCII), speed-ups are quite telling:
>
> Before:
> Benchmark                           (charsetName)  (length)  Mode  Cnt       Score       Error  Units
> ByteStreamDecoder.readStringReader       US-ASCII       256  avgt   10    2085.399 ±    66.522  ns/op
> ByteStreamDecoder.readStringReader       US-ASCII      4096  avgt   10   12466.702 ±   747.116  ns/op
> ByteStreamDecoder.readStringReader       US-ASCII     25000  avgt   10  123508.987 ±  3583.345  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-1       256  avgt   10    1894.185 ±    51.772  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-1      4096  avgt   10    8117.404 ±   594.708  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-1     25000  avgt   10   99409.792 ± 28308.936  ns/op
> ByteStreamDecoder.readStringReader          UTF-8       256  avgt   10    2090.337 ±    56.500  ns/op
> ByteStreamDecoder.readStringReader          UTF-8      4096  avgt   10   11698.221 ±   898.910  ns/op
> ByteStreamDecoder.readStringReader          UTF-8     25000  avgt   10   66568.987 ±  4204.361  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-6       256  avgt   10    3061.130 ±   120.132  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-6      4096  avgt   10   24623.494 ±  1916.362  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-6     25000  avgt   10  139138.140 ±  7109.636  ns/op
> ByteStreamDecoder.readStringReader          MS932       256  avgt   10    2612.535 ±    98.638  ns/op
> ByteStreamDecoder.readStringReader          MS932      4096  avgt   10   18843.438 ±  1767.822  ns/op
> ByteStreamDecoder.readStringReader          MS932     25000  avgt   10  119923.997 ± 18560.065  ns/op
>
> After:
> Benchmark                           (charsetName)  (length)  Mode  Cnt       Score       Error  Units
> ByteStreamDecoder.readStringReader       US-ASCII       256  avgt   10    1556.588 ±    37.083  ns/op
> ByteStreamDecoder.readStringReader       US-ASCII      4096  avgt   10    3290.627 ±   125.327  ns/op
> ByteStreamDecoder.readStringReader       US-ASCII     25000  avgt   10   13118.794 ±   597.086  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-1       256  avgt   10    1525.460 ±    36.510  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-1      4096  avgt   10    3051.887 ±   113.036  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-1     25000  avgt   10   11401.228 ±   563.124  ns/op
> ByteStreamDecoder.readStringReader          UTF-8       256  avgt   10    1596.878 ±    43.824  ns/op
> ByteStreamDecoder.readStringReader          UTF-8      4096  avgt   10    3349.961 ±   119.278  ns/op
> ByteStreamDecoder.readStringReader          UTF-8     25000  avgt   10   13273.403 ±   591.600  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-6       256  avgt   10    1602.328 ±    44.092  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-6      4096  avgt   10    3403.312 ±   107.516  ns/op
> ByteStreamDecoder.readStringReader     ISO-8859-6     25000  avgt   10   13163.468 ±   709.642  ns/op
> ByteStreamDecoder.readStringReader          MS932       256  avgt   10    1602.837 ±    32.021  ns/op
> ByteStreamDecoder.readStringReader          MS932      4096  avgt   10    3379.439 ±    87.716  ns/op
> ByteStreamDecoder.readStringReader          MS932     25000  avgt   10   13376.980 ±   669.983  ns/op
>
> Performance degrades when you mix in non-ASCII characters, but so does the `new String(byte[], Charset)` baseline. While there might be algorithmic refinements possible to improve on some of the non-ASCII variants, I'm happy to leave that to a follow-up and keep this RFE reasonably straightforward.
Is there a reason `sun.nio.cs.ISO_8859_1.Encoder#implEncodeISOArray(char[], int, byte[], int, int)` wasn't moved to `JavaLangAccess` as well?
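
For readers following the thread, the decoding fast-path described in the quoted text can be illustrated with a minimal sketch. This is not the code from the PR: the patch relies on intrinsics already used by String, whereas the plain loop below only approximates the idea. The class and method names (`AsciiFastPathSketch`, `decodeAsciiPrefix`, `decode`) are hypothetical, and the sketch assumes a stateless, ASCII-compatible charset such as US-ASCII, ISO-8859-1 or UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class AsciiFastPathSketch {

    // Hypothetical helper: widens the leading run of ASCII bytes (high bit clear)
    // from src into dst and returns how many bytes were handled.
    static int decodeAsciiPrefix(byte[] src, int sp, char[] dst, int dp, int len) {
        int i = 0;
        while (i < len && src[sp + i] >= 0) {   // a negative byte means >= 0x80, i.e. non-ASCII
            dst[dp + i] = (char) src[sp + i];
            i++;
        }
        return i;
    }

    // Assumption: cs is stateless and ASCII-compatible, so splitting the input
    // at the first non-ASCII byte never cuts a multi-byte sequence in half.
    static String decode(byte[] bytes, Charset cs) {
        char[] chars = new char[bytes.length];
        int n = decodeAsciiPrefix(bytes, 0, chars, 0, bytes.length);
        if (n == bytes.length) {
            // Best case: the whole input was ASCII, no general decoder needed.
            return new String(chars, 0, n);
        }
        // Slow path: hand the remainder to the charset's regular decoder.
        StringBuilder sb = new StringBuilder(bytes.length);
        sb.append(chars, 0, n);
        sb.append(cs.decode(ByteBuffer.wrap(bytes, n, bytes.length - n)));
        return sb.toString();
    }
}
```

When the input is entirely ASCII the general decoder is never invoked, which corresponds to the best-case rows in the tables above. Once a non-ASCII byte appears, the remainder goes through the ordinary decoding path, consistent with the note that performance degrades for mixed input.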
-------------
PR: https://git.openjdk.java.net/jdk/pull/2574