[jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64
Patric Hedlin
phedlin at openjdk.java.net
Tue Dec 14 14:56:32 UTC 2021
On Tue, 14 Dec 2021 10:45:28 GMT, Patric Hedlin <phedlin at openjdk.org> wrote:
> Implementation of ISO/ASCII charset encoding, extending the current (ISO-8859-1) implementation with ASCII encoding support.
>
> The implementation aims to balance footprint and efficiency, trying to utilise the dual SIMD pipelines of cores such as Neoverse N1 for the additional ASCII check while avoiding performance loss in the ISO-only case.
>
> - Interleave the ISO and ASCII check code.
> - Avoid 'umaxv' in the ISO main flow.
> - Use post-increment addressing in the main loop.
> - Retain the 8-char loop.
> - Remove conditional prefetch (no upside).
> - Add ISO-8859-1 to the encode-decode benchmark.
>
> Testing: tier1-3
>
> The revised version compares like this (master vs. update).
>
> Benchmark                   (size)       (type)  Mode  Cnt    Score   Error  Units
> CharsetEncodeDecode.encode   16384        UTF-8  avgt   30   17.920 ± 0.229  us/op
> CharsetEncodeDecode.encode   16384         BIG5  avgt   30   18.867 ± 0.356  us/op
> CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30   17.419 ± 0.220  us/op
> CharsetEncodeDecode.encode   16384   ISO-8859-1  avgt   30    6.200 ± 0.134  us/op
> CharsetEncodeDecode.encode   16384        ASCII  avgt   30   17.149 ± 0.219  us/op
> CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  135.115 ± 1.440  us/op
>
>
> Benchmark                   (size)       (type)  Mode  Cnt    Score   Error  Units
> CharsetEncodeDecode.encode   16384        UTF-8  avgt   30    9.018 ± 0.179  us/op
> CharsetEncodeDecode.encode   16384         BIG5  avgt   30   10.550 ± 0.470  us/op
> CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30    8.843 ± 0.187  us/op
> CharsetEncodeDecode.encode   16384   ISO-8859-1  avgt   30    6.406 ± 0.155  us/op
> CharsetEncodeDecode.encode   16384        ASCII  avgt   30    8.822 ± 0.173  us/op
> CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  135.195 ± 1.432  us/op
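As a rough model of what the ISO/ASCII stub computes, the scalar Java below encodes chars until the first one that does not fit the target charset and returns how many were encoded. It is a sketch under stated assumptions, not the HotSpot stub: the class and method names (`EncodeArraySketch`, `encodeArray`) and the signature are hypothetical, the `mask` parameter stands in for the interleaved ISO and ASCII checks (a char fits ISO-8859-1 if bits 8-15 are clear, and ASCII if bits 7-15 are clear), and the 8-char OR-reduction only loosely mirrors testing one 128-bit vector of eight 16-bit lanes before falling back to a per-char tail.

```java
// Hypothetical scalar model of the ISO-8859-1/ASCII encode-array fast path.
// Not the HotSpot stub; names and signature are illustrative only.
public final class EncodeArraySketch {

    /**
     * Encodes src[sp, sp + len) into dst starting at dp, stopping at the
     * first char that does not fit the target charset.
     *
     * @param ascii true for ASCII (chars up to 0x7F), false for ISO-8859-1 (up to 0xFF)
     * @return the number of chars encoded
     */
    static int encodeArray(char[] src, int sp, byte[] dst, int dp, int len, boolean ascii) {
        // A char fits if none of these bits are set; one parameter models
        // the stub's interleaved ISO and ASCII checks (assumption).
        final int mask = ascii ? 0xFF80 : 0xFF00;
        int i = 0;
        // 8-char block loop: OR the group together and test once, loosely
        // like checking one 128-bit vector of eight 16-bit lanes.
        while (i + 8 <= len) {
            int acc = 0;
            for (int k = 0; k < 8; k++) {
                acc |= src[sp + i + k];
            }
            if ((acc & mask) != 0) {
                break; // some char in this group does not fit; let the tail find it
            }
            for (int k = 0; k < 8; k++) {
                dst[dp + i + k] = (byte) src[sp + i + k];
            }
            i += 8;
        }
        // 1-char tail: handles the remainder and locates the first bad char.
        while (i < len) {
            char c = src[sp + i];
            if ((c & mask) != 0) {
                break;
            }
            dst[dp + i] = (byte) c;
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] s = "Hello, world! caf\u00e9".toCharArray();
        byte[] d = new byte[s.length];
        System.out.println(encodeArray(s, 0, d, 0, s.length, true));  // prints 17: stops at the e-acute
        System.out.println(encodeArray(s, 0, d, 0, s.length, false)); // prints 18: 0xE9 fits ISO-8859-1
    }
}
```

The block loop only decides whether a whole group is clean; the scalar tail pins down the exact position of the first unencodable char, so a caller can presumably resume from there with the charset's general path.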
Benchmarks, master vs. update (speed-up; run on Aurora/Ampere Altra):
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ASCII ........77.55%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:BIG5 .........76.71%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_1 ...-2.31%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_15 ..75.58%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_16 ....... 1.04%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_8 ........76.90%
Note that ISO-8859-1 is compared against the old intrinsic implementation (essentially the same code), and that UTF-16 does not use the intrinsic.
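Per the PR title, the new fast path targets ASCII-compatible CharsetEncoders, which is consistent with UTF-8, ISO-8859-15, BIG5 and ASCII gaining while the already intrinsified ISO-8859-1 stays flat: such an encoder can hand its leading ASCII run to the bulk path before falling back to its own logic. The sketch below illustrates that shape for UTF-8 only. The `encodeAsciiPrefix` helper is hypothetical and is a plain scalar stand-in for the intrinsic; this is not the JDK's `sun.nio.cs` code and, for brevity, it does not handle unpaired surrogates.

```java
import java.util.Arrays;

// Hypothetical sketch: an ASCII-compatible encoder (UTF-8 here) consumes the
// leading ASCII run in bulk, then switches to its general per-code-point path.
public final class AsciiPrefixUtf8 {

    // Scalar stand-in for an intrinsified ASCII loop: copies chars below 0x80
    // and returns how many were copied.
    static int encodeAsciiPrefix(char[] src, byte[] dst, int len) {
        int i = 0;
        while (i < len && src[i] < 0x80) {
            dst[i] = (byte) src[i];
            i++;
        }
        return i;
    }

    static byte[] encodeUtf8(String s) {
        char[] src = s.toCharArray();
        byte[] dst = new byte[src.length * 3];       // enough for any valid UTF-16 input
        int i = encodeAsciiPrefix(src, dst, src.length); // fast path
        int dp = i;
        while (i < src.length) {                     // general path
            int cp = Character.codePointAt(src, i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                dst[dp++] = (byte) cp;
            } else if (cp < 0x800) {
                dst[dp++] = (byte) (0xC0 | (cp >> 6));
                dst[dp++] = (byte) (0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                dst[dp++] = (byte) (0xE0 | (cp >> 12));
                dst[dp++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                dst[dp++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                dst[dp++] = (byte) (0xF0 | (cp >> 18));
                dst[dp++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                dst[dp++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                dst[dp++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
        return Arrays.copyOf(dst, dp);
    }

    public static void main(String[] args) {
        byte[] b = encodeUtf8("ASCII prefix, then caf\u00e9");
        System.out.println(b.length); // prints 24: 22 ASCII bytes + 2 bytes for the e-acute
    }
}
```

For mostly-ASCII text nearly all the work lands in the prefix call, which is where a vectorised loop pays off.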
The following runs show the more pessimistic speed-up when processing 2^n - 1 chars.
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:ASCII .........72.97%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:BIG5 ..........64.46%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:ISO_8859_1 ....-1.67%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:ISO_8859_15 ...70.85%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:UTF_16 ........-4.60%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:UTF_8 .........70.44%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:ASCII ..........60.35%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:BIG5 ...........52.61%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:ISO_8859_1 ..... 1.75%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:ISO_8859_15 ....61.45%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:UTF_16 .........-1.01%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:UTF_8 ..........59.46%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ASCII ..........54.26%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:BIG5 ...........42.82%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_1 .....-0.54%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_15 ....64.86%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_16 .........-0.09%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_8 ..........60.44%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ASCII ..........51.51%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:BIG5 ...........46.54%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_1 .....-0.32%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_15 ....56.48%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_16 ......... 0.44%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_8 ..........54.84%
The following runs illustrate the threshold effect between the loops in the implementation.
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:ASCII ...........32.30%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:BIG5 ............31.93%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:ISO_8859_1 ......-0.02%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:ISO_8859_15 .....37.92%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:UTF_16 .......... 4.45%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:UTF_8 ...........40.35%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ASCII ...........20.06%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:BIG5 ............21.64%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_1 ......-1.13%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_15 .....27.04%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_16 .......... 1.20%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_8 ...........24.72%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ASCII ...........19.37%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:BIG5 ............20.20%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_1 ......-1.01%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_15 .....29.16%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_16 .......... 0.34%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_8 ...........25.35%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ASCII ...........13.03%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:BIG5 ............13.74%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_1 ......-0.13%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_15 .....19.26%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_16 .......... 0.78%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_8 ...........17.70%
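To reason about the 2^n - 1 and threshold runs, it can help to count how many iterations each loop would execute for a given size. The model below assumes a 32-char main loop, the retained 8-char loop and a 1-char tail; the 32-char width is an assumption suggested by the 32/31 and 16/15 steps above, not read from the stub, and `LoopSplit` is a hypothetical name.

```java
// Hypothetical model of the assumed loop structure: a 32-char main loop,
// then an 8-char loop, then a 1-char tail. Widths are assumptions.
public final class LoopSplit {

    /** Returns {main-loop iterations, 8-char iterations, 1-char steps} for n chars. */
    static int[] split(int n) {
        int main = n / 32;
        int rest = n % 32;
        int eight = rest / 8;
        int tail = rest % 8;
        return new int[] {main, eight, tail};
    }

    public static void main(String[] args) {
        for (int n : new int[] {16384, 2047, 32, 31, 16, 15}) {
            int[] s = split(n);
            System.out.printf("size %5d: %3d x 32, %d x 8, %d x 1%n", n, s[0], s[1], s[2]);
        }
    }
}
```

Under this model 16384 chars run entirely in the main loop, whereas 2047 chars leave a 31-char remainder (three 8-char iterations plus seven single chars), the largest possible tail, which would fit the more pessimistic speed-up reported for 2^n - 1 sizes.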
Using the microbenchmarks provided by @carterkozak at https://github.com/carterkozak/stringbuilder-encoding-performance, master vs. update compare as follows:
master:
Benchmark                                                      (charsetName)                       (message)  (timesToAppend)  Mode  Cnt    Score    Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a simple ASCII message                3  avgt    4  151.025 ± 28.111  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode                3  avgt    4  323.254 ±  5.648  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a simple ASCII message                3  avgt    4  244.375 ± 98.844  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode                3  avgt    4  405.415 ±  5.947  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a simple ASCII message                3  avgt    4  728.172 ± 22.419  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode                3  avgt    4  859.015 ± 90.541  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a simple ASCII message                3  avgt    4  117.044 ± 11.484  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode                3  avgt    4  483.399 ± 38.614  ns/op

update:
Benchmark                                                      (charsetName)                       (message)  (timesToAppend)  Mode  Cnt    Score    Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a simple ASCII message                3  avgt    4  113.954 ±  7.657  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode                3  avgt    4  353.266 ± 10.124  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a simple ASCII message                3  avgt    4  196.643 ± 52.954  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode                3  avgt    4  429.157 ± 11.506  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a simple ASCII message                3  avgt    4  728.138 ± 34.898  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode                3  avgt    4  859.697 ± 61.397  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a simple ASCII message                3  avgt    4  117.269 ±  6.623  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode                3  avgt    4  491.559 ± 68.169  ns/op
Note: The above was run on a local development machine, which typically produces less than _perfectly_ consistent results.
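For readers unfamiliar with the benchmarked API, the Java below is a minimal sketch of driving a `CharsetEncoder` directly, roughly the kind of call the `charsetEncoder` benchmarks exercise; the benchmark's actual code lives in the linked repository and may differ. `EncoderUsage` and `encode` are hypothetical names. The input is wrapped from a `char[]` and the output is a heap `ByteBuffer`, so both are array-backed; the JDK's built-in encoders are assumed to take their array-based loop (where an intrinsic can apply) only in that case.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public final class EncoderUsage {

    /** Encodes all of input into out (cleared first) and returns the byte count. */
    static int encode(CharsetEncoder encoder, char[] input, ByteBuffer out)
            throws CharacterCodingException {
        encoder.reset();
        out.clear();
        CharBuffer in = CharBuffer.wrap(input); // array-backed: hasArray() is true
        CoderResult r = encoder.encode(in, out, true);
        if (!r.isUnderflow()) {
            r.throwException(); // overflow, malformed or unmappable input
        }
        r = encoder.flush(out);
        if (!r.isUnderflow()) {
            r.throwException();
        }
        return out.position();
    }

    public static void main(String[] args) throws CharacterCodingException {
        char[] input = "This is a simple ASCII message".repeat(3).toCharArray(); // timesToAppend = 3
        CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(256); // heap buffer, array-backed
        System.out.println(encode(utf8, input, out)); // prints 90
    }
}
```

With `timesToAppend` = 3 the message is 90 chars, all ASCII, so the call reports 90 bytes; reusing the encoder and output buffer keeps allocation out of the measured path.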
-------------
PR: https://git.openjdk.java.net/jdk18/pull/20
More information about the hotspot-compiler-dev mailing list