[jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64

Patric Hedlin phedlin at openjdk.java.net
Tue Dec 14 14:56:32 UTC 2021


On Tue, 14 Dec 2021 10:45:28 GMT, Patric Hedlin <phedlin at openjdk.org> wrote:

> Implements ISO/ASCII charset encoding by extending the current implementation with ASCII encoding support.
> 
> The implementation aims for a balance between footprint and efficiency: it tries to utilise a dual SIMD path (e.g. Neoverse N1) for the additional ASCII check, so that the ISO-only case does not lose performance.
> 
> - Interleave the ISO and ASCII check code.
> - Avoid 'umaxv' in the ISO main flow.
> - Use post-increment in the main loop.
> - Retain the 8-char loop.
> - Remove the conditional prefetch (no upside).
> - Add ISO-8859-1 to the encode-decode benchmark.
> 
> Testing: tier1-3
> 
> The revised version compares as follows (master first, then update).
> 
> Benchmark                   (size)       (type)  Mode  Cnt    Score   Error  Units
> CharsetEncodeDecode.encode   16384        UTF-8  avgt   30   17.920 ± 0.229  us/op
> CharsetEncodeDecode.encode   16384         BIG5  avgt   30   18.867 ± 0.356  us/op
> CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30   17.419 ± 0.220  us/op
> CharsetEncodeDecode.encode   16384   ISO-8859-1  avgt   30    6.200 ± 0.134  us/op
> CharsetEncodeDecode.encode   16384        ASCII  avgt   30   17.149 ± 0.219  us/op
> CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  135.115 ± 1.440  us/op
> 
> 
> Benchmark                   (size)       (type)  Mode  Cnt    Score   Error  Units
> CharsetEncodeDecode.encode   16384        UTF-8  avgt   30    9.018 ± 0.179  us/op
> CharsetEncodeDecode.encode   16384         BIG5  avgt   30   10.550 ± 0.470  us/op
> CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30    8.843 ± 0.187  us/op
> CharsetEncodeDecode.encode   16384   ISO-8859-1  avgt   30    6.406 ± 0.155  us/op
> CharsetEncodeDecode.encode   16384        ASCII  avgt   30    8.822 ± 0.173  us/op
> CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  135.195 ± 1.432  us/op
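
As a reading aid for the two quoted tables, the scores can be turned into a ratio of average times. This is only one conventional way to compare `avgt` results, and the helper name `speedup` is hypothetical. The scores are copied from the tables above. Note that the two ISO-8859-1 scores (6.200 ± 0.134 and 6.406 ± 0.155) overlap within their reported error.

```java
final class SpeedupSketch {
    // Ratio of average times; a value above 1.0 means the update is faster.
    static double speedup(double masterUsPerOp, double updateUsPerOp) {
        return masterUsPerOp / updateUsPerOp;
    }

    public static void main(String[] args) {
        // Scores (us/op) copied from the quoted tables: master first, update second.
        System.out.printf("ASCII      %.2fx%n", speedup(17.149, 8.822));
        System.out.printf("UTF-8      %.2fx%n", speedup(17.920, 9.018));
        System.out.printf("ISO-8859-1 %.2fx%n", speedup(6.200, 6.406));
    }
}
```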

Benchmarks, master vs. update (run on Aurora/Ampere Altra):


openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ASCII ........77.55%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:BIG5 .........76.71%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_1 ...-2.31%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_15 ..75.58%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_16 ....... 1.04%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_8 ........76.90%

Note that ISO-8859-1 compares with the old intrinsic implementation (essentially the same) and that UTF-16 does not utilise the intrinsic.
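
For orientation, the operation being accelerated can be sketched in scalar Java: copy chars to bytes until one exceeds the encodable limit, then report how many were written. This is a minimal sketch of the semantics only, not the aarch64 intrinsic. The class and method names are hypothetical, and the only difference between the ISO and ASCII variants here is the limit (0xFF vs. 0x7F). ASCII-compatible encoders such as UTF-8, BIG5 and ISO-8859-15 would plausibly benefit because runs of ASCII can take the same fast path.

```java
final class EncodeSketch {
    static final char ASCII_LIMIT = 0x7F; // highest ASCII code point
    static final char ISO_LIMIT = 0xFF;   // highest ISO-8859-1 code point

    // Hypothetical helper: returns the number of chars encoded before stopping.
    static int encode(char[] src, int sp, byte[] dst, int dp, int len, char limit) {
        int i = 0;
        for (; i < len; i++) {
            char c = src[sp + i];
            if (c > limit) {
                break; // the caller would handle this char on a slower path
            }
            dst[dp + i] = (byte) c;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] in = "caf\u00e9".toCharArray(); // 'é' is U+00E9: ISO-8859-1 but not ASCII
        byte[] out = new byte[in.length];
        System.out.println(encode(in, 0, out, 0, in.length, ISO_LIMIT));   // prints 4
        System.out.println(encode(in, 0, out, 0, in.length, ASCII_LIMIT)); // prints 3
    }
}
```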

Runs showing the more pessimistic speed-up when processing 2^n - 1 chars:

openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:ASCII .........72.97%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:BIG5 ..........64.46%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:ISO_8859_1 ....-1.67%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:ISO_8859_15 ...70.85%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:UTF_16 ........-4.60%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:UTF_8 .........70.44%

openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:ASCII ..........60.35%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:BIG5 ...........52.61%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:ISO_8859_1 ..... 1.75%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:ISO_8859_15 ....61.45%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:UTF_16 .........-1.01%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:UTF_8 ..........59.46%

openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ASCII ..........54.26%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:BIG5 ...........42.82%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_1 .....-0.54%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_15 ....64.86%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_16 .........-0.09%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_8 ..........60.44%

openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ASCII ..........51.51%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:BIG5 ...........46.54%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_1 .....-0.32%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_15 ....56.48%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_16 ......... 0.44%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_8 ..........54.84%

Runs to illustrate the threshold effect between the loops in the implementation.

openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:ASCII ...........32.30%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:BIG5 ............31.93%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:ISO_8859_1 ......-0.02%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:ISO_8859_15 .....37.92%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:UTF_16 .......... 4.45%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:UTF_8 ...........40.35%

openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ASCII ...........20.06%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:BIG5 ............21.64%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_1 ......-1.13%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_15 .....27.04%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_16 .......... 1.20%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_8 ...........24.72%

openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ASCII ...........19.37%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:BIG5 ............20.20%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_1 ......-1.01%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_15 .....29.16%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_16 .......... 0.34%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_8 ...........25.35%

openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ASCII ...........13.03%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:BIG5 ............13.74%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_1 ......-0.13%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_15 .....19.26%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_16 .......... 0.78%
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_8 ...........17.70%
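
One way to picture why 2^n - 1 sizes are the pessimistic case, and why results shift around sizes such as 31/32 and 15/16, is to split a length into wide blocks, narrow blocks and a scalar tail. The sketch below does that arithmetic only. The 8-char block corresponds to the 8-char loop mentioned in the description, while the 32-char wide block and the class and method names are assumptions made for illustration, not a statement about the actual loop structure.

```java
final class LoopSplitSketch {
    // Assumed block widths, for illustration only.
    static final int WIDE = 32;
    static final int NARROW = 8;

    // Returns {wide iterations, narrow iterations, chars left for the scalar tail}.
    static int[] split(int len) {
        int wide = len / WIDE;
        int rest = len % WIDE;
        return new int[] { wide, rest / NARROW, rest % NARROW };
    }

    public static void main(String[] args) {
        for (int len : new int[] { 15, 16, 31, 32, 2047, 16384 }) {
            int[] s = split(len);
            System.out.printf("%5d chars: %d x %d + %d x %d + %d scalar%n",
                    len, s[0], WIDE, s[1], NARROW, s[2]);
        }
    }
}
```

Under these assumed widths, a length of 2^n - 1 leaves the largest possible remainder for the narrower loops (for example 2047 = 63 x 32 + 3 x 8 + 7), whereas 16384 is consumed entirely by the wide loop.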

Results from the microbenchmarks provided by @carterkozak (https://github.com/carterkozak/stringbuilder-encoding-performance), master first and then update:

Benchmark                                                      (charsetName)                          (message)  (timesToAppend)  Mode  Cnt    Score    Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8     This is a simple ASCII message                3  avgt    4  151.025 ± 28.111  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode ��                3  avgt    4  323.254 ±  5.648  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8     This is a simple ASCII message                3  avgt    4  244.375 ± 98.844  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode ��                3  avgt    4  405.415 ±  5.947  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8     This is a simple ASCII message                3  avgt    4  728.172 ± 22.419  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode ��                3  avgt    4  859.015 ± 90.541  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8     This is a simple ASCII message                3  avgt    4  117.044 ± 11.484  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode ��                3  avgt    4  483.399 ± 38.614  ns/op



Benchmark                                                      (charsetName)                          (message)  (timesToAppend)  Mode  Cnt    Score    Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8     This is a simple ASCII message                3  avgt    4  113.954 ±  7.657  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode ��                3  avgt    4  353.266 ± 10.124  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8     This is a simple ASCII message                3  avgt    4  196.643 ± 52.954  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode ��                3  avgt    4  429.157 ± 11.506  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8     This is a simple ASCII message                3  avgt    4  728.138 ± 34.898  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode ��                3  avgt    4  859.697 ± 61.397  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8     This is a simple ASCII message                3  avgt    4  117.269 ±  6.623  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode ��                3  avgt    4  491.559 ± 68.169  ns/op

Note: The above was run on a local dev-machine, which typically produces less than _perfectly_ consistent results.
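
To make the benchmark names easier to relate to code, here is a rough sketch of the two kinds of path they suggest: encoding a `StringBuilder` through a reusable `CharsetEncoder`, and going through `toString().getBytes(...)`. This is not the benchmark's source. The class and method names are hypothetical, and only the message text and the append count of 3 are taken from the tables above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class EncoderPathsSketch {
    // Encode through a reusable encoder into a caller-supplied buffer; returns bytes written.
    static int viaEncoder(CharsetEncoder enc, CharSequence msg, ByteBuffer out) {
        out.clear();
        enc.reset();
        CoderResult r = enc.encode(CharBuffer.wrap(msg), out, true);
        if (!r.isUnderflow()) {
            throw new IllegalStateException("encode failed: " + r);
        }
        r = enc.flush(out);
        if (!r.isUnderflow()) {
            throw new IllegalStateException("flush failed: " + r);
        }
        return out.position();
    }

    // Materialise a String first, then encode it; returns bytes produced.
    static int viaGetBytes(CharSequence msg) {
        return msg.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            sb.append("This is a simple ASCII message");
        }
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(1024);
        System.out.println(viaEncoder(enc, sb, out) + " bytes via CharsetEncoder");
        System.out.println(viaGetBytes(sb) + " bytes via toString().getBytes()");
    }
}
```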

-------------

PR: https://git.openjdk.java.net/jdk18/pull/20

