RFR: 8279833: Loop optimization issue in String.encodeUTF8_UTF16 [v4]

Wed Jan 12 12:59:31 UTC 2022

On Tue, 11 Jan 2022 13:53:58 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> In `String.encodeUTF8_UTF16`, making the `char c` local to each loop helps the performance of the method by helping C2 optimize each individual loop better.
>> 
>> Results on the updated micros:
>> 19-b04:
>> 
>> Benchmark                          (charsetName)  Mode  Cnt     Score     Error  Units
>> StringEncode.encodeUTF16                   UTF-8  avgt   15   164.457 ±   7.296  ns/op
>> StringEncode.encodeUTF16LongEnd            UTF-8  avgt   15  2294.160 ±  40.580  ns/op
>> StringEncode.encodeUTF16LongStart          UTF-8  avgt   15  9128.698 ± 124.636  ns/op
>> 
>> 
>> Patch:
>> 
>> Benchmark                          (charsetName)  Mode  Cnt     Score    Error  Units
>> StringEncode.encodeUTF16                   UTF-8  avgt   15   131.296 ±  6.693  ns/op
>> StringEncode.encodeUTF16LongEnd            UTF-8  avgt   15  2282.750 ± 46.891  ns/op
>> StringEncode.encodeUTF16LongStart          UTF-8  avgt   15  4786.965 ± 64.896  ns/op
>> 
>> 
>> Going back this seem to have been an issue since this method was introduced with JEP 254 in JDK 9:
>> 
>> 8u311:
>> 
>> Benchmark                          (charsetName)  Mode  Cnt     Score     Error  Units
>> StringEncode.encodeUTF16                   UTF-8  avgt   15   194.057 ±   3.913  ns/op
>> StringEncode.encodeUTF16LongEnd            UTF-8  avgt   15  3024.860 ±  88.386  ns/op
>> StringEncode.encodeUTF16LongStart          UTF-8  avgt   15  5282.849 ± 247.230  ns/op
>> 
>> 9:
>> 
>> Benchmark                          (charsetName)  Mode  Cnt      Score     Error  Units
>> StringEncode.encodeUTF16                   UTF-8  avgt   15    148.481 ±   9.374  ns/op
>> StringEncode.encodeUTF16LongEnd            UTF-8  avgt   15   2832.754 ± 263.372  ns/op
>> StringEncode.encodeUTF16LongStart          UTF-8  avgt   15  10447.115 ± 408.338  ns/op
>> 
>> 
>> (Interestingly JDK 9 seem faster on some of the micros than later iterations, while slower on others. The main issue is the slow non-ASCII loop, where the patch speeds things up ~2x. With the patch we're significantly faster than both JDK 8 and 9 on all measures.)
>> 
>> There's a JIT compiler bug hiding in plain sight here where the potentially uninitialized local `char c` appears to mess up optimization of the second loop. I'll file a separate bug for this (a standalone reproducer should be straightforward to produce). I think this patch is reasonable to put back into the JDK while we investigate if/how C2 can better handle this kind of condition. It might also be easier to backport, depending on whether the C2 fix is trivial or not.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Happy new year!

Looks like there's one more similar case: https://github.com/openjdk/jdk/pull/7034

-------------

PR: https://git.openjdk.java.net/jdk/pull/7026