RFR: 8280124: Reduce branches decoding latin-1 chars from UTF-8 encoded bytes

Tue Jan 18 10:44:28 UTC 2022

On Tue, 18 Jan 2022 10:08:35 GMT, Claes Redestad <redestad at openjdk.org> wrote:

> This resolves a minor inefficiency in the fast path for decoding latin-1 chars from UTF-8. I also took the opportunity to refactor the StringDecode microbenchmark to align with recent changes to the StringEncode micro.
> 
> The inefficiency is that this test is quite branchy:
> 
> `if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) && ...`
> 
> Since the two constant bytes differ only in the lowest bit, the test can be rewritten as follows, saving us a branch:
> 
> `if ((b1 & 0xfe) == 0xc2 && ...`
> 
> This provides a small speed-up on microbenchmarks where the input can be internally encoded as latin1:
> 
> 
> Benchmark                           (charsetName)  Mode  Cnt     Score    Error  Units
> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2283.591 ± 12.332  ns/op
> 
> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2165.984 ± 13.136  ns/op

On a microbenchmark that zooms in on the logical predicate, the speed-up is closer to 2x. This seems like a transformation a JIT could do automatically. gcc and clang don't do it, but icc seems to pull it off (as tested via godbolt.org). It's unclear whether this pattern is common enough to motivate such enhancement work, but it might be of academic interest to attempt it.
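
For anyone who wants to convince themselves that the two forms of the predicate quoted above are equivalent, here is a minimal sketch that checks them against each other for every possible byte value. The class and method names (`Latin1LeadByteCheck`, `branchy`, `masked`, `countMismatches`) are made up for illustration and are not part of the patch. It only checks the `b1` half of the condition, not the rest of the decoder test.

```java
public class Latin1LeadByteCheck {
    // Original form from the PR description: two comparisons, one per lead byte.
    static boolean branchy(byte b1) {
        return b1 == (byte) 0xc2 || b1 == (byte) 0xc3;
    }

    // Rewritten form: 0xc2 (1100_0010) and 0xc3 (1100_0011) differ only in
    // bit 0, so masking that bit away leaves a single comparison. The mask
    // 0xfe also discards the sign-extension bits of the promoted byte.
    static boolean masked(byte b1) {
        return (b1 & 0xfe) == 0xc2;
    }

    // Hypothetical helper: counts byte values where the two forms disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b1 = (byte) i;
            if (branchy(b1) != masked(b1)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

The exhaustive loop is cheap here because there are only 256 inputs, so it should report zero mismatches. It says nothing about performance, which still needs a JMH run like the one above.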

-------------

PR: https://git.openjdk.java.net/jdk/pull/7122