RFR: 8280124: Reduce branches decoding latin-1 chars from UTF-8 encoded bytes [v2]

Naoto Sato naoto at openjdk.java.net
Tue Jan 18 17:23:26 UTC 2022


On Tue, 18 Jan 2022 16:27:11 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> This resolves a minor inefficiency in the fast path for decoding latin-1 chars from UTF-8. I also took the opportunity to refactor the StringDecode microbenchmark to align with recent changes to the StringEncode micro.
>> 
>> The inefficiency is that this test is quite branchy:
>> 
>> `if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) && ...`
>> 
>> Since the two constant bytes differ only in the lowest bit, the test can be rewritten as follows, saving a branch:
>> 
>> `if ((b1 & 0xfe) == 0xc2 && ...`
>> 
>> This provides a small speed-up on microbenchmarks where the input can be internally encoded as latin1:
>> 
>> 
>> Benchmark                           (charsetName)  Mode  Cnt     Score     Error  Units
>> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2283.591  ± 12.332  ns/op
>> 
>> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2165.984  ± 13.136  ns/op
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename and reorder latin1 micro

Marked as reviewed by naoto (Reviewer).
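
For readers following the quoted change, the equivalence of the two tests can be checked exhaustively over all 256 byte values. The following is a minimal sketch, not code from the PR: the class and method names are hypothetical, and only the two boolean expressions are taken from the quoted description. It relies on Java promoting `b1` to `int` with sign extension before the `&`, so masking with `0xfe` keeps bits 1 to 7 of the low byte and discards bit 0 along with the sign-extended upper bits.

```java
// Hypothetical helper class; only the two expressions come from the PR text.
final class Latin1LeadByteCheck {

    // Original form: two comparisons joined by ||.
    static boolean branchy(byte b1) {
        return b1 == (byte) 0xc2 || b1 == (byte) 0xc3;
    }

    // Masked form: clearing the lowest bit maps both 0xc2 and 0xc3 to 0xc2,
    // so a single comparison covers both lead bytes.
    static boolean masked(byte b1) {
        return (b1 & 0xfe) == 0xc2;
    }

    // Counts the byte values on which the two forms disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b = (byte) i;
            if (branchy(b) != masked(b)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Whether the masked form actually compiles to fewer branches depends on the JIT; the sketch only checks that the two tests accept the same bytes.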

-------------

PR: https://git.openjdk.java.net/jdk/pull/7122

