RFR: 8280124: Reduce branches decoding latin-1 chars from UTF-8 encoded bytes [v2]
Naoto Sato
naoto at openjdk.java.net
Tue Jan 18 17:23:26 UTC 2022
On Tue, 18 Jan 2022 16:27:11 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> This resolves a minor inefficiency in the fast path for decoding latin-1 chars from UTF-8. I also took the opportunity to refactor the StringDecode microbenchmark to align with recent changes to the StringEncode micro.
>>
>> The inefficiency is that this test is quite branchy:
>>
>> `if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) && ...`
>>
>> Since the two constant bytes differ only in the lowest bit, the test can be rewritten as follows, saving a branch:
>>
>> `if ((b1 & 0xfe) == 0xc2 && ...`
>>
>> This provides a small speed-up on microbenchmarks where the input can be internally encoded as latin1:
>>
>> Benchmark                           (charsetName)  Mode  Cnt     Score    Error  Units
>> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2283.591 ± 12.332  ns/op
>>
>> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2165.984 ± 13.136  ns/op
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
> Rename and reorder latin1 micro
Marked as reviewed by naoto (Reviewer).
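
For readers following along, the equivalence of the two tests quoted above can be checked exhaustively over all 256 byte values. This is only a minimal sketch: the class and method names are hypothetical and not taken from the JDK sources, and it assumes `b1` is a Java `byte`, as the `(byte)` casts in the quoted code suggest.

```java
// Hypothetical helper class, not part of the JDK; it only illustrates the
// equivalence of the two lead-byte tests quoted in the PR description.
class Latin1LeadByteCheck {
    // Original form: two comparisons joined by a short-circuit OR.
    static boolean isLeadBranchy(byte b1) {
        return b1 == (byte) 0xc2 || b1 == (byte) 0xc3;
    }

    // Rewritten form: mask off the lowest bit, then compare once.
    // b1 is sign-extended to int, but the 0xfe mask keeps only bits 1..7,
    // so the result is compared against the non-negative int 0xc2.
    static boolean isLeadMasked(byte b1) {
        return (b1 & 0xfe) == 0xc2;
    }

    // Counts the byte values for which the two forms disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            if (isLeadBranchy(b) != isLeadMasked(b)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches()); // prints "mismatches: 0"
    }
}
```

The mask works because 0xc2 and 0xc3 differ only in bit 0, so clearing that bit maps both values to 0xc2 and no other byte value to it.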
-------------
PR: https://git.openjdk.java.net/jdk/pull/7122