RFR: 8280124: Reduce branches decoding latin-1 chars from UTF-8 encoded bytes [v2]
Claes Redestad
redestad at openjdk.java.net
Tue Jan 18 16:27:11 UTC 2022
> This resolves a minor inefficiency in the fast-path for decoding latin-1 chars from UTF-8. I also took the opportunity to refactor the StringDecode microbenchmark to align with recent changes to the StringEncode micro.
>
> The inefficiency is that this test is quite branchy:
>
> `if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) && ...`
>
> Since the two constant bytes differ only in the lowest bit, the test can be rewritten as follows, saving a branch:
>
> `if ((b1 & 0xfe) == 0xc2 && ...`
>
> This provides a small speed-up (about 5% in the numbers below, 2283.6 to 2166.0 ns/op) on microbenchmarks where the input can be internally encoded as latin1:
>
> Benchmark                           (charsetName)  Mode  Cnt     Score    Error  Units
> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2283.591 ± 12.332  ns/op
>
> StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2165.984 ± 13.136  ns/op
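
The bit-mask rewrite quoted above can be checked exhaustively, since a byte has only 256 values. The following is a minimal sketch, not code from the patch: the class and method names (`MaskEquivalenceCheck`, `twoCompares`, `masked`, `countMismatches`) are made up for illustration, and only the first operand of the condition is modelled, because the rest of the test is elided in the description.

```java
// Hypothetical stand-alone check; not part of the JDK change itself.
class MaskEquivalenceCheck {
    // Original form: two comparisons against the lead bytes 0xC2 and 0xC3.
    static boolean twoCompares(byte b1) {
        return b1 == (byte) 0xc2 || b1 == (byte) 0xc3;
    }

    // Rewritten form: clear the lowest bit, then compare once.
    // b1 is sign-extended to int before the AND, and the 0xfe mask
    // also discards the sign-extension bits, so the int constant 0xc2
    // can be compared directly without a (byte) cast.
    static boolean masked(byte b1) {
        return (b1 & 0xfe) == 0xc2;
    }

    // Counts byte values on which the two forms disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b = (byte) i;
            if (twoCompares(b) != masked(b)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

The two forms agree on every byte value because 0xc2 (1100 0010) and 0xc3 (1100 0011) are the only bytes whose upper seven bits equal those of 0xc2. Whether the single comparison is actually faster depends on the JIT and the hardware; the check above only establishes that the rewrite preserves behaviour.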
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
Rename and reorder latin1 micro
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/7122/files
- new: https://git.openjdk.java.net/jdk/pull/7122/files/926c32bf..e4b6c40a
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7122&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7122&range=00-01
Stats: 10 lines in 1 file changed: 5 ins; 5 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/7122.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/7122/head:pull/7122
PR: https://git.openjdk.java.net/jdk/pull/7122