<i18n dev> RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]
Quan Anh Mai
qamai at openjdk.org
Wed Mar 15 15:00:28 UTC 2023
On Wed, 15 Mar 2023 14:23:35 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:
>> Many thanks for trying; yes, I was indeed curious about the "StringLatin1.canEncode regression" case.
>> I would still modify the benchmark to use inputs that are "semi-randomly" generated based on the mentioned strategies/biases. I know that will sadly make it memory bound, due to reading inputs, but the size of such inputs can be a benchmark parameter, together with the bias (e.g. "latin", "mix", "non-latin").
>> It will benefit future tests on this, although it could be provided as a separate PR.
>
>> The StringLatin1.canEncode regression disappears.
>
> I mixed things up so StringLatin1.canEncode was benchmarked without the updated code.
>
> Here are updated benchmark results:
>
> Baseline:
>
> Benchmark                 (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitRandom         1632  avgt   15  5.437 ± 0.235  ns/op
>
> PR:
>
> Benchmark                 (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitRandom         1632  avgt   15  5.319 ± 0.341  ns/op
>
> StringLatin1.canEncode:
>
> Benchmark                 (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitRandom         1632  avgt   15  5.447 ± 0.304  ns/op
>
> So it seems that using StringLatin1.canEncode might still have a regression in the randomized input case as well.
>
> For this PR, I suggest we update StringLatin1.canEncode to be in sync with CharacterData.of, without one calling the other. If anyone wants to investigate the regression further, that can be done outside this PR.
>
> I have independently verified that StringLatin1.canEncode sees performance improvements using the StringIndexOf benchmark.
We can do `Integer.compareUnsigned(ch, 0xFF) <= 0`
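
To illustrate, here is a minimal sketch comparing that expression with a shift-based test, assuming `ch >>> 8 == 0` is the form being replaced. The class and method names are made up for this example and are not taken from the JDK sources.

```java
// Hypothetical helper class, for illustration only (not JDK code).
final class Latin1Check {
    // Shift-based test (assumed original form): true when no bit above the low 8 is set.
    static boolean viaShift(int ch) {
        return ch >>> 8 == 0;
    }

    // Unsigned comparison: negative ints are treated as large unsigned values,
    // so they fall outside the 0..0xFF range just as they do in the shift test.
    static boolean viaCompareUnsigned(int ch) {
        return Integer.compareUnsigned(ch, 0xFF) <= 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 'A', 0xFF, 0x100, 1632, Character.MAX_CODE_POINT, -1, Integer.MIN_VALUE};
        for (int ch : samples) {
            if (viaShift(ch) != viaCompareUnsigned(ch)) {
                throw new AssertionError("mismatch for " + ch);
            }
        }
        System.out.println("all samples agree");
    }
}
```

Both forms accept exactly the range 0..0xFF. Whether one compiles to better code than the other would still need to be measured with the benchmarks above.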
-------------
PR: https://git.openjdk.org/jdk/pull/13040