<i18n dev> RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test [v2]

Wed Mar 15 15:53:49 UTC 2023

On Wed, 15 Mar 2023 14:56:28 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:

>> Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update StringLatin1.canEncode to sync with same test in CharacterData.of
>
> Just for fun, I tried with a benchmark where the code point is Latin1 in every other call:
> 
> 
> @Benchmark
> public void isDigitVarying(Blackhole blackhole) {
>     blackhole.consume(Character.isDigit(48));
>     blackhole.consume(Character.isDigit(1632));
> }
> 
> 
> With this benchmark, there is no difference between the baseline, the PR and using StringLatin1.canEncode:
> 
> Baseline:
> 
> 
> Benchmark                  (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitVarying         1632  avgt   15  1.198 ± 0.056  ns/op
> 
> 
> PR:
> 
> 
> Benchmark                  (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitVarying         1632  avgt   15  1.195 ± 0.058  ns/op
> 
> 
> StringLatin1.canEncode:
> 
> 
> Benchmark                  (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigitVarying         1632  avgt   15  1.193 ± 0.055  ns/op
> ``` 
> 
> At this point, I'm starting to wonder a bit if the performance benefits suggested by this PR might be dubious and will only surface in very narrow benchmarks. On the other hand, it does not seem harmful either. What do people think?

> @eirbjo I suggest to add `-prof perfasm` and see what's going on; I suspect it's better that inputs are loaded from `@State` field...and don't use `BlackHole` but combine the results of the 2 operations (although I remember that compiler assisted black holes are not as noisy as I'm used to observe with JDK 11 ones)

I think I'm done with benchmarking for this PR. Any StringLatin1.canEncode method call regression can be solved outside this PR.

The PR as it stands seems to have some benefit for Latin1-only apps/use cases, which I think is not uncommon. Any benefit for mixed cases are more dubious, but don't seem to hurt.

-------------

PR: https://git.openjdk.org/jdk/pull/13040