RFR: 8311220: Optimization for StringLatin UpperLower [v3]
Claes Redestad
redestad at openjdk.org
Thu Aug 31 11:48:03 UTC 2023
On Thu, 6 Jul 2023 05:20:14 GMT, 温绍锦 <duke at openjdk.org> wrote:
>> # Benchmark Result
>>
>>
>> sh make/devkit/createJMHBundle.sh
>> bash configure --with-jmh=build/jmh/jars
>> make test TEST="micro:java.lang.StringUpperLower.*"
>>
>>
>>
>> ## 1. [aliyun_ecs_c8i.xlarge](https://help.aliyun.com/document_detail/25378.html#c8i)
>> * cpu : intel xeon sapphire rapids (x64)
>>
>> ``` diff
>> -Benchmark Mode Cnt Score Error Units (baseline)
>> -StringUpperLower.lowerToLower avgt 15 27.180 ± 0.017 ns/op
>> -StringUpperLower.lowerToUpper avgt 15 47.196 ± 0.066 ns/op
>> -StringUpperLower.mixedToLower avgt 15 32.307 ± 0.072 ns/op
>> -StringUpperLower.mixedToUpper avgt 15 44.005 ± 0.414 ns/op
>> -StringUpperLower.upperToLower avgt 15 32.310 ± 0.033 ns/op
>> -StringUpperLower.upperToUpper avgt 15 42.053 ± 0.341 ns/op
>>
>> +Benchmark Mode Cnt Score Error Units (Update 01)
>> +StringUpperLower.lowerToLower avgt 15 16.964 ± 0.021 ns/op (+60.96%)
>> +StringUpperLower.lowerToUpper avgt 15 46.491 ± 0.036 ns/op (+1.51%)
>> +StringUpperLower.mixedToLower avgt 15 35.947 ± 0.254 ns/op (-10.12%)
>> +StringUpperLower.mixedToUpper avgt 15 41.976 ± 0.326 ns/op (+4.83%)
>> +StringUpperLower.upperToLower avgt 15 33.466 ± 4.036 ns/op (-3.45%)
>> +StringUpperLower.upperToUpper avgt 15 17.446 ± 1.036 ns/op (+141.04%)
>>
>> +Benchmark Mode Cnt Score Error Units (Update 00)
>> +StringUpperLower.lowerToLower avgt 15 16.976 ± 0.043 ns/op (+60.160)
>> +StringUpperLower.lowerToUpper avgt 15 46.373 ± 0.086 ns/op (+1.77%)
>> +StringUpperLower.mixedToLower avgt 15 32.018 ± 0.061 ns/op (+0.9%)
>> +StringUpperLower.mixedToUpper avgt 15 42.019 ± 0.473 ns/op (+4.72%)
>> +StringUpperLower.upperToLower avgt 15 32.052 ± 0.051 ns/op (+0.8%)
>> +StringUpperLower.upperToUpper avgt 15 16.978 ± 0.190 ns/op (+147.69%)
>>
>>
>> ## 2. [aliyun_ecs_c8a.xlarge](https://help.aliyun.com/document_detail/25378.html#c8a)
>> * cpu : amd epc genoa (x64)
>>
>> ``` diff
>> -Benchmark Mode Cnt Score Error Units (baseline)
>> -StringUpperLower.lowerToLower avgt 15 22.164 ± 0.021 ns/op
>> -StringUpperLower.lowerToUpper avgt 15 46.113 ± 0.047 ns/op
>> -StringUpperLower.mixedToLower avgt 15 28.501 ± 0.037 ns/op
>> -StringUpperLower.mixedToUpper avgt 15 38.782 ± 0.038 ns/op
>> -StringUpperLower.upperToLower avgt 15 28.625 ± 0.162 ns/op
>> -StringUpperLower.upperToUpper avgt 15 27.960 ± 0.038 ns/op
>>
>> +B...
>
> 温绍锦 has updated the pull request incrementally with one additional commit since the last revision:
>
> add method CharacterDataLatin1#isLowerCaseEx
I think this overall looks reasonable, but I think a more thorough proof / test would help to build confidence that all these changes are semantically neutral.
The `isLowerCaseEx` needs to explain why two lower-case codepoints are omitted (perhaps this should be `hasUpperCase`?)
src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line 94:
> 92:
> 93: boolean isLowerCaseEx(int ch) {
> 94: return ch >= 'a' && (ch <= 'z' || ch == 181 || (ch >= 223 && ch != 247));
What is the contract for this? Specifically there are two special superscripte codepoints (170 and 186) which are lower-case (`Character.isLowerCase(170) => true`) but doesn't have an upper-case (`Character.toUpperCase(170) => 170`). It seems reasonable to exclude them if only used for operations like toUpper/toLower (since they won't change), but it should be spelled out to avoid surprises.
For consistency I think we should use hex literals in this file, e.g. `0xDF` instead of `223`
-------------
Changes requested by redestad (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14751#pullrequestreview-1604562360
PR Review Comment: https://git.openjdk.org/jdk/pull/14751#discussion_r1311501588
More information about the core-libs-dev
mailing list