<i18n dev> RFR: 8302877: Speed up latin1 case conversions [v2]

Claes Redestad redestad at openjdk.org
Tue Feb 21 10:40:33 UTC 2023


On Tue, 21 Feb 2023 06:59:47 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:

>> This PR suggests we speed up Character.toUpperCase and Character.toLowerCase for latin1 code points by applying the 'oldest ASCII trick in the book'.
>> 
>> This takes advantage of the fact that latin1 uppercase code points are always 0x20 lower than their lowercase (with the exception of two code points which uppercase out of latin1).
>> 
>> To verify the correctness of the new implementation, the test `Latin1CaseConversion` is added with an exhaustive verification of toUpperCase/toLowerCase for all latin1 code points.
>> 
>> The implementation needs to balance the performance of the various ranges in latin1. An effort has been made to favour operations on ASCII code points, without causing excessive regression for higher code points.
>> 
>> Performance is benchmarked for 7 chosen sample code points, each representing a range or a special-case.  Results in the first comment.
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Spell fix for 'exhaustive' in comments in sun/text/resources

Looks good. Some nits inline

src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line 142:

> 140:         }
> 141:         int l = ch | 0x20; // Lowercase using 'oldest ASCII trick in the book'
> 142:         if ( l <= 'z' // In range a-z

Suggestion:

        if (l <= 'z' // In range a-z

test/micro/org/openjdk/bench/java/lang/Characters.java line 92:

> 90:     @Measurement(iterations = 5, time = 1)
> 91:     @Fork(3)
> 92:     public static class Latin1CaseConversions {

Not sure if qualifying this as "Latin1" is necessary, even though that's what you've focused on for this PR. We could easily add some codePoints outside of the latin1 range (now or later) without changing the test. 

While having a switch with some readable names is a nice touch I think we should additionally allow integer codePoint as-is to keep it in line with the outer class, e.g. `default -> Integer.parseInt(codePoint);`

-------------

Marked as reviewed by redestad (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12623


More information about the i18n-dev mailing list