RFR: 8263091: Remove CharacterData.isOtherUppercase/-Lowercase

Claes Redestad

5 Mar 2021 5 Mar '21

4:02 p.m.

This patch removes the CharacterData.isOtherUppercase and isOtherLowercase methods. It also exploits the fact that isOtherUppercase is always false for all codepoints in the CharacterDataLatin1 range for a small speed-up. I have no means to test if this is correct on PPC, which has intrinsics for isLowerCase/isUpperCase, but unless I'm reading the code wrong the intrinsic for isLowerCase on PPC already appears to effectively do the fused logic of isLowerCase(ch) || isOtherLowerCase(ch) since it handles the two values where isLowerCase and isOtherLowercase disagrees (0xaa, 0xba), which means this change should make the intrinsic and the java code be in better agreement. ------------- Commit messages: - Fold CharacterData.isOtherUpper-/Lowercase into isUpper-/LowerCase Changes: https://git.openjdk.java.net/jdk/pull/2846/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2846&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8263091 Stats: 98 lines in 8 files changed: 1 ins; 71 del; 26 mod Patch: https://git.openjdk.java.net/jdk/pull/2846.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2846/head:pull/2846 PR: https://git.openjdk.java.net/jdk/pull/2846

Show replies by date

Claes Redestad

5 Mar 5 Mar

4:02 p.m.

On Fri, 5 Mar 2021 14:24:34 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch removes the CharacterData.isOtherUppercase and isOtherLowercase methods. It also exploits the fact that isOtherUppercase is always false for all codepoints in the CharacterDataLatin1 range for a small speed-up.

I have no means to test if this is correct on PPC, which has intrinsics for isLowerCase/isUpperCase, but unless I'm reading the code wrong the intrinsic for isLowerCase on PPC already appears to effectively do the fused logic of isLowerCase(ch) || isOtherLowerCase(ch) since it handles the two values where isLowerCase and isOtherLowercase disagrees (0xaa, 0xba), which means this change should make the intrinsic and the java code be in better agreement.

Microbenchmark results, org.openjdk.bench.java.lang.Characters OOTB: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isLowerCase 176 avgt 5 13.812 ± 0.060 ns/op Characters.isUpperCase 176 avgt 5 13.812 ± 0.062 ns/op Benchmark (codePoint) Mode Cnt Score Error Units Characters.isLowerCase 176 avgt 5 13.825 ± 0.096 ns/op Characters.isUpperCase 176 avgt 5 12.555 ± 0.033 ns/op ~1.1x speed-up for Character.isUpperCase in the latin 1 range -Xint: Benchmark (codePoint) Mode Cnt Score Error Units Characters.isLowerCase 176 avgt 5 754.756 ± 20.815 ns/op Characters.isUpperCase 176 avgt 5 755.403 ± 15.645 ns/op Benchmark (codePoint) Mode Cnt Score Error Units Characters.isLowerCase 176 avgt 5 606.923 ± 1.569 ns/op Characters.isUpperCase 176 avgt 5 521.073 ± 7.439 ns/op 1.25x speed-up for isLowerCase and 1.45x speed-up for isUpperCase when interpreting, translating to minor startup / warmup win on some examined apps. ------------- PR: https://git.openjdk.java.net/jdk/pull/2846

Roger Riggs

5:22 p.m.

On Fri, 5 Mar 2021 14:24:34 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch removes the CharacterData.isOtherUppercase and isOtherLowercase methods. It also exploits the fact that isOtherUppercase is always false for all codepoints in the CharacterDataLatin1 range for a small speed-up.

I have no means to test if this is correct on PPC, which has intrinsics for isLowerCase/isUpperCase, but unless I'm reading the code wrong the intrinsic for isLowerCase on PPC already appears to effectively do the fused logic of isLowerCase(ch) || isOtherLowerCase(ch) since it handles the two values where isLowerCase and isOtherLowercase disagrees (0xaa, 0xba), which means this change should make the intrinsic and the java code be in better agreement.

Marked as reviewed by rriggs (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2846

Naoto Sato

5:56 p.m.

On Fri, 5 Mar 2021 14:24:34 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch removes the CharacterData.isOtherUppercase and isOtherLowercase methods. It also exploits the fact that isOtherUppercase is always false for all codepoints in the CharacterDataLatin1 range for a small speed-up.

I have no means to test if this is correct on PPC, which has intrinsics for isLowerCase/isUpperCase, but unless I'm reading the code wrong the intrinsic for isLowerCase on PPC already appears to effectively do the fused logic of isLowerCase(ch) || isOtherLowerCase(ch) since it handles the two values where isLowerCase and isOtherLowercase disagrees (0xaa, 0xba), which means this change should make the intrinsic and the java code be in better agreement.

Marked as reviewed by naoto (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2846

Iris Clark

8:42 p.m.

On Fri, 5 Mar 2021 14:24:34 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch removes the CharacterData.isOtherUppercase and isOtherLowercase methods. It also exploits the fact that isOtherUppercase is always false for all codepoints in the CharacterDataLatin1 range for a small speed-up.

I have no means to test if this is correct on PPC, which has intrinsics for isLowerCase/isUpperCase, but unless I'm reading the code wrong the intrinsic for isLowerCase on PPC already appears to effectively do the fused logic of isLowerCase(ch) || isOtherLowerCase(ch) since it handles the two values where isLowerCase and isOtherLowercase disagrees (0xaa, 0xba), which means this change should make the intrinsic and the java code be in better agreement.

Marked as reviewed by iris (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2846

Claes Redestad

8 Mar 8 Mar

10:38 a.m.

New subject: Integrated: 8263091: Remove CharacterData.isOtherUppercase/-Lowercase

On Fri, 5 Mar 2021 14:24:34 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch removes the CharacterData.isOtherUppercase and isOtherLowercase methods. It also exploits the fact that isOtherUppercase is always false for all codepoints in the CharacterDataLatin1 range for a small speed-up.

I have no means to test if this is correct on PPC, which has intrinsics for isLowerCase/isUpperCase, but unless I'm reading the code wrong the intrinsic for isLowerCase on PPC already appears to effectively do the fused logic of isLowerCase(ch) || isOtherLowerCase(ch) since it handles the two values where isLowerCase and isOtherLowercase disagrees (0xaa, 0xba), which means this change should make the intrinsic and the java code be in better agreement.

This pull request has now been integrated. Changeset: a0c3f242 Author: Claes Redestad <redestad@openjdk.org> URL: https://git.openjdk.java.net/jdk/commit/a0c3f242 Stats: 98 lines in 8 files changed: 1 ins; 71 del; 26 mod 8263091: Remove CharacterData.isOtherUppercase/-Lowercase Reviewed-by: rriggs, naoto, iris ------------- PR: https://git.openjdk.java.net/jdk/pull/2846

1828

Age (days ago)

1831

Last active (days ago)

List overview

Download

5 comments

4 participants

participants (4)

Claes Redestad
Iris Clark
Naoto Sato
Roger Riggs