String.equalsIgnoreCase(...) optimization

Sun May 1 22:52:12 UTC 2016

Hello!

I extract JDK 9 code from String, StringLatin1, StringUTF16 and optimize some functions https://github.com/volodin-aa/openjdk-benchmark

My optimization:

1. Create fullMatchesCI() methods with simple for (...) loop https://github.com/volodin-aa/openjdk-benchmark/blob/master/src/su/tweak/openjdk/FastStringUTF16.java#L101
2. Remove 'constants' for equalsIgnoreCase() https://github.com/volodin-aa/openjdk-benchmark/blob/master/src/su/tweak/openjdk/FastString.java#L213
3. Optimize upper case verifying http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/56379812ec5e/src/java.base/share/classes/java/lang/StringUTF16.java#l549
Character.toUpperCase(c2) more expencive then simple if (...) and I write if (u1 == c2) instead if (u1 == u2)
...
			// Try convert c1 to upper case and compare with c2
			char u1 = Character.toUpperCase(c1);
			if (u1 == c2) {
				continue;
			}

			// Try convert c2 to upper case and compare with u1
			char u2 = Character.toUpperCase(c2);
			if (u1 == u2) {
				continue;
			}
...
For 'random case' strings it is more efficient. 
And developers can optimeze code like this http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/56379812ec5e/src/java.base/share/classes/java/lang/Boolean.java#l130

 public static boolean parseBoolean(String s) {
        return "true".equalsIgnoreCase(s);
 }

My results https://docs.google.com/spreadsheets/d/1_Fxky6ftZTHpyuRJbM9wvqOzpoORLpIeciukMWuYuXw/pubhtml
In table I compute equalsIgnoreCase() time without initialisation time.
For case "string" and  "String" my version more faster then internalVM version. In other cases VM optimization is faster.

27.04.2016, 14:38, "Aleksey Shipilev" <aleksey.shipilev at oracle.com>:
>  Hi Andrey,
>
>  On 04/27/2016 12:57 PM, Andrey wrote:
>>   I publish my JMH benchmark at github https://github.com/volodin-aa/openjdk-benchmark
>
>  Please note that you really should compete with JDK 9 String, not with
>  JDK 8. String.equalsIgnoreCase is different in JDK 9, and the obvious
>  improvement one can do there is:
>
>  diff -r 5a6df35b0f97 src/java.base/share/classes/java/lang/StringLatin1.java
>  --- a/src/java.base/share/classes/java/lang/StringLatin1.java Wed Apr 27
>  09:13:51 2016 +0200
>  +++ b/src/java.base/share/classes/java/lang/StringLatin1.java Wed Apr 27
>  14:16:14 2016 +0300
>  @@ -315,11 +315,14 @@
>                                             byte[] other, int ooffset,
>  int len) {
>           int last = toffset + len;
>           while (toffset < last) {
>  - char c1 = (char)(value[toffset++] & 0xff);
>  - char c2 = (char)(other[ooffset++] & 0xff);
>  - if (c1 == c2) {
>  + byte b1 = value[toffset++];
>  + byte b2 = other[ooffset++];
>  + if (b1 == b2) {
>                   continue;
>               }
>  + char c1 = (char)(b1 & 0xff);
>  + char c2 = (char)(b2 & 0xff);
>  +
>               char u1 = Character.toUpperCase(c1);
>               char u2 = Character.toUpperCase(c2);
>               if (u1 == u2) {
>
>  ...which improves performance even on short Strings. Maybe specializing
>  Character.toUpperCase for Latin1 strings would pay off more. Maybe
>  specializing regionMatches for complete Strings would worth the
>  increased complexity too, but that should be tried on JDK code first.
>
>  To make a good justification for the change, the benchmark should really
>  test:
>   a) Different lengths;
>   b) Different starting mismatch offset;
>   c) Different Latin1/UTF16 pairs.
>
>  Note well: these optimizations would require studying the generated code
>  looking for the compiler quirks, and as such would require much more
>  time than anyone thinks it would take (even after taking Hofstadter's
>  Law into account).
>
>  Thanks,
>  -Aleksey