String.equalsIgnoreCase(...) optimization
Andrey
andrey at tweak.su
Sun May 1 22:52:12 UTC 2016
Hello!
I extract JDK 9 code from String, StringLatin1, StringUTF16 and optimize some functions https://github.com/volodin-aa/openjdk-benchmark
My optimization:
1. Create fullMatchesCI() methods with simple for (...) loop https://github.com/volodin-aa/openjdk-benchmark/blob/master/src/su/tweak/openjdk/FastStringUTF16.java#L101
2. Remove 'constants' for equalsIgnoreCase() https://github.com/volodin-aa/openjdk-benchmark/blob/master/src/su/tweak/openjdk/FastString.java#L213
3. Optimize upper case verifying http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/56379812ec5e/src/java.base/share/classes/java/lang/StringUTF16.java#l549
Character.toUpperCase(c2) more expencive then simple if (...) and I write if (u1 == c2) instead if (u1 == u2)
...
// Try convert c1 to upper case and compare with c2
char u1 = Character.toUpperCase(c1);
if (u1 == c2) {
continue;
}
// Try convert c2 to upper case and compare with u1
char u2 = Character.toUpperCase(c2);
if (u1 == u2) {
continue;
}
...
For 'random case' strings it is more efficient.
And developers can optimeze code like this http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/56379812ec5e/src/java.base/share/classes/java/lang/Boolean.java#l130
public static boolean parseBoolean(String s) {
return "true".equalsIgnoreCase(s);
}
My results https://docs.google.com/spreadsheets/d/1_Fxky6ftZTHpyuRJbM9wvqOzpoORLpIeciukMWuYuXw/pubhtml
In table I compute equalsIgnoreCase() time without initialisation time.
For case "string" and "String" my version more faster then internalVM version. In other cases VM optimization is faster.
27.04.2016, 14:38, "Aleksey Shipilev" <aleksey.shipilev at oracle.com>:
> Hi Andrey,
>
> On 04/27/2016 12:57 PM, Andrey wrote:
>> I publish my JMH benchmark at github https://github.com/volodin-aa/openjdk-benchmark
>
> Please note that you really should compete with JDK 9 String, not with
> JDK 8. String.equalsIgnoreCase is different in JDK 9, and the obvious
> improvement one can do there is:
>
> diff -r 5a6df35b0f97 src/java.base/share/classes/java/lang/StringLatin1.java
> --- a/src/java.base/share/classes/java/lang/StringLatin1.java Wed Apr 27
> 09:13:51 2016 +0200
> +++ b/src/java.base/share/classes/java/lang/StringLatin1.java Wed Apr 27
> 14:16:14 2016 +0300
> @@ -315,11 +315,14 @@
> byte[] other, int ooffset,
> int len) {
> int last = toffset + len;
> while (toffset < last) {
> - char c1 = (char)(value[toffset++] & 0xff);
> - char c2 = (char)(other[ooffset++] & 0xff);
> - if (c1 == c2) {
> + byte b1 = value[toffset++];
> + byte b2 = other[ooffset++];
> + if (b1 == b2) {
> continue;
> }
> + char c1 = (char)(b1 & 0xff);
> + char c2 = (char)(b2 & 0xff);
> +
> char u1 = Character.toUpperCase(c1);
> char u2 = Character.toUpperCase(c2);
> if (u1 == u2) {
>
> ...which improves performance even on short Strings. Maybe specializing
> Character.toUpperCase for Latin1 strings would pay off more. Maybe
> specializing regionMatches for complete Strings would worth the
> increased complexity too, but that should be tried on JDK code first.
>
> To make a good justification for the change, the benchmark should really
> test:
> a) Different lengths;
> b) Different starting mismatch offset;
> c) Different Latin1/UTF16 pairs.
>
> Note well: these optimizations would require studying the generated code
> looking for the compiler quirks, and as such would require much more
> time than anyone thinks it would take (even after taking Hofstadter's
> Law into account).
>
> Thanks,
> -Aleksey
More information about the core-libs-dev
mailing list