Missing trivial optimization in String equals

charlie hunt charlie.hunt at oracle.com
Wed Jan 24 13:37:04 UTC 2018


Hi Martin,

In short, the string equals methods for StringLatin1 and StringUTF16 methods have instrinsics that compare their respective array lengths.

Long strong, there was an enormous amount of optimization work (many intrinsics) implemented and re-worked on the String object for the Compact Strings feature (http://openjdk.java.net/jeps/254 <http://openjdk.java.net/jeps/254>) in JDK 9.
To drive that optimization work, there was a lot of String microbenchmarks and analysis done. You can find performance analysis information on String.equals() at http://cr.openjdk.java.net/~shade/density/equals.txt <http://cr.openjdk.java.net/~shade/density/equals.txt>.

hths,

charlie

> On Jan 23, 2018, at 7:48 PM, Martin Grajcar <maaartinus at gmail.com> wrote:
> 
> Hi!
> 
> On Tue, Jan 23, 2018 at 1:13 AM, Claes Redestad <claes.redestad at oracle.com>
> wrote:
> 
>> Hi,
>> 
>> do you have an example application that is spending significant time
>> comparing the contents of such identical arrays and would benefit?
> 
> 
> no, it was just a random idea.
> 
> I would
>> guess such comparisons are about twice as fast as typical ones due to
>> having to load half as much from memory. ;-)
>> 
> 
> Actually, no data has to be loaded if the arrays are the same object.
> It's simple O(1) test, which may or may not save an O(n) test.
> 
> 
>> If there's a valid optimization case, though, then it would only be
>> realizable when UseStringDeduplication is enabled. Thus it's crucial not
>> to regress all the other cases that won't benefit (Parallel, Serial, ...),
>> and as String::equals is often a very hot method then even a few added
>> instructions that only compare data we've already loaded into registers
>> might be too much. Or maybe not.
>> 
> 
> Sure, without deduplication switched on, the test is completely pointless.
> Otherwise, the more deduplication gets done, the more useful is may get.
> 
> As String::equals is routinely replaced by a C2 intrinsic[1] that's
>> where we should look to optimize (not so much in the JDK library code).
>> 
> 
> Thank you, I knew, it happens, but didn't know where.
> However, it's the @HotSpotIntrinsicCandidate methods in StringLatin1
> and StringUTF16, what gets intrinsified,
> but I'd suggest to put the test just before the coder comparison at line
> http://hg.openjdk.java.net/jdk10/jdk10/jdk/file/777356696811/src/java.base/
> share/classes/java/lang/String.java#l1020
> in order to maximize the possible gain.
> There's no reason to test the coder when the arrays happen to be the same
> object
> as this is only possible due to the deduplication, which tests it already.
> 
> 
>> I think it'd be straightforward to train that intrinsic to only emit the
>> test when the related feature is enabled. Maybe the kind folks over at
>> hotspot-compiler-dev at openjdk.java.net might be able to give some pointers.
>> If so I'd invite you to set up a set of experiments and see if this can
>> have a benefit.
>> 
> 
> Because of the intrinsic being tied to the concrete methods rather than to
> the code inside,
> I'd have to modify and compile the JVM sources, which needs way more time
> than I can spend on it in the moment. :(
> A benchmark written using a copy of the method would be too unrealistic,
> as the copied comparison uses no intrinsic and is much slower (
> https://stackoverflow.com/q/15586997/581205).
> 
> Anyway, we know, that an arbitrary high speed up is possible as the O(n)
> comparison gets eliminated.
> A benchmark showing this can surely be constructed,
> but what has to be measured is the tiny slowdown in cases when it doesn't
> help
> and that needs to re-run existing benchmarks.
> 
> Actually, I'd propose something more fancy (added two more independent
> trivial optimization ideas):
> 
>    public boolean equals(Object anObject) {
>        if (this == anObject) {
>            return true;
>        }
>        if (anObject instanceof String) {
>            String aString = (String)anObject;
> 
>            // IDEA 1 - as discussed above
>            // USE_STRING_DEDUPLICATION is a constant to be eliminated by
> JIT
>            // just like it happens with COMPACT_STRINGS [2].
>            if (USE_STRING_DEDUPLICATION && value == aString.value) {
>                // We know that the coders are the same here.
>                return true;
>            }
> 
>            // IDEA 2 - add a trivial test possibly avoiding the loop
>            // Testing the length first because of it being more probable
> to differ.
>            if (value.length == aString.value.length && coder() ==
> aString.coder()) {
> 
>                // IDEA 3 - remove a needless test of coder()
>                // Not testing the coder as all we need is byte array
> content equality.
>                // StringLatin1.equals is applicable for UTF-8, too.
>                return StringLatin1.equals(value, aString.value);
>            }
>        }
>        return false;
>    }
> 
> Regards,
> Martin.
> 
> [2]:
> http://hg.openjdk.java.net/jdk10/jdk10/jdk/file/777356696811/src/java.base/share/classes/java/lang/String.java#l197
> 
> Regards
>> 
>> /Claes
>> 
>> [1]
>> http://hg.openjdk.java.net/jdk/jdk/file/e1876e6b57b6/src/hot
>> spot/share/opto/library_call.cpp#l1116
>> 
>> 
>> 
>> On 2018-01-23 00:14, Martin Grajcar wrote:
>> 
>>> It looks like when testing equality of Strings, there's no test if the two
>>> arrays are actually the same object. This was impossible before Java 8u20
>>> (as no two strings used to share arrays), but it's well possible with
>>> -XX:+UseStringDeduplication.
>>> 
>>> Maybe someone here cares...
>>> 
>>> Regards,
>>> Martin.
>>> 
>>> http://hg.openjdk.java.net/jdk10/jdk10/jdk/file/777356696811
>>> /src/java.base/
>>> share/classes/java/lang/String.java#l1014
>>> http://hg.openjdk.java.net/jdk10/jdk10/jdk/file/777356696811
>>> /src/java.base/
>>> share/classes/java/lang/StringLatin1.java#l89
>>> http://hg.openjdk.java.net/jdk10/jdk10/jdk/file/777356696811
>>> /src/java.base/
>>> share/classes/java/lang/StringUTF16.java#l258
>>> 
>> 
>> 



More information about the jdk-dev mailing list