Hooking up the array mismatch stub as an intrinsic in the template interpreter
Coleen Phillimore
coleen.phillimore at oracle.com
Fri Apr 15 14:02:16 UTC 2016
Thank you, Vladimir.
Coleen
On 4/15/16 9:29 AM, Vladimir Ivanov wrote:
> Here is an idea for avoiding interpreter changes.
>
> The interpreter can't benefit from "intrinsifiable" methods directly,
> but if you create a wrapper and call that instead [1], the JIT
> compilers can take care of the stand-alone versions for you. The
> interpreter will treat them as ordinary Java methods.
>
> The only missing case is the early startup phase, when everything is
> interpreted, but we can add special logic to the JVM to eagerly
> compile such methods (either during startup or on the first
> invocation), which would be much simpler than adding intrinsics
> specifically for the interpreter.
>
> Best regards,
> Vladimir Ivanov
>
> [1]
> diff --git a/src/java.base/share/classes/java/util/ArraysSupport.java b/src/java.base/share/classes/java/util/ArraysSupport.java
> --- a/src/java.base/share/classes/java/util/ArraysSupport.java
> +++ b/src/java.base/share/classes/java/util/ArraysSupport.java
> @@ -26,6 +26,7 @@
>
> import jdk.internal.HotSpotIntrinsicCandidate;
> import jdk.internal.misc.Unsafe;
> +import jdk.internal.vm.annotation.ForceInline;
>
> /**
> * Utility methods to find a mismatch between two primitive arrays.
> @@ -106,8 +107,16 @@
> * compliment of the number of remaining pairs of elements to be checked in
> * the tail of the two arrays.
> */
> + @ForceInline
> + static int vectorizedMismatch(Object a, long aOffset,
> + Object b, long bOffset,
> + int length,
> + int log2ArrayIndexScale) {
> + return vectorizedMismatch0(a, aOffset, b, bOffset, length, log2ArrayIndexScale);
> + }
> +
> @HotSpotIntrinsicCandidate
> - static int vectorizedMismatch(Object a, long aOffset,
> + private static int vectorizedMismatch0(Object a, long aOffset,
> Object b, long bOffset,
> int length,
> int log2ArrayIndexScale) {
>
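The wrapper pattern in the diff above can be sketched in self-contained form. This is a minimal illustration, assuming a simplified byte[]-only signature: the class and method names are hypothetical, the JDK-internal annotations (@ForceInline, @HotSpotIntrinsicCandidate) appear only in comments because application code cannot use them, and the worker returns -1 for "no mismatch" rather than the complemented tail count described in the patched Javadoc.

```java
// Illustrative sketch only; names and the simplified signature are assumptions.
final class MismatchWrapperSketch {

    // Wrapper that callers use. In the JDK patch this method carries
    // @ForceInline so the JIT folds it into its callers.
    static int mismatch(byte[] a, byte[] b, int length) {
        return mismatch0(a, b, length);
    }

    // Stand-alone worker. In the JDK patch this is the method marked
    // @HotSpotIntrinsicCandidate; here it is a plain scalar loop.
    // Returns the index of the first differing element, or -1 if the
    // first `length` elements are equal (a simplification).
    private static int mismatch0(byte[] a, byte[] b, int length) {
        for (int i = 0; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4};
        byte[] y = {1, 2, 9, 4};
        System.out.println(mismatch(x, y, 4)); // first difference is at index 2
        System.out.println(mismatch(x, x, 4)); // identical arrays
    }
}
```

The interpreter sees both methods as ordinary Java; only a JIT-compiled caller would get the intrinsic version of the worker.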
> On 4/15/16 4:07 PM, Paul Sandoz wrote:
>>
>>> On 15 Apr 2016, at 14:12, Coleen Phillimore
>>> <coleen.phillimore at oracle.com> wrote:
>>>
>>>
>>> I don't know why we'd add even more assembly code to the
>>> interpreter. Why doesn't the JIT optimize this function instead?
>>> Does adding a stub in the interpreter prevent the JIT from
>>> inlining this function, since it's not invocation counted?
>>>
>>
>> I have updated the webrev with C1 support [1] and determined,
>> eyeballing generated code, that the stub call gets inlined for C1 and
>> C2 and appears unaffected by the wiring up of that same stub in the
>> template interpreter.
>>
>> A stub was added and wired up to C2 with the intention of wiring it
>> up to C1 as well, and possibly to the interpreter. One reason for the
>> latter was the performance results presented in the last email
>> (potentially ~200x over the current approach, and a ~35x improvement
>> over the original Java code). Does that matter? Would you be
>> concerned about that?
>>
>> Array equality is quite a fundamental operation, so I was concerned
>> about such a regression in the interpreter.
>>
>> Another reason for the latter, on which I may be off base, is that it
>> might make it easier to consolidate the intrinsics added for compact
>> string equality/comparison into this more general mismatch functionality.
>>
>> —
>>
>> Regarding the changes to C1 in [1]: as with the CRC intrinsics, I
>> added the _vectorizedMismatch intrinsic to the set of intrinsics that
>> preserve state and can trap. Is that correct? Also, I am not sure if
>> the 32-bit part is correct.
>>
>> Thanks,
>> Paul.
>>
>> [1]
>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8151268-int-c1-mismatch/webrev/
>> (Note: this is still incomplete; I need to update all CPU-based code
>> appropriately.)
>>
>> Benchmark              (lastNEQ)   (n)  Mode  Cnt       Score      Error  Units
>> # Baseline
>> # VM options: -XX:TieredStopAtLevel=1
>> ByteArray.base_equals      false  1024  avgt   10    1190.177 ±   21.387  ns/op
>> ByteArray.base_equals       true  1024  avgt   10    1191.767 ±   35.196  ns/op
>>
>> # Before patch
>> # VM options: -XX:TieredStopAtLevel=1 -XX:-SpecialArraysEquals -XX:-UseVectorizedMismatchIntrinsic
>> ByteArray.jdk_equals       false  1024  avgt   10     208.014 ±    5.224  ns/op
>> ByteArray.jdk_equals        true  1024  avgt   10     218.271 ±   10.749  ns/op
>>
>> # After patch
>> # VM options: -XX:TieredStopAtLevel=1 -XX:-SpecialArraysEquals -XX:+UseVectorizedMismatchIntrinsic
>> ByteArray.jdk_equals       false  1024  avgt   10      70.097 ±    2.321  ns/op
>> ByteArray.jdk_equals        true  1024  avgt   10      72.284 ±    1.578  ns/op
>>
>>
>>
>>> thanks,
>>> Coleen
>>>
>>>
>>> On 4/14/16 10:53 AM, Paul Sandoz wrote:
>>>> Hi,
>>>>
>>>> I hooked up the array mismatch stub to the interpreter. With a bit
>>>> of cargo-culting from the CRC work and some lldb debugging [*], it
>>>> appears to work and passes tests.
>>>>
>>>> Could someone take a quick look to see if I am on the right track here:
>>>>
>>>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8151268-int-c1-mismatch/webrev/
>>>>
>>>>
>>>>
>>>> Here are some quick numbers from running with -Xint for byte[] equality:
>>>>
>>>> Benchmark              (lastNEQ)   (n)  Mode  Cnt       Score      Error  Units
>>>> # Baseline
>>>> # VM options: -Xint
>>>> ByteArray.base_equals      false  1024  avgt   10   16622.453 ±  498.475  ns/op
>>>> ByteArray.base_equals       true  1024  avgt   10   16889.244 ±  439.895  ns/op
>>>>
>>>> # Before patch
>>>> # VM options: -Xint -XX:-UseVectorizedMismatchIntrinsic
>>>> ByteArray.jdk_equals       false  1024  avgt   10  106436.195 ± 3657.508  ns/op
>>>> ByteArray.jdk_equals        true  1024  avgt   10  103306.001 ± 2723.130  ns/op
>>>>
>>>> # After patch
>>>> # VM options: -Xint -XX:+UseVectorizedMismatchIntrinsic
>>>> ByteArray.jdk_equals       false  1024  avgt   10     448.764 ±   18.977  ns/op
>>>> ByteArray.jdk_equals        true  1024  avgt   10     448.657 ±   22.656  ns/op
>>>>
>>>>
>>>>
>>>> The next step is to wire up C1.
>>>>
>>>> Further steps would be to replace some of the intrinsics added/used
>>>> for compact strings with mismatch, then evaluate the performance.
>>>>
>>>> Thanks,
>>>> Paul.
>>>>
>>>> [*] Stubs to be used as intrinsics in the template interpreter need
>>>> to be created during the initial stage of generation, otherwise the
>>>> stub address is null which leads to a SEGV that’s hard to track down.
>>>
>>
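As a rough cross-check of the "~200x" and "~35x" figures quoted in the thread, the ratios can be recomputed from the -Xint scores (lastNEQ = false rows). This is only a sketch: the class and method names are hypothetical, and the three scores are the only values taken from the thread.

```java
// Illustrative arithmetic only; class and method names are hypothetical.
final class SpeedupSketch {

    // Speedup factor: how many times faster "after" is than "before",
    // given average times per operation in the same unit.
    static double speedup(double beforeNsPerOp, double afterNsPerOp) {
        return beforeNsPerOp / afterNsPerOp;
    }

    public static void main(String[] args) {
        double baselineJava = 16622.453; // ByteArray.base_equals, -Xint
        double beforePatch = 106436.195; // ByteArray.jdk_equals, intrinsic off
        double afterPatch = 448.764;     // ByteArray.jdk_equals, intrinsic on

        System.out.printf("vs. before patch: %.0fx%n",
                speedup(beforePatch, afterPatch));
        System.out.printf("vs. baseline Java loop: %.0fx%n",
                speedup(baselineJava, afterPatch));
    }
}
```

The two ratios come out at roughly 237x and 37x, in line with the rounded figures given in the thread.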