Hooking up the array mismatch stub as an intrinsic in the template interpreter
Paul Sandoz
paul.sandoz at oracle.com
Fri Apr 15 15:06:27 UTC 2016
> On 15 Apr 2016, at 15:39, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>
>
>
> On 4/15/16 4:29 PM, Vladimir Ivanov wrote:
>> An idea how to avoid interpreter changes.
>>
>> Interpreter can't benefit from "intrinsifiable" methods directly, but if
>> you create a wrapper and call it instead [1], JIT-compilers can take
>> care of stand-alone versions for you. The interpreter will work with
>> them as if they are ordinary Java methods.
>
> ... or even add such logic directly into the JVM: for methods marked w/ @HotSpotIntrinsicCandidate (or better with some new annotation, since most intrinsics depend on the context they are invoked in) create an intrinsified stand-alone version.
>
Very interesting, some good lateral thinking here!
Thanks,
Paul.
> Best regards,
> Vladimir Ivanov
>
>>
>> The only missing case is early startup phase when everything is
>> interpreted, but we can add special logic in the JVM to eagerly
>> compile such methods (either during startup or on the first invocation)
>> which would be much simpler than adding intrinsics specifically for the
>> interpreter.
>>
>> [1]
>> diff --git a/src/java.base/share/classes/java/util/ArraysSupport.java b/src/java.base/share/classes/java/util/ArraysSupport.java
>> --- a/src/java.base/share/classes/java/util/ArraysSupport.java
>> +++ b/src/java.base/share/classes/java/util/ArraysSupport.java
>> @@ -26,6 +26,7 @@
>>
>> import jdk.internal.HotSpotIntrinsicCandidate;
>> import jdk.internal.misc.Unsafe;
>> +import jdk.internal.vm.annotation.ForceInline;
>>
>> /**
>> * Utility methods to find a mismatch between two primitive arrays.
>> @@ -106,8 +107,16 @@
>> * compliment of the number of remaining pairs of elements to be checked in
>> * the tail of the two arrays.
>> */
>> + @ForceInline
>> + static int vectorizedMismatch(Object a, long aOffset,
>> + Object b, long bOffset,
>> + int length,
>> + int log2ArrayIndexScale) {
>> + return vectorizedMismatch0(a, aOffset, b, bOffset, length, log2ArrayIndexScale);
>> + }
>> +
>> @HotSpotIntrinsicCandidate
>> - static int vectorizedMismatch(Object a, long aOffset,
>> + private static int vectorizedMismatch0(Object a, long aOffset,
>> Object b, long bOffset,
>> int length,
>> int log2ArrayIndexScale) {
>>
>> On 4/15/16 4:07 PM, Paul Sandoz wrote:
>>>
>>>> On 15 Apr 2016, at 14:12, Coleen Phillimore
>>>> <coleen.phillimore at oracle.com> wrote:
>>>>
>>>>
>>>> I don't know why we'd add even more assembly code to the
>>>> interpreter. Why doesn't the JIT optimize this function instead? By
>>>> adding a stub in the interpreter does that prevent the JIT from
>>>> inlining this function since it's not invocation counted?
>>>>
>>>
>>> I have updated the webrev with C1 support [1] and determined,
>>> eyeballing generated code, that the stub call gets inlined for C1 and
>>> C2 and appears unaffected by the wiring up of that same stub in the
>>> template interpreter.
>>>
>>> A stub was added and wired up to C2 with the intention to wire that up
>>> to C1, and possibly to the interpreter. One reason for the latter was
>>> because of the performance results presented in the last email
>>> (potentially ~200x over the current approach, and ~35x improvement
>>> over the original Java code). Does that matter? Would you be concerned
>>> about that?
>>>
>>> Array equality is quite a fundamental operation so i was concerned
>>> about such a regression in the interpreter.
>>>
>>> Another reason for the latter, which i may be off base on here, is it
>>> might make it easier to consolidate the intrinsics added for compact
>>> string equality/comparison to this more general mismatch functionality.
>>>
>>> —
>>>
>>> Regarding the changes to C1 in [1]. Like for the CRC intrinsics i
>>> added the _vectorizedMismatch intrinsic to the set of intrinsics that
>>> preserve state and can trap. Is that correct? Also i am not sure if
>>> the 32-bit part is correct.
>>>
>>> Thanks,
>>> Paul.
>>>
>>> [1]
>>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8151268-int-c1-mismatch/webrev/
>>>
>>> (Note: this is still incomplete; i need to appropriately update all
>>> CPU-based code.)
>>>
>>> Benchmark               (lastNEQ)   (n)  Mode  Cnt     Score     Error  Units
>>> # Baseline
>>> # VM options: -XX:TieredStopAtLevel=1
>>> ByteArray.base_equals       false  1024  avgt   10  1190.177 ± 21.387  ns/op
>>> ByteArray.base_equals        true  1024  avgt   10  1191.767 ± 35.196  ns/op
>>>
>>> # Before patch
>>> # VM options: -XX:TieredStopAtLevel=1 -XX:-SpecialArraysEquals -XX:-UseVectorizedMismatchIntrinsic
>>> ByteArray.jdk_equals        false  1024  avgt   10   208.014 ±  5.224  ns/op
>>> ByteArray.jdk_equals         true  1024  avgt   10   218.271 ± 10.749  ns/op
>>>
>>> # After patch
>>> # VM options: -XX:TieredStopAtLevel=1 -XX:-SpecialArraysEquals -XX:+UseVectorizedMismatchIntrinsic
>>> ByteArray.jdk_equals        false  1024  avgt   10    70.097 ±  2.321  ns/op
>>> ByteArray.jdk_equals         true  1024  avgt   10    72.284 ±  1.578  ns/op
>>>
>>>
>>>
>>>> thanks,
>>>> Coleen
>>>>
>>>>
>>>> On 4/14/16 10:53 AM, Paul Sandoz wrote:
>>>>> Hi,
>>>>>
>>>>> I hooked up the array mismatch stub to the interpreter, with a bit
>>>>> of code cargo culting the CRC work and some lldb debugging [*] it
>>>>> appears to work and pass tests.
>>>>>
>>>>> Can someone have a quick look to see if i am on the right track here:
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8151268-int-c1-mismatch/webrev/
>>>>>
>>>>>
>>>>>
>>>>> Here are some quick numbers running using -Xint for byte[] equality:
>>>>>
>>>>> Benchmark               (lastNEQ)   (n)  Mode  Cnt       Score       Error  Units
>>>>> # Baseline
>>>>> # VM options: -Xint
>>>>> ByteArray.base_equals       false  1024  avgt   10   16622.453 ±  498.475  ns/op
>>>>> ByteArray.base_equals        true  1024  avgt   10   16889.244 ±  439.895  ns/op
>>>>>
>>>>> # Before patch
>>>>> # VM options: -Xint -XX:-UseVectorizedMismatchIntrinsic
>>>>> ByteArray.jdk_equals        false  1024  avgt   10  106436.195 ± 3657.508  ns/op
>>>>> ByteArray.jdk_equals         true  1024  avgt   10  103306.001 ± 2723.130  ns/op
>>>>>
>>>>> # After patch
>>>>> # VM options: -Xint -XX:+UseVectorizedMismatchIntrinsic
>>>>> ByteArray.jdk_equals        false  1024  avgt   10     448.764 ±   18.977  ns/op
>>>>> ByteArray.jdk_equals         true  1024  avgt   10     448.657 ±   22.656  ns/op
>>>>>
>>>>>
>>>>>
>>>>> The next step is to wire up C1.
>>>>>
>>>>> Further steps would be to substitute some of the intrinsics added/used
>>>>> for compact strings with mismatch, then evaluate the performance.
>>>>>
>>>>> Thanks,
>>>>> Paul.
>>>>>
>>>>> [*] Stubs to be used as intrinsics in the template interpreter need
>>>>> to be created during the initial stage of generation, otherwise the
>>>>> stub address is null which leads to a SEGV that’s hard to track down.
>>>>
>>>
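
For readers following the wrapper idea and the return-value contract quoted in the diff above, here is a minimal stand-alone sketch in plain Java. It is not the JDK code: the class name MismatchSketch and the methods mismatch, wordMismatch0 and equalsViaMismatch are hypothetical, it handles byte[] only, and it uses a VarHandle where the real method uses Unsafe. The JDK-internal annotations (@ForceInline, @HotSpotIntrinsicCandidate) are not accessible to user code, so they appear only in comments.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration only; names and structure are assumptions.
final class MismatchSketch {
    // Reads 8 consecutive bytes of a byte[] as one little-endian long.
    private static final VarHandle LONGS =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Wrapper: an ordinary Java method. In the proposal, this is the method
    // that would carry @ForceInline and simply delegate to the candidate.
    static int mismatch(byte[] a, byte[] b, int length) {
        int r = wordMismatch0(a, b, length);
        if (r >= 0) {
            return r;                 // index of the first differing byte
        }
        int i = length - ~r;          // ~r = number of unchecked tail elements
        for (; i < length; i++) {     // finish the tail one byte at a time
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;                    // no mismatch in the first 'length' bytes
    }

    // Stand-in for the method that would carry @HotSpotIntrinsicCandidate.
    // Compares 8 bytes per step; returns the mismatch index, or the bitwise
    // complement of the number of tail elements left for the caller to check.
    private static int wordMismatch0(byte[] a, byte[] b, int length) {
        int i = 0;
        for (; i + 8 <= length; i += 8) {
            long x = (long) LONGS.get(a, i) ^ (long) LONGS.get(b, i);
            if (x != 0) {
                // Little-endian: the lowest set bit lies in the first differing byte.
                return i + (Long.numberOfTrailingZeros(x) >>> 3);
            }
        }
        return ~(length - i);
    }

    // Array equality expressed through mismatch.
    static boolean equalsViaMismatch(byte[] a, byte[] b) {
        return a.length == b.length && mismatch(a, b, a.length) < 0;
    }
}
```

The split mirrors the shape of the diff: callers only ever see the wrapper, so a JIT compiler is free to treat the private method specially while the interpreter runs both as ordinary Java methods.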