Hooking up the array mismatch stub as an intrinsic in the template interpreter

Coleen Phillimore coleen.phillimore at oracle.com
Fri Apr 15 14:02:16 UTC 2016


Thank you, Vladimir.
Coleen


On 4/15/16 9:29 AM, Vladimir Ivanov wrote:
> Here is an idea for how to avoid interpreter changes.
>
> The interpreter can't benefit from "intrinsifiable" methods directly,
> but if you create a wrapper and call it instead [1], the JIT compilers
> can take care of the stand-alone versions for you. The interpreter will
> work with them as if they were ordinary Java methods.
>
> The only missing case is the early startup phase, when everything is
> interpreted, but we can add special logic to the JVM to eagerly
> compile such methods (either during startup or on the first
> invocation), which would be much simpler than adding intrinsics
> specifically for the interpreter.
>
> Best regards,
> Vladimir Ivanov
>
> [1]
> diff --git a/src/java.base/share/classes/java/util/ArraysSupport.java b/src/java.base/share/classes/java/util/ArraysSupport.java
> --- a/src/java.base/share/classes/java/util/ArraysSupport.java
> +++ b/src/java.base/share/classes/java/util/ArraysSupport.java
> @@ -26,6 +26,7 @@
>
>  import jdk.internal.HotSpotIntrinsicCandidate;
>  import jdk.internal.misc.Unsafe;
> +import jdk.internal.vm.annotation.ForceInline;
>
>  /**
>   * Utility methods to find a mismatch between two primitive arrays.
> @@ -106,8 +107,16 @@
>       * compliment of the number of remaining pairs of elements to be 
> checked in
>       * the tail of the two arrays.
>       */
> +    @ForceInline
> +    static int vectorizedMismatch(Object a, long aOffset,
> +                                  Object b, long bOffset,
> +                                  int length,
> +                                  int log2ArrayIndexScale) {
> +        return vectorizedMismatch0(a, aOffset, b, bOffset, length, log2ArrayIndexScale);
> +    }
> +
>      @HotSpotIntrinsicCandidate
> -    static int vectorizedMismatch(Object a, long aOffset,
> +    private static int vectorizedMismatch0(Object a, long aOffset,
>                                    Object b, long bOffset,
>                                    int length,
>                                    int log2ArrayIndexScale) {
>
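As an illustration of the contract described in the diff's doc comment (return the index of the first mismatch, or the bitwise complement of the number of elements left in the tail), here is a minimal plain-Java sketch for byte arrays. It is not the JDK implementation: the real method works through Unsafe on arbitrary primitive arrays, while this sketch assumes byte[] inputs and uses ByteBuffer to read 8 bytes at a time. The class and method names (MismatchSketch, wordwiseMismatch, mismatch, equals) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; names and structure are illustrative assumptions.
final class MismatchSketch {

    // Compares whole 8-byte words. Returns the index of the first differing
    // byte if one is found in those words; otherwise returns the bitwise
    // complement of the number of tail bytes that were not checked.
    static int wordwiseMismatch(byte[] a, byte[] b, int length) {
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        for (; i + Long.BYTES <= length; i += Long.BYTES) {
            long x = ba.getLong(i) ^ bb.getLong(i);
            if (x != 0) {
                // Little-endian read: the lowest set bit belongs to the
                // lowest-indexed differing byte.
                return i + (Long.numberOfTrailingZeros(x) >>> 3);
            }
        }
        return ~(length - i);
    }

    // Full mismatch: word-wise pass first, then a scalar loop over the tail.
    static int mismatch(byte[] a, byte[] b) {
        int length = Math.min(a.length, b.length);
        int r = wordwiseMismatch(a, b, length);
        if (r >= 0) {
            return r;
        }
        int remaining = ~r;
        for (int i = length - remaining; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : length;
    }

    // Array equality expressed in terms of mismatch.
    static boolean equals(byte[] a, byte[] b) {
        return a.length == b.length && mismatch(a, b) < 0;
    }
}
```

The point of the complement convention is that the caller can tell "no mismatch so far, N elements still unchecked" apart from a real index by the sign of the result.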
> On 4/15/16 4:07 PM, Paul Sandoz wrote:
>>
>>> On 15 Apr 2016, at 14:12, Coleen Phillimore 
>>> <coleen.phillimore at oracle.com> wrote:
>>>
>>>
>>> I don't know why we'd add even more assembly code to the
>>> interpreter.  Why doesn't the JIT optimize this function instead?
>>> Does adding a stub in the interpreter prevent the JIT from
>>> inlining this function, since it's not invocation counted?
>>>
>>
>> I have updated the webrev with C1 support [1] and determined, 
>> eyeballing generated code, that the stub call gets inlined for C1 and 
>> C2 and appears unaffected by the wiring up of that same stub in the 
>> template interpreter.
>>
>> A stub was added and wired up to C2 with the intention to wire that
>> up to C1, and possibly to the interpreter. One reason for the latter
>> was the performance results presented in the last email
>> (potentially a ~200x improvement over the current approach, and ~35x
>> over the original Java code). Does that matter? Would you be
>> concerned about that?
>>
>> Array equality is quite a fundamental operation, so I was concerned
>> about such a regression in the interpreter.
>>
>> Another reason for the latter, though I may be off base here, is that
>> it might make it easier to consolidate the intrinsics added for compact
>> string equality/comparison into this more general mismatch functionality.
>>
>>>>
>> Regarding the changes to C1 in [1]: as with the CRC intrinsics, I
>> added the _vectorizedMismatch intrinsic to the set of intrinsics that
>> preserve state and can trap. Is that correct? Also, I am not sure
>> whether the 32-bit part is correct.
>>
>> Thanks,
>> Paul.
>>
>> [1] 
>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8151268-int-c1-mismatch/webrev/
>> (Note: this is still incomplete; I need to appropriately update all
>> CPU-based code.)
>>
>> Benchmark              (lastNEQ)   (n)  Mode  Cnt     Score     Error  Units
>> # Baseline
>> # VM options: -XX:TieredStopAtLevel=1
>> ByteArray.base_equals      false  1024  avgt   10  1190.177 ±  21.387  ns/op
>> ByteArray.base_equals       true  1024  avgt   10  1191.767 ±  35.196  ns/op
>>
>> # Before patch
>> # VM options: -XX:TieredStopAtLevel=1 -XX:-SpecialArraysEquals -XX:-UseVectorizedMismatchIntrinsic
>> ByteArray.jdk_equals       false  1024  avgt   10   208.014 ±   5.224  ns/op
>> ByteArray.jdk_equals        true  1024  avgt   10   218.271 ±  10.749  ns/op
>>
>> # After patch
>> # VM options: -XX:TieredStopAtLevel=1 -XX:-SpecialArraysEquals -XX:+UseVectorizedMismatchIntrinsic
>> ByteArray.jdk_equals       false  1024  avgt   10    70.097 ±   2.321  ns/op
>> ByteArray.jdk_equals        true  1024  avgt   10    72.284 ±   1.578  ns/op
>>
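The benchmark source is not included in this thread, so the following is only a guess at the two kernels being measured, written as plain Java rather than JMH. It assumes that base_equals is a hand-written element loop, that jdk_equals calls java.util.Arrays.equals, and that lastNEQ=true means the two arrays differ only in their final element. All names here (ByteArrayEqualsSketch, baseEquals, jdkEquals, inputs) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical reconstruction of the measured kernels; not the actual benchmark.
final class ByteArrayEqualsSketch {

    // Assumed shape of "base_equals": a scalar, element-by-element loop.
    static boolean baseEquals(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }

    // Assumed shape of "jdk_equals": the library call under test.
    static boolean jdkEquals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    // Builds two arrays of length n; if lastNEQ is set, only the last
    // element differs (an assumption about what the parameter means).
    static byte[][] inputs(int n, boolean lastNEQ) {
        byte[] a = new byte[n];
        for (int i = 0; i < n; i++) {
            a[i] = (byte) i;
        }
        byte[] b = a.clone();
        if (lastNEQ && n > 0) {
            b[n - 1] ^= 1;
        }
        return new byte[][] {a, b};
    }
}
```

Under that reading, both parameter settings force a scan of the whole array, which would be consistent with the near-identical scores for lastNEQ=false and lastNEQ=true in each group.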
>>
>>
>>> thanks,
>>> Coleen
>>>
>>>
>>> On 4/14/16 10:53 AM, Paul Sandoz wrote:
>>>> Hi,
>>>>
>>>> I hooked up the array mismatch stub to the interpreter. With a bit
>>>> of cargo culting from the CRC work and some lldb debugging [*], it
>>>> appears to work and pass tests.
>>>>
>>>> Can someone have a quick look to see if I am on the right track here:
>>>>
>>>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8151268-int-c1-mismatch/webrev/ 
>>>>
>>>>
>>>>
>>>> Here are some quick numbers running using -Xint for byte[] equality:
>>>>
>>>> Benchmark              (lastNEQ)   (n)  Mode  Cnt       Score      Error  Units
>>>> # Baseline
>>>> # VM options: -Xint
>>>> ByteArray.base_equals      false  1024  avgt   10   16622.453 ±  498.475  ns/op
>>>> ByteArray.base_equals       true  1024  avgt   10   16889.244 ±  439.895  ns/op
>>>>
>>>> # Before patch
>>>> # VM options: -Xint -XX:-UseVectorizedMismatchIntrinsic
>>>> ByteArray.jdk_equals       false  1024  avgt   10  106436.195 ± 3657.508  ns/op
>>>> ByteArray.jdk_equals        true  1024  avgt   10  103306.001 ± 2723.130  ns/op
>>>>
>>>> # After patch
>>>> # VM options: -Xint -XX:+UseVectorizedMismatchIntrinsic
>>>> ByteArray.jdk_equals       false  1024  avgt   10     448.764 ±   18.977  ns/op
>>>> ByteArray.jdk_equals        true  1024  avgt   10     448.657 ±   22.656  ns/op
>>>>
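The "~200x" and "~35x" figures quoted in the follow-up can be checked against the -Xint scores above (lastNEQ=false rows). The small sketch below just divides the reported averages; it ignores the error bars, and the class and method names (SpeedupCheck, speedup) are made up for illustration.

```java
// Hypothetical helper for sanity-checking the quoted speedups.
final class SpeedupCheck {

    // Ratio of two average times in ns/op; larger means a bigger improvement.
    static double speedup(double beforeNsPerOp, double afterNsPerOp) {
        return beforeNsPerOp / afterNsPerOp;
    }

    public static void main(String[] args) {
        double baseline = 16622.453;     // ByteArray.base_equals, -Xint
        double beforePatch = 106436.195; // ByteArray.jdk_equals, intrinsic off
        double afterPatch = 448.764;     // ByteArray.jdk_equals, intrinsic on

        System.out.printf("vs. current approach: %.1fx%n", speedup(beforePatch, afterPatch));
        System.out.printf("vs. original Java code: %.1fx%n", speedup(baseline, afterPatch));
    }
}
```

Both ratios come out somewhat above the rounded figures in the email (roughly 237x and 37x), so the quoted numbers are, if anything, conservative.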
>>>>
>>>>
>>>> The next step is to wire up C1.
>>>>
>>>> Further steps would be to substitute some of the intrinsics
>>>> added/used for compact strings with mismatch, then evaluate the
>>>> performance.
>>>>
>>>> Thanks,
>>>> Paul.
>>>>
>>>> [*] Stubs to be used as intrinsics in the template interpreter need
>>>> to be created during the initial stage of generation; otherwise the
>>>> stub address is null, which leads to a SEGV that’s hard to track down.
>>>
>>



More information about the hotspot-dev mailing list