Hooking up the array mismatch stub as an intrinsic in the template interpreter

Coleen Phillimore coleen.phillimore at oracle.com
Fri Apr 15 14:01:47 UTC 2016



On 4/15/16 9:07 AM, Paul Sandoz wrote:
>> On 15 Apr 2016, at 14:12, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>>
>>
>> I don't know why we'd add even more assembly code to the interpreter. Why doesn't the JIT optimize this function instead? Does adding a stub in the interpreter prevent the JIT from inlining this function, since it's not invocation counted?
>>
> I have updated the webrev with C1 support [1] and determined, eyeballing generated code, that the stub call gets inlined for C1 and C2 and appears unaffected by the wiring up of that same stub in the template interpreter.
>
> A stub was added and wired up to C2 with the intention to wire that up to C1, and possibly to the interpreter. One reason for the latter was the performance results presented in the last email (potentially ~200x improvement over the current approach, and ~35x improvement over the original Java code). Does that matter? Would you be concerned about that?

What workload is this running?

What results do you get with refworkload?
> Array equality is quite a fundamental operation, so I was concerned about such a regression in the interpreter.

The interpreter is mostly run during startup time so we'd like to see 
some workload results with perhaps the startup_3 benchmarks set.

Again, we are trying to not have special case assembly code in the 
interpreter.   Adding these sorts of special optimizations to the 
compilers makes a lot more sense.

Coleen
>
> Another reason for the latter, and I may be off base here, is that it might make it easier to consolidate the intrinsics added for compact string equality/comparison into this more general mismatch functionality.
>
>>
> Regarding the changes to C1 in [1]: as with the CRC intrinsics, I added the _vectorizedMismatch intrinsic to the set of intrinsics that preserve state and can trap. Is that correct? Also, I am not sure whether the 32-bit part is correct.
>
> Thanks,
> Paul.
>
> [1] http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8151268-int-c1-mismatch/webrev/
> (Note: this is still incomplete; I need to appropriately update all CPU-based code.)
>
> Benchmark              (lastNEQ)   (n)  Mode  Cnt     Score    Error  Units
> # Baseline
> # VM options: -XX:TieredStopAtLevel=1
> ByteArray.base_equals      false  1024  avgt   10  1190.177 ± 21.387  ns/op
> ByteArray.base_equals       true  1024  avgt   10  1191.767 ± 35.196  ns/op
>
> # Before patch
> # VM options: -XX:TieredStopAtLevel=1 -XX:-SpecialArraysEquals -XX:-UseVectorizedMismatchIntrinsic
> ByteArray.jdk_equals       false  1024  avgt   10   208.014 ±  5.224  ns/op
> ByteArray.jdk_equals        true  1024  avgt   10   218.271 ± 10.749  ns/op
>
> # After patch
> # VM options: -XX:TieredStopAtLevel=1 -XX:-SpecialArraysEquals -XX:+UseVectorizedMismatchIntrinsic
> ByteArray.jdk_equals       false  1024  avgt   10    70.097 ±  2.321  ns/op
> ByteArray.jdk_equals        true  1024  avgt   10    72.284 ±  1.578  ns/op
>
>
>
>> thanks,
>> Coleen
>>
>>
>> On 4/14/16 10:53 AM, Paul Sandoz wrote:
>>> Hi,
>>>
>>> I hooked up the array mismatch stub to the interpreter. With a bit of code cargo culting from the CRC work and some lldb debugging [*], it appears to work and passes tests.
>>>
>>> Can someone have a quick look to see whether I am on the right track here:
>>>
>>>    http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8151268-int-c1-mismatch/webrev/
>>>
>>>
>>> Here are some quick numbers from running with -Xint for byte[] equality:
>>>
>>> Benchmark              (lastNEQ)   (n)  Mode  Cnt       Score      Error  Units
>>> # Baseline
>>> # VM options: -Xint
>>> ByteArray.base_equals      false  1024  avgt   10  16622.453 ± 498.475  ns/op
>>> ByteArray.base_equals       true  1024  avgt   10  16889.244 ± 439.895  ns/op
>>>
>>> # Before patch
>>> # VM options: -Xint -XX:-UseVectorizedMismatchIntrinsic
>>> ByteArray.jdk_equals       false  1024  avgt   10  106436.195 ± 3657.508  ns/op
>>> ByteArray.jdk_equals        true  1024  avgt   10  103306.001 ± 2723.130  ns/op
>>>
>>> # After patch
>>> # VM options: -Xint -XX:+UseVectorizedMismatchIntrinsic
>>> ByteArray.jdk_equals       false  1024  avgt   10    448.764 ±  18.977  ns/op
>>> ByteArray.jdk_equals        true  1024  avgt   10    448.657 ±  22.656  ns/op
>>>
>>>
>>>
>>> The next step is to wire up C1.
>>>
>>> Further steps would be to substitute some of the intrinsics added/used for compact strings with mismatch, then evaluate the performance.
>>>
>>> Thanks,
>>> Paul.
>>>
>>> [*] Stubs to be used as intrinsics in the template interpreter need to be created during the initial stage of generation, otherwise the stub address is null which leads to a SEGV that’s hard to track down.
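
For readers following the thread, here is a rough plain-Java sketch of how array equality can be expressed in terms of a mismatch operation that compares eight bytes at a time. It is only an illustration, not the stub or the JDK code under review: the class name MismatchSketch and its two methods are hypothetical, and the use of ByteBuffer (big-endian by default) to read long-sized chunks is an assumption made to keep the example self-contained. The return convention follows java.util.Arrays.mismatch: -1 when there is no mismatch, otherwise the index of the first difference, or the shorter length when one array is a proper prefix of the other.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the HotSpot stub or the JDK implementation.
final class MismatchSketch {

    // Returns the index of the first differing byte, the shorter length if one
    // array is a proper prefix of the other, or -1 if there is no mismatch.
    static int mismatch(byte[] a, byte[] b) {
        int length = Math.min(a.length, b.length);
        ByteBuffer wa = ByteBuffer.wrap(a); // big-endian by default
        ByteBuffer wb = ByteBuffer.wrap(b);
        int i = 0;
        // Compare 8 bytes per iteration by XOR-ing long-sized chunks.
        for (; i + Long.BYTES <= length; i += Long.BYTES) {
            long diff = wa.getLong(i) ^ wb.getLong(i);
            if (diff != 0) {
                // In big-endian order the lowest-index byte is the most
                // significant, so leading zero bits / 8 is the byte offset.
                return i + (Long.numberOfLeadingZeros(diff) >>> 3);
            }
        }
        // Compare any remaining tail bytes one at a time.
        for (; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : length;
    }

    // Equality expressed through mismatch.
    static boolean equals(byte[] a, byte[] b) {
        return a.length == b.length && mismatch(a, b) < 0;
    }
}
```

The benchmark parameters above (n = 1024, lastNEQ true/false) would correspond to calling MismatchSketch.equals on two 1024-byte arrays that are either identical or differ only in the last element.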


