Hooking up the array mismatch stub as an intrinsic in the template interpreter
Paul Sandoz
paul.sandoz at oracle.com
Fri Apr 15 13:07:08 UTC 2016
> On 15 Apr 2016, at 14:12, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>
>
> I don't know why we'd add even more assembly code to the interpreter. Why doesn't the JIT optimize this function instead? By adding a stub in the interpreter does that prevent the JIT from inlining this function since it's not invocation counted?
>
I have updated the webrev with C1 support [1] and determined, by eyeballing the generated code, that the stub call gets inlined for both C1 and C2 and appears unaffected by the wiring up of that same stub in the template interpreter.
A stub was added and wired up to C2 with the intention of wiring it up to C1 as well, and possibly to the interpreter. One reason for the latter is the performance results presented in the last email (potentially ~200x over the current approach, and ~35x over the original Java code). Does that matter? Would you be concerned about that?
Array equality is quite a fundamental operation, so I was concerned about such a regression in the interpreter (in the -Xint numbers quoted below, jdk_equals without the intrinsic is roughly 6x slower than base_equals).
Another reason for the latter, on which I may be off base, is that it might make it easier to consolidate the intrinsics added for compact string equality/comparison onto this more general mismatch functionality.
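To make the consolidation idea concrete, here is a minimal Java sketch of how equality and comparison can both be expressed on top of a single mismatch primitive. It is an illustration only: the class and method names are hypothetical, the mismatch here is a plain scalar loop rather than the vectorized stub, and the comparison is signed byte-wise, which is an assumption and not necessarily what the compact string intrinsics do.

```java
class MismatchSketch {
    // Hypothetical scalar stand-in for the mismatch primitive: returns the
    // index of the first differing byte within the first `length` bytes,
    // or -1 if that prefix is identical.
    static int mismatch(byte[] a, byte[] b, int length) {
        for (int i = 0; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }

    // Equality: same length and no mismatch over the whole array.
    static boolean equalsViaMismatch(byte[] a, byte[] b) {
        if (a == b) return true;
        if (a == null || b == null) return false;
        if (a.length != b.length) return false;
        return mismatch(a, b, a.length) < 0;
    }

    // Lexicographic comparison (signed bytes, an assumption of this sketch):
    // decide on the first differing byte in the common prefix, else on length.
    static int compareViaMismatch(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        int i = mismatch(a, b, common);
        if (i >= 0) {
            return Byte.compare(a[i], b[i]);
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3};
        byte[] y = {1, 2, 4};
        System.out.println(equalsViaMismatch(x, y));
        System.out.println(compareViaMismatch(x, y));
    }
}
```

With this shape, only the mismatch primitive needs an intrinsic; the equality and comparison wrappers stay as ordinary Java code around it.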
—
Regarding the changes to C1 in [1]: as with the CRC intrinsics, I added the _vectorizedMismatch intrinsic to the set of intrinsics that preserve state and can trap. Is that correct? Also, I am not sure whether the 32-bit part is correct.
Thanks,
Paul.
[1] http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8151268-int-c1-mismatch/webrev/
(Note: this is still incomplete; I need to update all the CPU-based code appropriately.)
Benchmark              (lastNEQ)   (n)  Mode  Cnt     Score    Error  Units

# Baseline
# VM options: -XX:TieredStopAtLevel=1
ByteArray.base_equals      false  1024  avgt   10  1190.177 ± 21.387  ns/op
ByteArray.base_equals       true  1024  avgt   10  1191.767 ± 35.196  ns/op

# Before patch
# VM options: -XX:TieredStopAtLevel=1 -XX:-SpecialArraysEquals -XX:-UseVectorizedMismatchIntrinsic
ByteArray.jdk_equals       false  1024  avgt   10   208.014 ±  5.224  ns/op
ByteArray.jdk_equals        true  1024  avgt   10   218.271 ± 10.749  ns/op

# After patch
# VM options: -XX:TieredStopAtLevel=1 -XX:-SpecialArraysEquals -XX:+UseVectorizedMismatchIntrinsic
ByteArray.jdk_equals       false  1024  avgt   10    70.097 ±  2.321  ns/op
ByteArray.jdk_equals        true  1024  avgt   10    72.284 ±  1.578  ns/op
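
For readers without the benchmark source at hand, a plain-Java sketch of the two kinds of method being compared might look like the following. This is not the actual JMH benchmark: the class and method names are hypothetical, and it assumes that base_equals is a hand-written element-by-element loop, that jdk_equals calls Arrays.equals, and that lastNEQ means the two arrays differ only in their last element.

```java
import java.util.Arrays;

class ByteArrayEqualsSketch {
    // Builds two arrays of length n with identical contents; if lastNEQ is
    // true, the last element of the second array is flipped so they differ
    // only at the very end (assumed meaning of the lastNEQ parameter).
    static byte[][] makePair(int n, boolean lastNEQ) {
        byte[] a = new byte[n];
        for (int i = 0; i < n; i++) {
            a[i] = (byte) i;
        }
        byte[] b = a.clone();
        if (lastNEQ && n > 0) {
            b[n - 1] ^= 1;
        }
        return new byte[][] { a, b };
    }

    // Assumed shape of the baseline: a simple hand-written loop.
    static boolean baseEquals(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Assumed shape of the JDK variant: delegate to the library method.
    static boolean jdkEquals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }
}
```

In both settings the whole array has to be scanned before the result is known, which would be consistent with the false and true rows having nearly identical scores.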
> thanks,
> Coleen
>
>
> On 4/14/16 10:53 AM, Paul Sandoz wrote:
>> Hi,
>>
>> I hooked up the array mismatch stub to the interpreter. With a bit of cargo-culting from the CRC work and some lldb debugging [*], it appears to work and pass tests.
>>
>> Can someone have a quick look to see if I am on the right track here:
>>
>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8151268-int-c1-mismatch/webrev/
>>
>>
>> Here are some quick numbers running using -Xint for byte[] equality:
>>
>> Benchmark              (lastNEQ)   (n)  Mode  Cnt       Score      Error  Units
>>
>> # Baseline
>> # VM options: -Xint
>> ByteArray.base_equals      false  1024  avgt   10   16622.453 ±  498.475  ns/op
>> ByteArray.base_equals       true  1024  avgt   10   16889.244 ±  439.895  ns/op
>>
>> # Before patch
>> # VM options: -Xint -XX:-UseVectorizedMismatchIntrinsic
>> ByteArray.jdk_equals       false  1024  avgt   10  106436.195 ± 3657.508  ns/op
>> ByteArray.jdk_equals        true  1024  avgt   10  103306.001 ± 2723.130  ns/op
>>
>> # After patch
>> # VM options: -Xint -XX:+UseVectorizedMismatchIntrinsic
>> ByteArray.jdk_equals       false  1024  avgt   10     448.764 ±   18.977  ns/op
>> ByteArray.jdk_equals        true  1024  avgt   10     448.657 ±   22.656  ns/op
>>
>>
>>
>> The next step is to wire up C1.
>>
>> Further steps would be to substitute some of the intrinsics added/used for compact strings with mismatch, then evaluate the performance.
>>
>> Thanks,
>> Paul.
>>
>> [*] Stubs to be used as intrinsics in the template interpreter need to be created during the initial stage of generation, otherwise the stub address is null which leads to a SEGV that’s hard to track down.
>