Weird performance behavior involving VarHandles
Andrew Haley
aph-open at littlepinkcloud.com
Fri Apr 26 12:57:55 UTC 2024
On 4/24/24 23:28, Maurizio Cimadamore wrote:
> I seem to recall that the lambda forms for guards-with-test are rather complex, as they need to profile the various branches. I wonder if some "leftover" from the profiling code stays there and pollutes the benchmark?
It's definitely different inlining. On AArch64 I see
ReproducerBenchmarks.control            avgt    5  1.438 ± 0.005  ns/op
ReproducerBenchmarks.gwt2_methodhandle  avgt    5  2.112 ± 0.076  ns/op
ReproducerBenchmarks.gwt_methodhandle   avgt    5  1.440 ± 0.074  ns/op
and the important difference is here, see the "dmb ish" that is pinned:
│ 0x0000fffefcb54a70: tbnz w14, #0x1f, #0xfffefcb54cd8
│ ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.lang.invoke.VarHandleGuards::guard_LJ_I@80 (line 1002)
│ ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle@30 (line 107)
│ ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh
│ ;; B29: # out( B42 B30 ) <- in( B28 ) Freq: 91235.6
│ 0x0000fffefcb54a74: ldr w11, [x11] ;*invokevirtual getIntUnaligned {reexecute=0 rethrow=0 return_oop=0}
│ ; - jdk.internal.misc.Unsafe::getIntUnaligned@5 (line 3576)
│ ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal@15 (line 1893)
│ ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned@6 (line 1881)
│ ; - java.lang.invoke.VarHandleSegmentAsInts::get@48 (line 108)
│ ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic@14
│ ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke@53
│ ; - java.lang.invoke.VarHandleGuards::guard_LJ_I@80 (line 1002)
│ ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle@13 (line 106)
│ ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh
│ ;; membar_release
│ 0x0000fffefcb54a78: dmb ish ;*synchronization entry
│ ; - java.lang.invoke.VarHandle::getMethodHandle@-1 (line 2203)
│ ; - java.lang.invoke.VarHandleGuards::guard_LJ_I@59 (line 1001)
│ ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle@30 (line 107)
│ ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh
│ 0x0000fffefcb54a7c: ldr w14, [x13, #0x18] ;*getfield scope {reexecute=0 rethrow=0 return_oop=0}
│ ; - jdk.internal.foreign.AbstractMemorySegmentImpl::sessionImpl@1 (line 430)
│ ; - java.lang.invoke.VarHandleSegmentAsInts::get@24 (line 108)
│ ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic@14
│ ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke@53
│ ; - java.lang.invoke.VarHandleGuards::guard_LJ_I@80 (line 1002)
│ ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle@30 (line 107)
│ ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh
This is a release fence. It could be from a constructor with a final field.
I think it's this:
    MethodHandle getMethodHandle(int mode) {
        MethodHandle[] mhTable = methodHandleTable;
        if (mhTable == null) {
            mhTable = methodHandleTable = new MethodHandle[AccessMode.COUNT];
        }
        MethodHandle mh = mhTable[mode];
        if (mh == null) {
            mh = mhTable[mode] = getMethodHandleUncached(mode);
        }
        return mh;
    }
If I had to guess, a constructor here is being scalar-replaced but its fence
remains, and that fence prevents code motion, so the fields scope and min are being
reloaded rather than hoisted. Even though a release barrier doesn't generate any code
on x86 (because x86 is TSO), it will still prevent code motion in the compiler.
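
To make the guess above concrete, here is a minimal Java sketch of the pattern. It is not taken from the JDK or from the benchmark: the classes Segment and Box and all three methods are hypothetical names invented for illustration. Box has a final field, so HotSpot typically places a release barrier at the end of its constructor; sumWithExplicitFence writes the same barrier out by hand with VarHandle.releaseFence(); sumHoisted is what one would hope the compiler produces. Whether the loads of scope and min are actually hoisted in the first two methods depends on the JIT and is not something this sketch proves; all three methods compute the same result.

```java
import java.lang.invoke.VarHandle;

public class ReleaseFenceSketch {
    // Hypothetical stand-in for a segment with loop-invariant fields
    // (not the JDK's AbstractMemorySegmentImpl).
    public static final class Segment {
        int scope;
        int min;
        public Segment(int scope, int min) { this.scope = scope; this.min = min; }
    }

    // Hypothetical short-lived wrapper. The final field means the JIT is
    // expected to end the constructor with a release barrier.
    public static final class Box {
        final int value;
        public Box(int value) { this.value = value; }
    }

    // Box never escapes, so it is a candidate for scalar replacement.
    // Assumption being illustrated: if the allocation goes away but the
    // barrier stays, the loads of seg.scope and seg.min may not be hoisted.
    public static int sumWithTemporary(Segment seg, int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            Box b = new Box(data[i]);
            sum += b.value + seg.scope + seg.min;
        }
        return sum;
    }

    // Same computation with the barrier made explicit in the source.
    public static int sumWithExplicitFence(Segment seg, int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            VarHandle.releaseFence();
            sum += data[i] + seg.scope + seg.min;
        }
        return sum;
    }

    // Baseline: the invariant field loads are hoisted by hand.
    public static int sumHoisted(Segment seg, int[] data) {
        int scope = seg.scope;
        int min = seg.min;
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i] + scope + min;
        }
        return sum;
    }

    public static void main(String[] args) {
        Segment seg = new Segment(10, 20);
        int[] data = {1, 2, 3};
        System.out.println(sumWithTemporary(seg, data));
        System.out.println(sumWithExplicitFence(seg, data));
        System.out.println(sumHoisted(seg, data));
    }
}
```

Comparing the generated code for these three methods (for example with -XX:+PrintAssembly or JMH's perfasm profiler) would be one way to check whether a leftover barrier is what blocks the hoisting.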
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
More information about the hotspot-compiler-dev mailing list