[foreign-memaccess+abi] RFR: 8275646: Implement optimized upcall stubs on AArch64
Nick Gasson
ngasson at openjdk.java.net
Mon Nov 8 09:23:04 UTC 2021
On Fri, 5 Nov 2021 10:34:31 GMT, Andrew Haley <aph at openjdk.org> wrote:
> I'm interested to know where all that 250ns is going. Did you look at -prof perfasm?
I should have tested on a more modern machine. Here are the results from an N1 server:
Before:
Benchmark                Mode  Cnt    Score    Error  Units
Upcalls.jni_args10       avgt   30  168.016 ±  0.571  ns/op
Upcalls.jni_args5        avgt   30   98.907 ±  0.833  ns/op
Upcalls.jni_blank        avgt   30   78.007 ±  0.138  ns/op
Upcalls.jni_identity     avgt   30  150.933 ±  0.854  ns/op
Upcalls.panama_args10    avgt   30  600.304 ± 35.944  ns/op
Upcalls.panama_args5     avgt   30  447.607 ± 24.090  ns/op
Upcalls.panama_blank     avgt   30  238.953 ± 12.512  ns/op
Upcalls.panama_identity  avgt   30  314.388 ± 31.383  ns/op
After:
Benchmark                Mode  Cnt    Score    Error  Units
Upcalls.jni_args10       avgt   30  168.528 ±  0.658  ns/op
Upcalls.jni_args5        avgt   30   98.595 ±  0.628  ns/op
Upcalls.jni_blank        avgt   30   78.420 ±  0.376  ns/op
Upcalls.jni_identity     avgt   30  154.403 ±  2.090  ns/op
Upcalls.panama_args10    avgt   30   86.066 ±  4.202  ns/op
Upcalls.panama_args5     avgt   30   78.094 ±  3.718  ns/op
Upcalls.panama_blank     avgt   30   68.683 ±  2.107  ns/op
Upcalls.panama_identity  avgt   30   76.841 ± 11.340  ns/op
That is closer to the x86 results. The top two functions in the profile are the `on_entry` and `on_exit` VM calls:
20.44% libjvm.so ProgrammableUpcallHandler::on_entry (228 bytes)
12.30% libjvm.so ProgrammableUpcallHandler::on_exit (88 bytes)
9.20% runtime stub StubRoutines::atomic entry points (148 bytes)
8.65% c2, level 4 org.openjdk.bench.jdk.incubator.foreign.jmh_generated.Upcalls_panama_args5_jmhTest::panama_args5_avgt_jmhStub, version 1386 (100 bytes)
4.64% libjvm.so ThreadShadow::clear_pending_exception (32 bytes)
4.07% c2, level 4 java.lang.invoke.LambdaForm$MH.0x0000000800d12c00::invoke, version 1369 (212 bytes)
3.94% c2, level 4 java.lang.invoke.LambdaForm$MH.0x0000000800d12c00::invoke, version 1369 (268 bytes)
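To put the Before/After scores above in perspective, the per-benchmark speedup is just the ratio of the two averages. The following is a minimal sketch of that arithmetic; the class and method names are made up for illustration and are not part of the patch or the benchmark suite, and the error bars are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper only: not part of the patch or of the JMH benchmarks.
final class UpcallSpeedup {

    // Speedup factor = average time before / average time after (both in ns/op).
    static double speedup(double beforeNs, double afterNs) {
        return beforeNs / afterNs;
    }

    // Panama scores copied from the Before/After tables: {before, after} in ns/op.
    static Map<String, double[]> panamaScores() {
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("panama_args10", new double[] {600.304, 86.066});
        scores.put("panama_args5", new double[] {447.607, 78.094});
        scores.put("panama_blank", new double[] {238.953, 68.683});
        scores.put("panama_identity", new double[] {314.388, 76.841});
        return scores;
    }

    public static void main(String[] args) {
        // Print one line per benchmark with its speedup factor.
        panamaScores().forEach((name, s) ->
                System.out.printf("%-16s %.2fx%n", name, speedup(s[0], s[1])));
    }
}
```

On these numbers the Panama upcalls come out roughly 3.5x (panama_blank) to 7x (panama_args10) faster than before.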
> That code was taken from JavaCallWrapper code. Do you happen to know which use-case those calls were supposed to address? (I'm assuming things still work on MacOS/AArch64 without them).
I think the flow for JavaCallWrapper is slightly different. It's doing:
Native -> JavaCallWrapper -> Java (need X) -> ~JavaCallWrapper -> Native (need W?)
Whereas here we're doing:
Native -> Stub (need X) -> on_entry -> Java (need X) -> on_exit -> Stub (need X) -> Native
I'm not sure why ~JavaCallWrapper needs to set the W mode though.
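The difference between the two flows can be made concrete with a toy model that records the mode each step is said to need and counts how often the required mode changes. This is only a sketch: the types below are hypothetical and are not HotSpot code, steps with no stated requirement are left as null, and the W requirement at the end of the JavaCallWrapper flow is the unconfirmed "need W?" guess from above.

```java
import java.util.List;

// Toy model only: hypothetical types, not HotSpot code.
enum WxMode { WRITE, EXEC }

// One step in a call flow; required is null where no mode is stated above.
final class Step {
    final String name;
    final WxMode required;

    Step(String name, WxMode required) {
        this.name = name;
        this.required = required;
    }
}

final class WxFlow {

    // Count changes of required mode along a flow, skipping unspecified steps.
    static int modeSwitches(List<Step> flow) {
        int switches = 0;
        WxMode current = null;
        for (Step s : flow) {
            if (s.required == null) {
                continue;
            }
            if (current != null && current != s.required) {
                switches++;
            }
            current = s.required;
        }
        return switches;
    }

    // Native -> JavaCallWrapper -> Java (need X) -> ~JavaCallWrapper -> Native (need W?)
    static List<Step> javaCallWrapperFlow() {
        return List.of(
                new Step("Native", null),
                new Step("JavaCallWrapper", null),
                new Step("Java", WxMode.EXEC),
                new Step("~JavaCallWrapper", null),
                new Step("Native", WxMode.WRITE)); // assumption: the "need W?" guess
    }

    // Native -> Stub (need X) -> on_entry -> Java (need X) -> on_exit -> Stub (need X) -> Native
    static List<Step> upcallStubFlow() {
        return List.of(
                new Step("Native", null),
                new Step("Stub", WxMode.EXEC),
                new Step("on_entry", null),
                new Step("Java", WxMode.EXEC),
                new Step("on_exit", null),
                new Step("Stub", WxMode.EXEC),
                new Step("Native", null));
    }

    public static void main(String[] args) {
        System.out.println("JavaCallWrapper flow switches: " + modeSwitches(javaCallWrapperFlow()));
        System.out.println("Upcall stub flow switches: " + modeSwitches(upcallStubFlow()));
    }
}
```

Under these assumptions the stub flow never changes the required mode, while the JavaCallWrapper flow changes it once on the way out, and only if the final W requirement is real.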
-------------
PR: https://git.openjdk.java.net/panama-foreign/pull/610