[foreign-memaccess+abi] RFR: 8275646: Implement optimized upcall stubs on AArch64

Nick Gasson ngasson at openjdk.java.net
Mon Nov 8 09:23:04 UTC 2021


On Fri, 5 Nov 2021 10:34:31 GMT, Andrew Haley <aph at openjdk.org> wrote:

> I'm interested to know where all that 250ns is going. Did you look at -prof perfasm?

I should have tested on a more modern machine. Here are the results from an N1 server:

Before:


Benchmark                Mode  Cnt    Score    Error  Units
Upcalls.jni_args10       avgt   30  168.016 ±  0.571  ns/op
Upcalls.jni_args5        avgt   30   98.907 ±  0.833  ns/op
Upcalls.jni_blank        avgt   30   78.007 ±  0.138  ns/op
Upcalls.jni_identity     avgt   30  150.933 ±  0.854  ns/op
Upcalls.panama_args10    avgt   30  600.304 ± 35.944  ns/op
Upcalls.panama_args5     avgt   30  447.607 ± 24.090  ns/op
Upcalls.panama_blank     avgt   30  238.953 ± 12.512  ns/op
Upcalls.panama_identity  avgt   30  314.388 ± 31.383  ns/op


After:


Benchmark                Mode  Cnt    Score    Error  Units
Upcalls.jni_args10       avgt   30  168.528 ±  0.658  ns/op
Upcalls.jni_args5        avgt   30   98.595 ±  0.628  ns/op
Upcalls.jni_blank        avgt   30   78.420 ±  0.376  ns/op
Upcalls.jni_identity     avgt   30  154.403 ±  2.090  ns/op
Upcalls.panama_args10    avgt   30   86.066 ±  4.202  ns/op
Upcalls.panama_args5     avgt   30   78.094 ±  3.718  ns/op
Upcalls.panama_blank     avgt   30   68.683 ±  2.107  ns/op
Upcalls.panama_identity  avgt   30   76.841 ± 11.340  ns/op
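
As a quick sanity check on the two tables, the speedup for each `panama_*` benchmark is just the "before" score divided by the "after" score. The snippet below is only an illustrative sketch: the class name `UpcallSpeedup` and the helper `speedup` are made up for this example and are not part of the patch or the benchmark suite. The numbers are copied from the tables above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: computes before/after ratios
// from the JMH scores quoted in this message.
public class UpcallSpeedup {
    // Ratio of the "before" score to the "after" score (both in ns/op).
    static double speedup(double beforeNs, double afterNs) {
        return beforeNs / afterNs;
    }

    public static void main(String[] args) {
        // {before, after} scores in ns/op for the panama_* rows.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("panama_args10",   new double[] {600.304, 86.066});
        scores.put("panama_args5",    new double[] {447.607, 78.094});
        scores.put("panama_blank",    new double[] {238.953, 68.683});
        scores.put("panama_identity", new double[] {314.388, 76.841});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-16s %.2fx%n", e.getKey(), speedup(s[0], s[1]));
        }
    }
}
```

The ratios come out at roughly 3.5x for `panama_blank` up to roughly 7x for `panama_args10`, while the `jni_*` rows are essentially unchanged, as expected.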


These results are closer to the x86 results. The top two functions in the profile are the `on_entry` and `on_exit` VM calls:


 20.44%           libjvm.so  ProgrammableUpcallHandler::on_entry (228 bytes)
 12.30%           libjvm.so  ProgrammableUpcallHandler::on_exit (88 bytes)
  9.20%        runtime stub  StubRoutines::atomic entry points (148 bytes)
  8.65%         c2, level 4  org.openjdk.bench.jdk.incubator.foreign.jmh_generated.Upcalls_panama_args5_jmhTest::panama_args5_avgt_jmhStub, version 1386 (100 bytes)
  4.64%           libjvm.so  ThreadShadow::clear_pending_exception (32 bytes)
  4.07%         c2, level 4  java.lang.invoke.LambdaForm$MH.0x0000000800d12c00::invoke, version 1369 (212 bytes)
  3.94%         c2, level 4  java.lang.invoke.LambdaForm$MH.0x0000000800d12c00::invoke, version 1369 (268 bytes)


> That code was taken from JavaCallWrapper code. Do you happen to know which use-case those calls were supposed to address? (I'm assuming things still work on MacOS/AArch64 without them).

I think the flow for JavaCallWrapper is slightly different. It's doing:

Native -> JavaCallWrapper -> Java (need X) -> ~JavaCallWrapper -> Native (need W?)

Whereas here we're doing:

Native -> Stub (need X) -> on_entry -> Java (need X) -> on_exit -> Stub (need X) -> Native

I'm not sure why ~JavaCallWrapper needs to set the W mode though.
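
To make the difference between the two flows concrete, here is a toy model that just counts how often the required mode changes between consecutive annotated steps. It is only a sketch and is not HotSpot code: the `WxFlow` class, the `Mode` enum and the `switches` helper are hypothetical, and treating the return to native in the JavaCallWrapper flow as needing W is an assumption (it is the "need W?" above).

```java
import java.util.List;

// Toy model of the two flows described above; not HotSpot code.
public class WxFlow {
    enum Mode { WRITE, EXEC }

    // Count how often the required mode changes between consecutive steps.
    static int switches(List<Mode> required) {
        int count = 0;
        for (int i = 1; i < required.size(); i++) {
            if (required.get(i) != required.get(i - 1)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // JavaCallWrapper flow: Java (need X) -> Native (need W, assumed).
        List<Mode> javaCallWrapper = List.of(Mode.EXEC, Mode.WRITE);
        // Upcall stub flow: Stub (need X) -> Java (need X) -> Stub (need X).
        List<Mode> upcallStub = List.of(Mode.EXEC, Mode.EXEC, Mode.EXEC);

        System.out.println("JavaCallWrapper switches: " + switches(javaCallWrapper));
        System.out.println("Upcall stub switches:     " + switches(upcallStub));
    }
}
```

Under those assumptions the JavaCallWrapper flow needs one mode switch on the way out, whereas the upcall stub flow stays in X throughout and needs none.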

-------------

PR: https://git.openjdk.java.net/panama-foreign/pull/610

