[foreign-memaccess+abi] RFR: 8275646: Implement optimized upcall stubs on AArch64
Andrew Haley
aph at openjdk.java.net
Fri Nov 5 10:37:27 UTC 2021
On Fri, 5 Nov 2021 06:47:19 GMT, Nick Gasson <ngasson at openjdk.org> wrote:
> This is a fairly direct port of the X86 code. The changes to
> frame_aarch64 and foreign_globals_aarch64 are identical: perhaps
> ForeignGlobals::parse_call_regs_impl() could be moved to shared code?
>
> The X86 version has a call to reinit_heapbase() before the return from
> the stub. I think this is redundant because the heap base register will
> be clobbered immediately by the native code?
>
> I hit a really weird bug in release builds where the first few
> instructions in the code buffer were overwritten by the fields of
> OptimizedEntryBlob. I think we need to pass sizeof(OptimizedEntryBlob)
> instead of sizeof(BufferBlob) as the header_size argument to the
> RuntimeBlob constructor (none of the other subclasses of BufferBlob have
> extra fields). I added a header_size argument to BufferBlob's
> constructor to thread this through.
>
> I removed the calls to change the W^X state in on_entry/on_exit calls:
> in the on_entry case the stub must already be executable because we
> called into the VM from there, and for on_exit we need the code to be
> executable not writable otherwise we'll get a SIGBUS as soon as we
> return to the stub. The newly added stack tests in TestUpcall hit
> JDK-8275584 on MacOS/AArch64 so I've problem-listed that for now.
>
> JMH results from org.openjdk.bench.jdk.incubator.foreign.Upcalls before:
>
>
> Benchmark                Mode  Cnt     Score    Error  Units
> Upcalls.jni_args10       avgt   30   450.417 ±  4.755  ns/op
> Upcalls.jni_args5        avgt   30   245.898 ±  3.171  ns/op
> Upcalls.jni_blank        avgt   30   195.606 ±  5.459  ns/op
> Upcalls.jni_identity     avgt   30   369.788 ± 15.165  ns/op
> Upcalls.panama_args10    avgt   30  1253.189 ± 62.261  ns/op
> Upcalls.panama_args5     avgt   30   927.101 ± 35.369  ns/op
> Upcalls.panama_blank     avgt   30   637.708 ± 11.353  ns/op
> Upcalls.panama_identity  avgt   30   697.109 ±  9.971  ns/op
>
>
> After:
>
>
> Benchmark                Mode  Cnt     Score    Error  Units
> Upcalls.jni_args10       avgt   30   455.304 ± 21.838  ns/op
> Upcalls.jni_args5        avgt   30   247.279 ±  2.513  ns/op
> Upcalls.jni_blank        avgt   30   194.113 ±  4.317  ns/op
> Upcalls.jni_identity     avgt   30   366.145 ±  4.912  ns/op
> Upcalls.panama_args10    avgt   30   236.337 ± 11.072  ns/op
> Upcalls.panama_args5     avgt   30   223.858 ± 12.345  ns/op
> Upcalls.panama_blank     avgt   30   203.631 ±  8.840  ns/op
> Upcalls.panama_identity  avgt   30   208.783 ±  9.914  ns/op
>
>
> Tested tier1 and jdk_foreign on Linux/AArch64 and MacOS/AArch64.
The AArch64-specific parts look reasonable enough, although some of them seem to repeat logic already present in MacroAssembler. I guess that's OK, since it's desirable to stay close to the x86 port.
I'm interested to know where all that 250ns is going. Did you look at -prof perfasm?
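
The header_size problem described in the quoted message can be illustrated with a small standalone model. This is only a sketch under stated assumptions: BaseBlob, EntryBlob, field_a, field_b and code_survives are hypothetical stand-ins rather than the HotSpot classes, and the layout assumes a typical ABI in which the subclass fields immediately follow the base subobject. It shows how a code area that starts at sizeof(base) instead of sizeof(subclass) can be overwritten when the subclass fields are written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical stand-ins, not the HotSpot classes: BaseBlob plays the role
// of a blob header, EntryBlob that of a subclass with extra fields.
struct BaseBlob {
    std::size_t header_size;   // offset at which the code area starts
};

struct EntryBlob : BaseBlob {
    std::uintptr_t field_a;    // extra subclass state, laid out after the base
    std::uintptr_t field_b;    // (assumed: no padding between base and fields)
};

constexpr std::size_t kCodeBytes = 16;

// Lays out [header | code] in one buffer, fills the code area first, then
// writes the subclass fields, and reports whether the first code byte is
// still intact afterwards.
bool code_survives(std::size_t header_size) {
    assert(header_size <= sizeof(EntryBlob));  // keep all writes inside buffer
    alignas(EntryBlob) unsigned char buffer[sizeof(EntryBlob) + kCodeBytes] = {};
    EntryBlob* blob = new (buffer) EntryBlob;
    blob->header_size = header_size;

    unsigned char* code = buffer + header_size;
    std::memset(code, 0xAA, kCodeBytes);       // stand-in for copying stub code

    // Initialise the subclass fields after the code is already in place.
    std::memset(&blob->field_a, 0x11, sizeof blob->field_a);
    std::memset(&blob->field_b, 0x11, sizeof blob->field_b);

    return code[0] == 0xAA;
}
```

In this model, code_survives(sizeof(BaseBlob)) reports that the code was overwritten, while code_survives(sizeof(EntryBlob)) leaves it intact, which matches the reasoning for passing the subclass size as header_size.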
-------------
PR: https://git.openjdk.java.net/panama-foreign/pull/610