[foreign-memaccess+abi] RFR: 8275646: Implement optimized upcall stubs on AArch64
Jorn Vernee
jvernee at openjdk.java.net
Fri Nov 5 14:17:20 UTC 2021
On Fri, 5 Nov 2021 06:47:19 GMT, Nick Gasson <ngasson at openjdk.org> wrote:
> I removed the calls to change the W^X state in on_entry/on_exit calls:
> in the on_entry case the stub must already be executable because we
> called into the VM from there, and for on_exit we need the code to be
> executable not writable otherwise we'll get a SIGBUS as soon as we
> return to the stub.
Thanks for cleaning this up as well. That code was taken from `JavaCallWrapper` code. Do you happen to know which use-case those calls were supposed to address? (I'm assuming things still work on MacOS/AArch64 without them).
---
W.R.T. to JDK-8275584, I was wondering how JNI handles this, but it looks like it doesn't:
#ifdef __APPLE__
// Less-than word types are stored one after another.
// The code is unable to handle this so bailout.
return -1;
#endif
(in SharedRuntime_aarch64).
Maybe for now we should add a similar early bailout in Java code on MacOS/AArch64, until we have a chance to implement this.
---
WRT to performance, I see somewhat comparable numbers (but on a different scale), on my x64 machine:
Benchmark Mode Cnt Score Error Units
Upcalls.jni_args10 avgt 30 148.993 � 1.492 ns/op
Upcalls.jni_args5 avgt 30 79.690 � 0.753 ns/op
Upcalls.jni_blank avgt 30 55.017 � 1.067 ns/op
Upcalls.jni_identity avgt 30 120.529 � 1.265 ns/op
Upcalls.panama_args10 avgt 30 55.065 � 1.037 ns/op
Upcalls.panama_args5 avgt 30 53.792 � 2.640 ns/op
Upcalls.panama_blank avgt 30 47.916 � 1.764 ns/op
Upcalls.panama_identity avgt 30 49.680 � 1.604 ns/op
---
> ForeignGlobals::parse_call_regs_impl() could be moved to shared code?
Ok, feel free to move it to shared code if you want.
src/hotspot/cpu/aarch64/foreign_globals_aarch64.cpp line 119:
> 117: jint index = storage->int_field(VMS.index_offset);
> 118: jint type = storage->int_field(VMS.type_offset);
> 119: result._ret_regs[i] = vmstorage_to_vmreg(type, index);
On x86, I switched these out to call `parse_vmstorage` which is a little simpler.
src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 217:
> 215: // skip receiver
> 216: BasicType* in_sig_bt = out_sig_bt + 1;
> 217: int total_in_args = total_out_args - 1;
This section could also be moved to shared code as well (I did that in the pending patch as well: https://github.com/openjdk/panama-foreign/pull/608/files#r741389756 but it would be fine to do it here instead of you want)
src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 395:
> 393: blob->print_on(tty);
> 394: Disassembler::decode(blob, tty);
> 395: }
I think I cleaned this up for x64 in the pending patch as well: `blob->print_on` already prints the disassembly, so the second call to `decode` is not needed (and you can remove the include for disassembler as well).
src/hotspot/share/code/codeBlob.cpp line 253:
> 251:
> 252: BufferBlob::BufferBlob(const char* name, int header_size, int size, CodeBuffer* cb)
> 253: : RuntimeBlob(name, cb, header_size, size, CodeOffsets::frame_never_safe, 0, NULL)
Thanks for fixing this. Not sure why I never saw any problems on x64 (maybe some padding somewhere saved us).
-------------
Marked as reviewed by jvernee (Committer).
PR: https://git.openjdk.java.net/panama-foreign/pull/610
More information about the panama-dev
mailing list