[foreign-memaccess+abi] RFR: 8275646: Implement optimized upcall stubs on AArch64

Jorn Vernee jvernee at openjdk.java.net
Fri Nov 5 14:17:20 UTC 2021


On Fri, 5 Nov 2021 06:47:19 GMT, Nick Gasson <ngasson at openjdk.org> wrote:

> I removed the calls to change the W^X state in on_entry/on_exit calls:
> in the on_entry case the stub must already be executable because we
> called into the VM from there, and for on_exit we need the code to be
> executable not writable otherwise we'll get a SIGBUS as soon as we
> return to the stub.

Thanks for cleaning this up as well. That code was taken from `JavaCallWrapper` code. Do you happen to know which use-case those calls were supposed to address? (I'm assuming things still work on MacOS/AArch64 without them).

---

W.R.T. to JDK-8275584, I was wondering how JNI handles this, but it looks like it doesn't:


#ifdef __APPLE__
          // Less-than word types are stored one after another.
          // The code is unable to handle this so bailout.
          return -1;
#endif


(in SharedRuntime_aarch64).

Maybe for now we should add a similar early bailout in Java code on MacOS/AArch64, until we have a chance to implement this.

---

WRT to performance, I see somewhat comparable numbers (but on a different scale), on my x64 machine:


Benchmark                Mode  Cnt    Score   Error  Units
Upcalls.jni_args10       avgt   30  148.993 � 1.492  ns/op
Upcalls.jni_args5        avgt   30   79.690 � 0.753  ns/op
Upcalls.jni_blank        avgt   30   55.017 � 1.067  ns/op
Upcalls.jni_identity     avgt   30  120.529 � 1.265  ns/op
Upcalls.panama_args10    avgt   30   55.065 � 1.037  ns/op
Upcalls.panama_args5     avgt   30   53.792 � 2.640  ns/op
Upcalls.panama_blank     avgt   30   47.916 � 1.764  ns/op
Upcalls.panama_identity  avgt   30   49.680 � 1.604  ns/op


---

> ForeignGlobals::parse_call_regs_impl() could be moved to shared code?

Ok, feel free to move it to shared code if you want.

src/hotspot/cpu/aarch64/foreign_globals_aarch64.cpp line 119:

> 117:     jint index = storage->int_field(VMS.index_offset);
> 118:     jint type = storage->int_field(VMS.type_offset);
> 119:     result._ret_regs[i] = vmstorage_to_vmreg(type, index);

On x86, I switched these out to call `parse_vmstorage` which is a little simpler.

src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 217:

> 215:   // skip receiver
> 216:   BasicType* in_sig_bt = out_sig_bt + 1;
> 217:   int total_in_args = total_out_args - 1;

This section could also be moved to shared code as well (I did that in the pending patch as well: https://github.com/openjdk/panama-foreign/pull/608/files#r741389756 but it would be fine to do it here instead of you want)

src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 395:

> 393:     blob->print_on(tty);
> 394:     Disassembler::decode(blob, tty);
> 395:   }

I think I cleaned this up for x64 in the pending patch as well: `blob->print_on` already prints the disassembly, so the second call to `decode` is not needed (and you can remove the include for disassembler as well).

src/hotspot/share/code/codeBlob.cpp line 253:

> 251: 
> 252: BufferBlob::BufferBlob(const char* name, int header_size, int size, CodeBuffer* cb)
> 253:   : RuntimeBlob(name, cb, header_size, size, CodeOffsets::frame_never_safe, 0, NULL)

Thanks for fixing this. Not sure why I never saw any problems on x64 (maybe some padding somewhere saved us).

-------------

Marked as reviewed by jvernee (Committer).

PR: https://git.openjdk.java.net/panama-foreign/pull/610


More information about the panama-dev mailing list