AArch64: loading the thread-local poll value when waking up from native with LDAR in case -XX:+UseSystemMemoryBarrier

Dmitry Chuyko dmitry.chuyko at bell-sw.com
Thu Jul 25 13:43:40 UTC 2024


Hello.

We've recently been looking into JNI call overhead on linux-aarch64,
specifically in SharedRuntime::generate_native_wrapper(). It turns out
that although the DMB ISH is not emitted when UseSystemMemoryBarrier is on,
the subsequent safepoint_poll is still done with acquire. Avoiding the
acquire there when UseSystemMemoryBarrier (USMB) is on would be very
beneficial, at least on some machines: JNI call overhead drops by 25-75%,
provided USMB itself has no negative impact.

Looking at the history, there are a few key points for this code. First,
when Andrew Haley created the aarch64 implementation of Thread-local
handshakes [0], Erik Österlund pointed out that LDAR [1] and a fence [3]
were necessary, and also described the logic behind that part at the time
[2]. Robbin Ehn later added back barrier-less Java thread transitions [4],
and the review mentions that the store_load_barrier before the poll load
can be replaced with a compiler_barrier [5].

I wonder whether it would be correct to emit the safepoint_poll with
acquire=false when UseSystemMemoryBarrier is true, at least on some known
CPUs. Regression tests don't reveal any problems, but they also don't
deliberately provoke safepoint/JNI races.
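
To make the question concrete, here is a rough model of the return-from-native
sequence in portable C++. It is only a sketch, not the HotSpot code:
ThreadModel, return_from_native and the relaxed_poll_under_usmb parameter are
hypothetical names made up for illustration, and the poll word is reduced to
"non-zero means armed". On AArHint64 compilers an acquire load is typically
emitted as LDAR and a relaxed load as a plain LDR, while a seq_cst thread
fence typically becomes DMB ISH.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-ins; none of these names come from HotSpot.
enum ThreadState : int { in_native = 0, in_native_trans = 1, in_java = 2 };

struct ThreadModel {
    std::atomic<int> state{in_native};
    std::atomic<std::uintptr_t> poll_word{0};  // non-zero models an armed poll
};

// Models the wake-up from native. Returns true if the slow path is needed.
// relaxed_poll_under_usmb selects the variant asked about in this mail.
inline bool return_from_native(ThreadModel& t,
                               bool use_system_memory_barrier,
                               bool relaxed_poll_under_usmb) {
    // Publish the transitional state before looking at the poll word.
    t.state.store(in_native_trans, std::memory_order_relaxed);

    if (use_system_memory_barrier) {
        // Compiler-only barrier: no fence instruction is emitted here.
        std::atomic_signal_fence(std::memory_order_seq_cst);
    } else {
        // Full store-load fence between the state store and the poll load.
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }

    std::uintptr_t poll;
    if (use_system_memory_barrier && relaxed_poll_under_usmb) {
        // Proposed variant: plain load of the poll word.
        poll = t.poll_word.load(std::memory_order_relaxed);
    } else {
        // Current behaviour as described above: acquire load.
        poll = t.poll_word.load(std::memory_order_acquire);
    }

    const bool armed = (poll != 0);
    if (!armed) {
        // Fast path: nothing pending, continue in Java.
        t.state.store(in_java, std::memory_order_relaxed);
    }
    return armed;
}
```

The sketch only shows which loads and fences are in question. It says nothing
about whether the relaxed variant is actually safe, which is exactly what I am
asking.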

-Dmitry

[0] https://bugs.openjdk.org/browse/JDK-8189596
[1] https://mail.openjdk.org/pipermail/hotspot-dev/2017-November/029264.html
[2] https://mail.openjdk.org/pipermail/hotspot-dev/2017-November/029269.html
[3] https://mail.openjdk.org/pipermail/hotspot-dev/2017-November/029277.html
[4] https://bugs.openjdk.org/browse/JDK-8292591
[5] https://github.com/openjdk/jdk/pull/10123#issuecomment-1235123012


More information about the hotspot-compiler-dev mailing list