RFR: 8325937: runtime/handshake/HandshakeDirectTest.java causes "monitor end should be strictly below the frame pointer" assertion failure on AArch64
Patricio Chilano Mateo
pchilanomate at openjdk.org
Wed Oct 2 03:19:33 UTC 2024
On Tue, 1 Oct 2024 18:38:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:
> Add missing StoreLoad fences in handshaking code to match safepoint code. Thanks to @pchilano for finding this bug.
>
> Tested with tier1-4 and tier8 which has Kitchensink in it.
I stepped through the code on one of our Neoverse-N1 machines and I don't see a full fence when grabbing the mutex, just acquire semantics through the casa instruction (I skipped the first instructions):
0xfffff7eaa330 <pthread_mutex_trylock+80>: mov x2, x20
0xfffff7eaa334 <pthread_mutex_trylock+84>: mov w1, #0x1 // #1
0xfffff7eaa338 <pthread_mutex_trylock+88>: mov w0, #0x0 // #0
0xfffff7eaa33c <pthread_mutex_trylock+92>: mov w19, #0x0 // #0
0xfffff7eaa340 <pthread_mutex_trylock+96>: bl 0xfffff7eb4b80 <__aarch64_cas4_acq>
0xfffff7eaa344 <pthread_mutex_trylock+100>: cbnz w0, 0xfffff7eaa730 <pthread_mutex_trylock+1104>
0xfffff7eaa348 <pthread_mutex_trylock+104>: mov w0, w19
0xfffff7eaa34c <pthread_mutex_trylock+108>: ldp x19, x20, [sp, #16]
0xfffff7eaa350 <pthread_mutex_trylock+112>: ldp x21, x22, [sp, #32]
0xfffff7eaa354 <pthread_mutex_trylock+116>: ldp x23, x24, [sp, #48]
0xfffff7eaa358 <pthread_mutex_trylock+120>: ldp x29, x30, [sp], #96
0xfffff7eaa35c <pthread_mutex_trylock+124>: ret
0xfffff7eb4b80 <__aarch64_cas4_acq>: adrp x16, 0xfffff7ed4000 <__pthread_keys+15608>
0xfffff7eb4b84 <__aarch64_cas4_acq+4>: ldrb w16, [x16, #920]
0xfffff7eb4b88 <__aarch64_cas4_acq+8>: cbz w16, 0xfffff7eb4b94 <__aarch64_cas4_acq+20>
0xfffff7eb4b8c <__aarch64_cas4_acq+12>: casa w0, w1, [x2]
0xfffff7eb4b90 <__aarch64_cas4_acq+16>: ret
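For readers less used to AArch64, the fast path in the listing is roughly a compare-and-swap of the lock word from 0 to 1 with acquire ordering only. A minimal C++ sketch of that idea follows; it is not glibc's code, and SketchLock, word, try_lock and unlock are hypothetical names used only for illustration:

```cpp
#include <atomic>

// Hypothetical minimal lock word: 0 = unlocked, 1 = locked.
// This is an illustration, not the glibc implementation.
struct SketchLock {
    std::atomic<int> word{0};

    // CAS 0 -> 1 with acquire ordering on success. On AArch64 with LSE
    // atomics a compiler may emit casa for this. Acquire keeps later
    // accesses from moving above the CAS; it is not a full fence.
    bool try_lock() {
        int expected = 0;
        return word.compare_exchange_strong(expected, 1,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed);
    }

    // Release store: earlier accesses cannot move below the unlock.
    void unlock() { word.store(0, std::memory_order_release); }
};
```

A first try_lock() on a fresh SketchLock succeeds, a second one fails until unlock() is called.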
We could double-check the same thing on the Neoverse-N2 machine where we are seeing this issue, but I think this already shows that we can't expect a full fence. In general, that is how I have always thought about pthread_mutex_lock/unlock: as providing acquire/release semantics and not a full fence [1], although in practice an implementation probably would provide one.
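To make the acquire/release versus full-fence distinction concrete, here is a minimal C++ sketch of the store-then-load pattern that needs StoreLoad ordering. It is not the HotSpot handshake code; flag_a, flag_b, publish_then_check and count_both_missed are hypothetical names. Each side sets its own flag and then reads the other side's flag. With the seq_cst fence between the store and the load, the C++ memory model guarantees that at least one side observes the other's store. With only acquire/release ordering, both sides could read 0.

```cpp
#include <atomic>
#include <thread>

// Hypothetical flags standing in for the two sides of a handshake:
// each side publishes its own flag, then checks the other side's flag.
static std::atomic<int> flag_a{0};
static std::atomic<int> flag_b{0};

// Store own flag, full fence, load the other flag.
// The seq_cst fence orders the earlier store before the later load
// (StoreLoad); acquire/release ordering alone would not.
static int publish_then_check(std::atomic<int>& mine, std::atomic<int>& theirs) {
    mine.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return theirs.load(std::memory_order_relaxed);
}

// Runs both sides concurrently `rounds` times and counts how often both
// sides missed each other's store (both loads returned 0).
int count_both_missed(int rounds) {
    int both_missed = 0;
    for (int i = 0; i < rounds; ++i) {
        flag_a.store(0, std::memory_order_relaxed);
        flag_b.store(0, std::memory_order_relaxed);
        int seen_by_a = -1, seen_by_b = -1;
        std::thread ta([&] { seen_by_a = publish_then_check(flag_a, flag_b); });
        std::thread tb([&] { seen_by_b = publish_then_check(flag_b, flag_a); });
        ta.join();
        tb.join();
        if (seen_by_a == 0 && seen_by_b == 0) ++both_missed;
    }
    return both_missed;
}
```

With the fence in place, count_both_missed should return 0 for any number of rounds. If the fence is removed, a non-zero count becomes permitted, though whether it is ever observed depends on the hardware and on timing.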
As to why we are not seeing the crash on the N1, my guess is that the casa instruction might be implemented there with stronger guarantees than what is specified, but not on the N2.
[1] https://preshing.com/20120913/acquire-and-release-semantics/
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21295#issuecomment-2387549699