RFR: 8325937: runtime/handshake/HandshakeDirectTest.java causes "monitor end should be strictly below the frame pointer" assertion failure on AArch64
Erik Österlund
eosterlund at openjdk.org
Wed Oct 2 04:38:34 UTC 2024
On Wed, 2 Oct 2024 03:16:56 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:
>> Add missing StoreLoad fences in handshaking code to match safepoint code. Thanks to @pchilano for finding this bug.
>>
>> Tested with tier1-4 and tier8 which has Kitchensink in it.
>
> I stepped through the code on one of the Neoverse-N1 machines we have and I don't see a full fence when grabbing the mutex, just acquire semantics through the casa instruction (I skipped the first instructions):
>
>
> 0xfffff7eaa330 <pthread_mutex_trylock+80>: mov x2, x20
> 0xfffff7eaa334 <pthread_mutex_trylock+84>: mov w1, #0x1 // #1
> 0xfffff7eaa338 <pthread_mutex_trylock+88>: mov w0, #0x0 // #0
> 0xfffff7eaa33c <pthread_mutex_trylock+92>: mov w19, #0x0 // #0
> 0xfffff7eaa340 <pthread_mutex_trylock+96>: bl 0xfffff7eb4b80 <__aarch64_cas4_acq>
> 0xfffff7eaa344 <pthread_mutex_trylock+100>: cbnz w0, 0xfffff7eaa730 <pthread_mutex_trylock+1104>
> 0xfffff7eaa348 <pthread_mutex_trylock+104>: mov w0, w19
> 0xfffff7eaa34c <pthread_mutex_trylock+108>: ldp x19, x20, [sp, #16]
> 0xfffff7eaa350 <pthread_mutex_trylock+112>: ldp x21, x22, [sp, #32]
> 0xfffff7eaa354 <pthread_mutex_trylock+116>: ldp x23, x24, [sp, #48]
> 0xfffff7eaa358 <pthread_mutex_trylock+120>: ldp x29, x30, [sp], #96
> 0xfffff7eaa35c <pthread_mutex_trylock+124>: ret
>
> 0xfffff7eb4b80 <__aarch64_cas4_acq>: adrp x16, 0xfffff7ed4000 <__pthread_keys+15608>
> 0xfffff7eb4b84 <__aarch64_cas4_acq+4>: ldrb w16, [x16, #920]
> 0xfffff7eb4b88 <__aarch64_cas4_acq+8>: cbz w16, 0xfffff7eb4b94 <__aarch64_cas4_acq+20>
> 0xfffff7eb4b8c <__aarch64_cas4_acq+12>: casa w0, w1, [x2]
> 0xfffff7eb4b90 <__aarch64_cas4_acq+16>: ret
>
>
> We could double-check the same thing on the Neoverse-N2 machine where we are seeing this issue, but I think this already shows that we can't expect a full fence. In general, that's how I have always thought about pthread_mutex_lock/unlock: as providing acquire/release semantics and not a full fence [1], although in practice it probably would.
>
> As to why we are not seeing the crash on the N1 then, my guess is the casa instruction might be implemented with stronger guarantees than what is specified, but not in the N2.
>
> [1] https://preshing.com/20120913/acquire-and-release-semantics/
Thanks for checking, @pchilano. As @dholmes-ora mentioned, POSIX doesn't specify in any detail how strong a memory ordering you can assume. So relaxing it such that accesses in the critical section may float above the lock is an "interesting choice". I have seen a whole bunch of code that assumes that doesn't happen.
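To make the ordering at stake concrete, here is a minimal C++ sketch of the store-then-load pattern that needs a StoreLoad fence. The names (`Flags`, `armed`, `in_region`, `count_missed_both`) are hypothetical stand-ins and not HotSpot code. Each thread publishes its own flag and then reads the other's. With a full fence between the store and the load on both sides, at least one thread must see the other's store; acquire/release ordering alone does not rule out both reading 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for illustration only; this is not HotSpot code.
struct Flags {
    std::atomic<int> armed{0};      // e.g. "an operation has been armed"
    std::atomic<int> in_region{0};  // e.g. "the target thread entered a region"
};

// Runs `trials` rounds of the store-then-load pattern and returns how many
// rounds ended with BOTH threads missing the other's store.
int count_missed_both(int trials, bool use_fence) {
    assert(trials >= 0);
    int missed = 0;
    for (int i = 0; i < trials; ++i) {
        Flags f;
        int saw_in_region = -1;
        int saw_armed = -1;

        std::thread requester([&] {
            f.armed.store(1, std::memory_order_relaxed);
            // Full fence: orders the store above before the load below.
            if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            saw_in_region = f.in_region.load(std::memory_order_relaxed);
        });
        std::thread target([&] {
            f.in_region.store(1, std::memory_order_relaxed);
            if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            saw_armed = f.armed.load(std::memory_order_relaxed);
        });

        requester.join();
        target.join();
        if (saw_in_region == 0 && saw_armed == 0) ++missed;
    }
    return missed;
}
```

Under the C++ memory model, `count_missed_both(500, true)` is expected to return 0. With `use_fence` set to false the result may or may not be nonzero, depending on the hardware and on timing.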
I suppose we have the choice of adding a fence to lock() and friends on ARM, or discovering through interesting core files where else we expected accesses in critical sections to not float out from the critical section as if locking in the JVM is some kind of slippery water park.
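As a rough sketch of the first option, assuming a wrapper around a plain mutex, the fence could sit right after the acquisition. `FencedMutex` and `count_with_fenced_mutex` are hypothetical names for illustration, not the HotSpot `Mutex` class or a proposed patch, and whether a fence in exactly this position is sufficient on a given platform is an assumption here.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper for illustration only; not the HotSpot Mutex class.
class FencedMutex {
public:
    void lock() {
        mu_.lock();
        // Full fence after acquiring, so that accesses made before lock()
        // are ordered before accesses made inside the critical section.
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
    bool try_lock() {
        if (!mu_.try_lock()) return false;
        std::atomic_thread_fence(std::memory_order_seq_cst);
        return true;
    }
    void unlock() { mu_.unlock(); }

private:
    std::mutex mu_;
};

// Small functional check: several threads increment a plain int under the lock.
int count_with_fenced_mutex(int threads, int increments) {
    assert(threads >= 0 && increments >= 0);
    FencedMutex m;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < increments; ++i) {
                std::lock_guard<FencedMutex> guard(m);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The wrapper satisfies the standard Lockable requirements, so it works with `std::lock_guard`. The counter test only confirms that mutual exclusion is preserved; it does not demonstrate the stronger ordering by itself.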
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21295#issuecomment-2387609074