RFR: 8325937: runtime/handshake/HandshakeDirectTest.java causes "monitor end should be strictly below the frame pointer" assertion failure on AArch64
David Holmes
dholmes at openjdk.org
Wed Oct 2 09:44:36 UTC 2024
On Wed, 2 Oct 2024 04:35:43 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:
>> I stepped through the code on one of our Neoverse-N1 machines and I don't see a full fence when acquiring the mutex, only acquire semantics through the casa instruction (the first few instructions are omitted):
>>
>>
>> 0xfffff7eaa330 <pthread_mutex_trylock+80>: mov x2, x20
>> 0xfffff7eaa334 <pthread_mutex_trylock+84>: mov w1, #0x1 // #1
>> 0xfffff7eaa338 <pthread_mutex_trylock+88>: mov w0, #0x0 // #0
>> 0xfffff7eaa33c <pthread_mutex_trylock+92>: mov w19, #0x0 // #0
>> 0xfffff7eaa340 <pthread_mutex_trylock+96>: bl 0xfffff7eb4b80 <__aarch64_cas4_acq>
>> 0xfffff7eaa344 <pthread_mutex_trylock+100>: cbnz w0, 0xfffff7eaa730 <pthread_mutex_trylock+1104>
>> 0xfffff7eaa348 <pthread_mutex_trylock+104>: mov w0, w19
>> 0xfffff7eaa34c <pthread_mutex_trylock+108>: ldp x19, x20, [sp, #16]
>> 0xfffff7eaa350 <pthread_mutex_trylock+112>: ldp x21, x22, [sp, #32]
>> 0xfffff7eaa354 <pthread_mutex_trylock+116>: ldp x23, x24, [sp, #48]
>> 0xfffff7eaa358 <pthread_mutex_trylock+120>: ldp x29, x30, [sp], #96
>> 0xfffff7eaa35c <pthread_mutex_trylock+124>: ret
>>
>> 0xfffff7eb4b80 <__aarch64_cas4_acq>: adrp x16, 0xfffff7ed4000 <__pthread_keys+15608>
>> 0xfffff7eb4b84 <__aarch64_cas4_acq+4>: ldrb w16, [x16, #920]
>> 0xfffff7eb4b88 <__aarch64_cas4_acq+8>: cbz w16, 0xfffff7eb4b94 <__aarch64_cas4_acq+20>
>> 0xfffff7eb4b8c <__aarch64_cas4_acq+12>: casa w0, w1, [x2]
>> 0xfffff7eb4b90 <__aarch64_cas4_acq+16>: ret
>>
>>
>> We could double-check the same thing on the Neoverse-N2 machine where we are seeing this issue, but I think this already shows that we can't expect a full fence. In general, I have always thought of pthread_mutex_lock/unlock as providing acquire/release semantics rather than a full fence [1], although in practice they would probably act as one.
>>
>> As for why we are not seeing the crash on the N1, my guess is that on the N1 the casa instruction might be implemented with stronger guarantees than the specification requires, while on the N2 it is not.
>>
>> [1] https://preshing.com/20120913/acquire-and-release-semantics/
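For illustration, here is a minimal C11 sketch of the acquire/release contract described above. It is a toy spinlock, not glibc's pthread_mutex implementation, and the names acqrel_lock, acqrel_trylock, acqrel_lock_acquire and acqrel_unlock are hypothetical. The successful CAS has acquire ordering (analogous to the casa in the trace) and unlock is a plain release store; neither operation is a full fence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock illustrating acquire/release locking only. */
typedef struct {
    atomic_int state; /* 0 = unlocked, 1 = locked */
} acqrel_lock;

static void acqrel_lock_init(acqrel_lock *l) {
    atomic_init(&l->state, 0);
}

static bool acqrel_trylock(acqrel_lock *l) {
    int expected = 0;
    /* Acquire on success: accesses after this call cannot be reordered
       before the load part of the CAS. Relaxed on failure, since nothing
       was acquired. Analogous to casa, not to a full barrier. */
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void acqrel_lock_acquire(acqrel_lock *l) {
    while (!acqrel_trylock(l)) {
        /* spin until the CAS succeeds */
    }
}

static void acqrel_unlock(acqrel_lock *l) {
    /* Unlocking a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == 1);
    /* Release: accesses before this store cannot be reordered after it. */
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

Nothing in this sketch orders a store made before acqrel_lock_acquire() with a load made inside the critical section: acquire only stops later accesses from moving up, it does not stop earlier stores from moving down past it.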
>
> Thanks for checking, @pchilano. As @dholmes-ora mentioned, POSIX doesn't specify in any detail how strong a memory ordering you can assume. So relaxing it such that accesses in the critical section may float above it is an "interesting choice". I have seen a lot of code that assumes that doesn't happen.
>
> I suppose we have the choice of adding a fence to lock() and friends on ARM, or discovering through interesting core files where else we expected accesses in critical sections not to float out of the critical section, as if locking in the JVM were some kind of slippery water park.
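A minimal sketch of the first option, assuming plain pthread mutexes and C11 atomics: hypothetical wrappers fenced_lock() and fenced_unlock() (not HotSpot's Mutex API) that add a seq_cst fence after acquiring and before releasing. This is only one possible shape of "adding a fence to lock() and friends", not the change made in this PR. On AArch64, GCC and Clang typically emit a dmb ish for atomic_thread_fence(memory_order_seq_cst).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical wrappers: full fences around a pthread critical section. */
static void fenced_lock(pthread_mutex_t *m) {
    int rc = pthread_mutex_lock(m);
    assert(rc == 0);
    (void)rc;
    /* Full fence after acquiring: the lock-word update and any earlier
       accesses are ordered before every access in the critical section. */
    atomic_thread_fence(memory_order_seq_cst);
}

static void fenced_unlock(pthread_mutex_t *m) {
    /* Full fence before releasing: every access in the critical section
       is ordered before the unlock's store to the lock word. */
    atomic_thread_fence(memory_order_seq_cst);
    int rc = pthread_mutex_unlock(m);
    assert(rc == 0);
    (void)rc;
}

/* Usage demo: several threads bump a shared counter under the wrappers. */
typedef struct {
    pthread_mutex_t mutex;
    long counter;
    int iters;
} demo_state;

static void *demo_worker(void *arg) {
    demo_state *s = arg;
    for (int i = 0; i < s->iters; i++) {
        fenced_lock(&s->mutex);
        s->counter++; /* plain access, protected by the mutex */
        fenced_unlock(&s->mutex);
    }
    return NULL;
}

/* Runs nthreads workers (1..16) and returns the final counter value. */
static long run_counter_demo(int nthreads, int iters) {
    pthread_t threads[16];
    demo_state s = { .counter = 0, .iters = iters };
    assert(nthreads > 0 && nthreads <= 16);
    pthread_mutex_init(&s.mutex, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, demo_worker, &s);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&s.mutex);
    return s.counter;
}
```

The demo only checks that the wrappers still provide mutual exclusion; it cannot show the ordering effect of the fences, which depends on timing and hardware.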
@fisk which accesses precisely do we need a fence to keep in the right order?
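As a generic illustration (not an identification of the accesses involved in this bug), the standard store-buffering litmus test shows the kind of ordering that acquire/release does not provide but a full fence does: a store followed by a load of a different location. The sketch assumes C11 atomics and pthreads; the names sb_state, sb_thread1, sb_thread2 and sb_run_once are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Store-buffering (SB) litmus test. Each thread stores to one variable
   and then loads the other. Release/acquire alone does not order a store
   with a later load of a different location, so without the fences C11
   permits r1 == r2 == 0. With a seq_cst fence between the store and the
   load in both threads, that outcome is forbidden. */
typedef struct {
    atomic_int x, y;
    int r1, r2;
} sb_state;

static void *sb_thread1(void *arg) {
    sb_state *s = arg;
    atomic_store_explicit(&s->x, 1, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst); /* store x before load y */
    s->r1 = atomic_load_explicit(&s->y, memory_order_acquire);
    return NULL;
}

static void *sb_thread2(void *arg) {
    sb_state *s = arg;
    atomic_store_explicit(&s->y, 1, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst); /* store y before load x */
    s->r2 = atomic_load_explicit(&s->x, memory_order_acquire);
    return NULL;
}

/* Runs one iteration and reports the observed (r1, r2) pair. */
static void sb_run_once(int *r1, int *r2) {
    sb_state s;
    atomic_init(&s.x, 0);
    atomic_init(&s.y, 0);
    s.r1 = s.r2 = -1;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sb_thread1, &s);
    pthread_create(&t2, NULL, sb_thread2, &s);
    /* pthread_join synchronizes with thread exit, so r1/r2 are visible. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    *r1 = s.r1;
    *r2 = s.r2;
}
```

Because thread creation rarely lines the two threads up, a loop over sb_run_once() mostly checks the invariant rather than stressing it; actually observing (0, 0) without the fences generally needs a dedicated litmus-testing harness.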
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21295#issuecomment-2388088945
More information about the hotspot-runtime-dev mailing list