RFR: 8325937: runtime/handshake/HandshakeDirectTest.java causes "monitor end should be strictly below the frame pointer" assertion failure on AArch64

Wed Oct 2 03:19:33 UTC 2024

On Tue, 1 Oct 2024 18:38:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> Add missing StoreLoad fences in handshaking code to match safepoint code.  Thanks to @pchilano for finding this bug.
> 
> Tested with tier1-4 and tier8 which has Kitchensink in it.

I stepped through the code on one of our Neoverse-N1 machines and I don't see a full fence when grabbing the mutex, only acquire semantics through the casa instruction (the first few instructions of pthread_mutex_trylock are omitted):

   0xfffff7eaa330 <pthread_mutex_trylock+80>:	mov	x2, x20
   0xfffff7eaa334 <pthread_mutex_trylock+84>:	mov	w1, #0x1                   	 // #1
   0xfffff7eaa338 <pthread_mutex_trylock+88>:	mov	w0, #0x0                   	// #0
   0xfffff7eaa33c <pthread_mutex_trylock+92>:	mov	w19, #0x0                   	// #0
   0xfffff7eaa340 <pthread_mutex_trylock+96>:	bl	0xfffff7eb4b80 <__aarch64_cas4_acq>
   0xfffff7eaa344 <pthread_mutex_trylock+100>:	cbnz	w0, 0xfffff7eaa730 <pthread_mutex_trylock+1104>
   0xfffff7eaa348 <pthread_mutex_trylock+104>:	mov	w0, w19
   0xfffff7eaa34c <pthread_mutex_trylock+108>:	ldp	x19, x20, [sp, #16]
   0xfffff7eaa350 <pthread_mutex_trylock+112>:	ldp	x21, x22, [sp, #32]
   0xfffff7eaa354 <pthread_mutex_trylock+116>:	ldp	x23, x24, [sp, #48]
   0xfffff7eaa358 <pthread_mutex_trylock+120>:	ldp	x29, x30, [sp], #96
   0xfffff7eaa35c <pthread_mutex_trylock+124>:	ret

   0xfffff7eb4b80 <__aarch64_cas4_acq>:	adrp	x16, 0xfffff7ed4000 <__pthread_keys+15608>
   0xfffff7eb4b84 <__aarch64_cas4_acq+4>:	ldrb	w16, [x16, #920]
   0xfffff7eb4b88 <__aarch64_cas4_acq+8>:	cbz	w16, 0xfffff7eb4b94 <__aarch64_cas4_acq+20>
   0xfffff7eb4b8c <__aarch64_cas4_acq+12>:	casa	w0, w1, [x2]
   0xfffff7eb4b90 <__aarch64_cas4_acq+16>:	ret
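
In source terms, the fast path above corresponds roughly to a single compare-and-swap with acquire ordering (expected value 0 in w0, new value 1 in w1, mutex address in x2). The sketch below is a simplified stand-in, not glibc's implementation: ToyMutex, toy_trylock, toy_unlock and toy_selfcheck are made-up names, and the real pthread_mutex_t tracks more state than one word. It only illustrates that the success ordering is acquire, which by itself does not order an earlier store before a later load of a different location.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical one-word lock: 0 = unlocked, 1 = locked.
struct ToyMutex {
    std::atomic<int> word{0};
};

// Fast path only: try to swing the word from 0 to 1 with acquire ordering.
// There is no slow path or waiting here, unlike a real mutex.
bool toy_trylock(ToyMutex& m) {
    int expected = 0;
    return m.word.compare_exchange_strong(expected, 1,
                                          std::memory_order_acquire,   // on success
                                          std::memory_order_relaxed);  // on failure
}

// Release store: accesses made while holding the lock stay before the unlock.
void toy_unlock(ToyMutex& m) {
    m.word.store(0, std::memory_order_release);
}

// Single-threaded sanity check of the lock/unlock state machine.
void toy_selfcheck() {
    ToyMutex m;
    assert(toy_trylock(m));   // uncontended: succeeds
    assert(!toy_trylock(m));  // already held: fails
    toy_unlock(m);
    assert(toy_trylock(m));   // free again: succeeds
}
```

Code that needs StoreLoad ordering around such a lock has to add an explicit full fence rather than rely on the acquire CAS.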

We could double-check the same thing on the Neoverse-N2 machine where we are seeing this issue, but I think this already shows that we can't expect a full fence. In general, that's how I have always thought about pthread_mutex_lock/unlock: as providing acquire/release semantics rather than a full fence [1], although in practice it probably would provide one.
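
To make the difference concrete, here is a minimal sketch of the classic store-buffering pattern, where each side publishes its own flag and then reads the other's. It is not the HotSpot handshake code: flag_a, flag_b, both_missed and check_fenced_rounds are hypothetical names chosen for illustration. With only release stores and acquire loads, both threads are allowed to read 0; with a full (StoreLoad) fence between the store and the load on both sides, at least one thread must see the other's store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical flags for illustration only; these are not HotSpot fields.
static std::atomic<int> flag_a{0};
static std::atomic<int> flag_b{0};

// One round of the store-buffering pattern. Returns true if BOTH threads
// read 0, i.e. each one missed the other's store.
bool both_missed(bool use_full_fence) {
    flag_a.store(0);
    flag_b.store(0);
    int seen_by_t1 = -1;
    int seen_by_t2 = -1;

    std::thread t1([&] {
        flag_a.store(1, std::memory_order_release);
        if (use_full_fence) {
            // Full fence: keeps the store above ordered before the load below.
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
        seen_by_t1 = flag_b.load(std::memory_order_acquire);
    });
    std::thread t2([&] {
        flag_b.store(1, std::memory_order_release);
        if (use_full_fence) {
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
        seen_by_t2 = flag_a.load(std::memory_order_acquire);
    });

    t1.join();
    t2.join();
    return seen_by_t1 == 0 && seen_by_t2 == 0;
}

// With the fence on both sides, "both missed" is forbidden by the C++ memory model.
void check_fenced_rounds(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        assert(!both_missed(true));
    }
}
```

Calling both_missed(false) may or may not ever return true on a given machine; the outcome is permitted, not guaranteed, which fits a failure that reproduces on one core design and not on another.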

As to why we are not seeing the crash on the N1, my guess is that its casa instruction is implemented with stronger guarantees than the architecture specifies, while the N2's is not.

[1] https://preshing.com/20120913/acquire-and-release-semantics/

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21295#issuecomment-2387549699