RFR: 8325937: runtime/handshake/HandshakeDirectTest.java causes "monitor end should be strictly below the frame pointer" assertion failure on AArch64 [v4]

Thu Oct 24 05:24:14 UTC 2024

On Fri, 4 Oct 2024 14:50:09 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Add missing StoreLoad fences in handshaking code to match safepoint code.  Thanks to @pchilano for finding this bug.
>> 
>> Tested with tier1-4 and tier8 which has Kitchensink in it.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Put suspend handshake fence() where Robbin suggested.

Let me belatedly try to correct the record here, in case nobody already did so in a later discussion that I overlooked:

> > Yes, that's right. Code in critical sections isn't racy, by definition.
> 
> I suppose it depends on what we mean by racy. The critical sections are atomic if synchronized appropriately, but with only acquire/release semantics (what glibc seemingly does in practice), they might for example be observed to happen in inconsistent orders. I'll try to refrain from talking about the solution domain for now, and rather look at the problem domain. Consider for example the following mutex IRIW litmus test:
> 
> T1: synchronized (m1) { *v1 = 1; }
> T2: synchronized (m2) { *v2 = 1; }
> T3: synchronized (m1) { l1 = *v1; } synchronized (m2) { l2 = *v2; }
> T4: synchronized (m2) { r2 = *v2; } synchronized (m1) { r1 = *v1; }
> 
> In this example, we have two global variables v1 and v2 that have corresponding mutexes m1 and m2. The global variables are read and written atomically under the corresponding lock. But with only acquire/release semantics in the lock, it is possible for the mutations from T1 and T2 to be observed to happen in inconsistent orders such as { l1: 1, l2: 0, r1: 0, r2: 1 }. It would then appear to T3 that the atomic mutation of T1 happened-before the atomic mutation of T2, while from the point of view of T4, it would appear that the opposite happened.
>
This is thankfully, and importantly, incorrect. Data-race-free C++ programs that use only acquire/release locks and seq_cst atomics exhibit sequential consistency. (This program doesn't even contain any atomics.) See, for example, Theorem 7.1 in https://dl.acm.org/doi/pdf/10.1145/1375581.1375591. You need weakly ordered atomics to tell whether lock acquisitions contain fences. C++ does not require lock acquisitions to be fenced precisely because that would slow down code that does not use weakly ordered atomics, with no visible benefit to such code. This choice does push code with weakly ordered atomics a bit further toward the extreme end of the difficulty scale, but I still believe it is clearly the correct trade-off.
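A minimal C++ sketch of the quoted litmus test, assuming `std::mutex` for m1 and m2 and plain `int`s for v1 and v2 (the names `Outcome`, `run_once`, `is_forbidden` and `count_forbidden` are hypothetical). Every access to v1 happens under m1 and every access to v2 under m2, so the program is data-race-free and should never produce { l1: 1, l2: 0, r1: 0, r2: 1 }. Note that a stress test like this can only fail to observe that outcome; it cannot by itself prove the outcome impossible.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Values read by T3 (l1, l2) and T4 (r1, r2) in one run of the litmus test.
struct Outcome { int l1, l2, r1, r2; };

// One run of the mutex IRIW test. v1 is only touched under m1 and v2 only
// under m2, so there are no data races on v1, v2 or the Outcome fields.
Outcome run_once() {
    std::mutex m1, m2;
    int v1 = 0, v2 = 0;
    Outcome o{};
    std::thread t1([&] { std::lock_guard<std::mutex> g(m1); v1 = 1; });
    std::thread t2([&] { std::lock_guard<std::mutex> g(m2); v2 = 1; });
    std::thread t3([&] {
        { std::lock_guard<std::mutex> g(m1); o.l1 = v1; }
        { std::lock_guard<std::mutex> g(m2); o.l2 = v2; }
    });
    std::thread t4([&] {
        { std::lock_guard<std::mutex> g(m2); o.r2 = v2; }
        { std::lock_guard<std::mutex> g(m1); o.r1 = v1; }
    });
    t1.join(); t2.join(); t3.join(); t4.join();  // join makes o safe to read
    return o;
}

// The outcome the quoted message suggests acquire/release locks might allow.
bool is_forbidden(const Outcome& o) {
    return o.l1 == 1 && o.l2 == 0 && o.r1 == 0 && o.r2 == 1;
}

// Runs the test repeatedly and counts how often the forbidden outcome shows up.
int count_forbidden(int iterations) {
    int seen = 0;
    for (int i = 0; i < iterations; ++i) {
        if (is_forbidden(run_once())) ++seen;
    }
    return seen;
}
```

On a conforming implementation, calling `count_forbidden(2000)` from `main` is expected to return 0; a nonzero count would point to a toolchain or library bug rather than a permitted behavior.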

In this specific example, with this specific outcome, we know that all of T3 happens-before T2, which happens-before T4, all of which happens-before T1, which happens-before all of T3. That is a happens-before cycle, so this is not an allowable execution. In each case, the value read determines the order of the two critical sections on the same mutex: for instance, l2 == 0 means T3's m2 critical section ran before T2's, and r2 == 1 means T4's m2 critical section ran after it. Those lock acquisitions give rise to the happens-before relationships above.

The other example is similar.
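The happens-before cycle argument for the IRIW outcome can also be checked mechanically. The sketch below is a hypothetical C++ model (the `Section` enum, `Outcome` struct and helper functions are invented for illustration, not HotSpot or standard-library APIs): each critical section is a node, program order inside T3 and T4 contributes edges, and each value read contributes a lock-order edge between the reader's and the writer's critical sections on the same mutex. It assumes, as in the litmus test, that each variable starts at 0 and is written exactly once, with the value 1.

```cpp
#include <array>
#include <cassert>
#include <vector>

// One node per critical section in the litmus test.
enum Section { T1_m1, T2_m2, T3_m1, T3_m2, T4_m2, T4_m1, kSections };

// Values read by T3 (l1, l2) and T4 (r1, r2).
struct Outcome { int l1, l2, r1, r2; };

using Graph = std::array<std::vector<int>, kSections>;

// A reader that saw 1 ran after the writer's critical section on the same
// mutex; a reader that saw 0 ran before it. Either way the lock hand-off
// orders the two sections, giving a happens-before edge.
void add_lock_order(Graph& g, int writer, int reader, int value_read) {
    if (value_read == 1) g[writer].push_back(reader);
    else g[reader].push_back(writer);
}

Graph build_happens_before(const Outcome& o) {
    Graph g;
    g[T3_m1].push_back(T3_m2);  // program order in T3
    g[T4_m2].push_back(T4_m1);  // program order in T4
    add_lock_order(g, T1_m1, T3_m1, o.l1);
    add_lock_order(g, T2_m2, T3_m2, o.l2);
    add_lock_order(g, T2_m2, T4_m2, o.r2);
    add_lock_order(g, T1_m1, T4_m1, o.r1);
    return g;
}

// Depth-first search; state: 0 = unvisited, 1 = on the DFS stack, 2 = done.
bool dfs_finds_cycle(const Graph& g, int n, std::array<int, kSections>& state) {
    state[n] = 1;
    for (int m : g[n]) {
        if (state[m] == 1) return true;  // back edge: cycle found
        if (state[m] == 0 && dfs_finds_cycle(g, m, state)) return true;
    }
    state[n] = 2;
    return false;
}

// True if the outcome induces a happens-before cycle, i.e. it is disallowed.
bool has_happens_before_cycle(const Outcome& o) {
    Graph g = build_happens_before(o);
    std::array<int, kSections> state{};
    for (int n = 0; n < kSections; ++n) {
        if (state[n] == 0 && dfs_finds_cycle(g, n, state)) return true;
    }
    return false;
}
```

With these assumptions, `has_happens_before_cycle(Outcome{1, 0, 0, 1})` finds the cycle T1 → T3 → T2 → T4 → T1, whereas outcomes such as `Outcome{1, 1, 1, 1}` or `Outcome{0, 0, 0, 0}` produce an acyclic graph.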

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21295#issuecomment-2434325660