RFR: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent

Wed May 28 15:10:56 UTC 2025

On Wed, 28 May 2025 08:49:17 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken.
> 
> My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits.
> 
> This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times.

> Perhaps confusingly, this only reproduces when I supply `-sc false`. A "normal" way for jcstress to separately compile/interpret methods is via compiler control, this is what `-sc true` (default) does. Which, I think, accidentally passes due to Erik's comment above: [#25483 (comment)](https://github.com/openjdk/jdk/pull/25483#issuecomment-2916521838) -- we still interpret in the mode that have extra fence. WIth `-sc false`, we have a more blunt `-Xint`, `-XX:TieredStopAtLevel=1` gets used and is seen to fail.

Thanks for checking.

> In retrospect, I think conditionalizing barrier emit scheme on the presence of particular compilers is counter-productive, especially when the default behavior (C2 is enabled) is to emit the barriers. In this instance, this would have simplified testing a bit, and maybe this bug less of a nuisance :)

Yeah I got a bit confused about that too. On the one hand side it looks like a weird optimization for a mode of execution that seems to care less about optimizations. But I suppose the reason for making it conditional might have rather been to be more precise about what the actual constraints are and not try to conservatively mask cases that absolutely should not need the fence for correctness. Then we want to know if our understanding of what we need for correctness is off, then something is very wrong, which fortunately we now found that it was indeed. Otherwise we would probably never have noticed it, but it would still arguably be wrong. For example if the store is interpreted (but with bugged out missing trailing fence) and the load is C2 compiled, there is still a problem, right? Just less likely to happen to get that kind of mixed execution in code exercising the races.

> Test starts to pass with the patch from this PR.

Awesome.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25483#issuecomment-2916691724