RFR: 8325937: runtime/handshake/HandshakeDirectTest.java causes "monitor end should be strictly below the frame pointer" assertion failure on AArch64
Erik Österlund
eosterlund at openjdk.org
Thu Oct 3 14:34:41 UTC 2024
On Thu, 3 Oct 2024 09:11:26 GMT, Andrew Haley <aph at openjdk.org> wrote:
> I guess I must confess to my real motivation here.
>
> We have a sort-of-knee-jerk reaction to finding concurrency bugs, sometimes caused by a misunderstanding of primitive semantics, of sprinkling fences around just in case rather than fixing the mistakes. I guess the main motivation is fear: we just don't know what is lurking here. But all that does is paper over the cracks, and it makes it very hard to reason about the code. There is no way to know whether some logic relies on a side effect of locking. It's much better _for the reader_ if we make all of this explicit, rather than implied.
I totally agree with you that we need the semantics to be made explicit and clear, so that we are not all assuming different things and introducing more of these kinds of bugs. In this discussion we have seen a wide range of assumptions about what the semantics of pthread_mutex_lock are, ranging from acquire(), to some sort of SC interpretation, to a full fence(). And nobody is clearly right or wrong, as the spec is rather vague and leaves a lot to our imagination. So the "misunderstanding" is not surprising, given that the semantics have never been specified.
In terms of what the implementation actually does, though, it seems like glibc's pthread_mutex currently uses only acquire/release semantics, just like std::mutex. That seemingly blows up the roach hotel model and the SC and fence expectations altogether. So if the path forward is to retroactively accept what the rather relaxed glibc implementation gives us as the "correct" contract of what locking should do, and to dismiss anything else as a "misunderstanding", then we can say goodbye to expectations that a lot of people have had when building HotSpot over the years. That seems a bit reckless to me.
If we do go in that direction, I would hope we could get there in a more incremental fashion, after actually looking at the code and reasoning a bit about the implications. So for now, let's just add Coleen's fence. But I think we really ought to think very carefully about what we decide to do about this going forward. I'd personally like to think about it a bit more, as the implications are far from obvious to me.
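To make the difference between acquire/release locking and a full fence concrete, here is a minimal sketch assuming only standard C++ `std::mutex` and `std::atomic` semantics, nothing glibc- or HotSpot-specific. The flags `x` and `y`, the two unrelated mutexes, and the helpers `lock_as_fence_t*`, `fenced_t*` and `count_both_zero` are hypothetical names for illustration. It is a Dekker-style store-then-load handshake: an empty critical section on an unrelated lock does not stop each thread's load from being reordered before its store, whereas a `seq_cst` fence does.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared flags for a Dekker-style store/load handshake.
// Illustration only; this is not HotSpot code.
static std::atomic<int> x{0};
static std::atomic<int> y{0};
static std::mutex ma;  // two unrelated locks: nothing orders one thread's
static std::mutex mb;  // critical section against the other's

// Using a lock as if it were a full fence. Lock acquire/release only stops
// accesses from leaving the critical section; the store may sink into it
// and the load may hoist into it, where they can be reordered. The C++
// memory model therefore permits both threads to read 0.
int lock_as_fence_t1() {
  x.store(1, std::memory_order_relaxed);
  { std::lock_guard<std::mutex> g(ma); }
  return y.load(std::memory_order_relaxed);
}

int lock_as_fence_t2() {
  y.store(1, std::memory_order_relaxed);
  { std::lock_guard<std::mutex> g(mb); }
  return x.load(std::memory_order_relaxed);
}

// Explicit version: a seq_cst fence between each thread's store and load
// forbids store-load reordering, so at least one thread must observe the
// other thread's store.
int fenced_t1() {
  x.store(1, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return y.load(std::memory_order_relaxed);
}

int fenced_t2() {
  y.store(1, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return x.load(std::memory_order_relaxed);
}

// Races f1 against f2 `rounds` times and counts how often both read 0.
int count_both_zero(int (*f1)(), int (*f2)(), int rounds) {
  int both_zero = 0;
  for (int i = 0; i < rounds; ++i) {
    x.store(0);
    y.store(0);
    int r1 = -1;
    int r2 = -1;
    std::thread a([&] { r1 = f1(); });
    std::thread b([&] { r2 = f2(); });
    a.join();
    b.join();
    assert((r1 == 0 || r1 == 1) && (r2 == 0 || r2 == 1));
    if (r1 == 0 && r2 == 0) ++both_zero;
  }
  return both_zero;
}
```

Calling `count_both_zero(fenced_t1, fenced_t2, 1000)` must always return 0. The lock-based pair is allowed to return a nonzero count, for example on weakly ordered hardware such as AArch64, but because the outcome is timing-dependent, a zero count from it proves nothing. That is exactly why an implied side effect of locking is hard to reason about, while an explicit fence states the requirement directly.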
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21295#issuecomment-2391577902
More information about the hotspot-runtime-dev mailing list