RFR: 8266557: assert(SafepointMechanism::local_poll_armed(_handshakee)) failed: Must be [v3]

Robbin Ehn rehn at openjdk.java.net
Wed Jun 2 06:37:32 UTC 2021


On Tue, 1 Jun 2021 19:20:59 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> Hi Robbin,
> 
> I think your patch looks good and fixes the issue with is_locked().
> Based on the root cause it seems another simpler solution would have been to add a fence between claim_handshake() and can_process_handshake() to make sure the target sees the write to _owner before we read a handshake safe state. If the issue is that _owner should be only use for debug builds or that the fence should have been in Mutex class and we don't want to affect all other cases, then we could use _active_handshaker instead of _owner. That way we could also keep the while loop in try_process() to process multiple operations at a time, and don't worry about the peek vs pop issue.
> 
> Thanks,
> Patricio

Thanks for having a look.

The main reason for not just adding the fence or use another state is that it leaves us with three different variables.
Since keeping track of three states, poll, queue and owner turn out to be a handful, reducing it back to two states I hope we can easier reason about the code.
Thus having queue non-empty (which was also suggested instead of using is_locked when it was introduced) while executing is the simplest model (but a bigger change).
This also mirrors safepoints: poll, global flag => wait barrier
With                                     : poll, queue => HandshakeState lock

Since there is no performance benefit of looping, except in some test which do millions of them in a few seconds.
If we need looping when removing the executed OP, returning if the op was removed with CAS on _first with NULL we must abort loop, otherwise we can continue.

But removing the looping also simplifies the model and removes many scenarios which no longer needs to reason about.
Again this bug shows that we should simplify, so more people can reason about if this is correct or not.

Thanks Robbin

-------------

PR: https://git.openjdk.java.net/jdk/pull/3973


More information about the hotspot-runtime-dev mailing list