RFR: 8266557: assert(SafepointMechanism::local_poll_armed(_handshakee)) failed: Must be
David Holmes
david.holmes at oracle.com
Wed May 19 05:53:01 UTC 2021
On 18/05/2021 6:25 pm, Robbin Ehn wrote:
> On Tue, 11 May 2021 11:59:23 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
>> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-runtime-dev](mailto:hotspot-runtime-dev at mail.openjdk.java.net):_
>>
>> Hi Robbin,
>>
>> On 17/05/2021 11:02 pm, Robbin Ehn wrote:
>>
>>> Please consider this change-set which address the issue on hand.
>>> I identified two problems:
>>> - is_locked() uses the _owner field which is unordered (no storestore|storeload) on store-side.
>>> Fixed by leaving the handshakee being processed in queue until completed.
>>
>> Sorry but I'm not seeing how leaving or removing the op from the queue
>> impacts things here. Which thread is reading the queue and seeing a
>> wrong value?
>
> When poll is armed the handshakee must go to slow path and lock the HandshakeState lock if:
> - 1: There is handshake operation to be processed
> - 2: Some-else is in the middle of processing a handshake operation for this handshakee
>
> By having having a non-empty queue if 1 or 2 is true, the handshakee can just ask if there is a non-empty queue.
> If so go to slow path and grab the HandshakeState lock.
>
> Now 1 is checked with queue head and 2 is checked with is_locked().
>
> But since the thread locking, storing _owner, can do this unordered, it may store the owner field after it have read our thread state, e.g. blocked.
Okay ... I'm trying to decide if we have a bug in our Mutex
implementation, or whether the use of is_locked() is incorrect. The
problem is that checking _owner is not sufficient to know if a Mutex is
locked because that is a stand-alone action that happens after the
actual locking takes place. Even if we fixed the ordering issue by
performing a fence() after storing _owner, that would not fix the basic
problem. So any use of Mutex::is_locked() has to be treated with
suspicion as it can return false when the Mutex actually is locked.
So setting that aside I'm happy to see should_process() and the use of
is_locked() disappear, and I will re-examine the PR with a fresh
perspective. :)
Thanks,
David
-----
> What we do in three different scenarios are:
> Handshaker adds an operation:
> store queue
> store poll
>
> Handshaker processing do:
> store _owner (lock)
> load queue
> load poll
> load thread state
>
> Handshakee polling path do:
> store state
> load poll
> load owner
> load queue
>
> ############################################
> With the un-ordered characteristics of _owner field, it looks like it could play out like this:
>
> Handshakee: state blocked
> Handshaker A processor: add: store queue OP Z
> Handshaker B processor: processes OP Z and store queue => NULL
> Handshaker A processor: add: store poll
> Handshaker A processor: LOCK HandshakeState lock (_owner is still NULL)
> Handshaker C processor: add: store queue OP X
> Handshaker A processor: load queue (sees X)
> Handshaker A processor: load poll (armed)
> Handshaker A processor: load thread state // OK blocked
> Handshakee T target : store state // unsafe
> Handshakee T target : load poll // armed
> Handshakee T target : load owner // NULL
> Handshaker A processor: store _owner (non-NULL) // re-ordered here !!
> Handshaker A processor: store queue (NULL)
> Handshakee T target : load queue // NULL
> Handshakee T target : disarm // ERROR !!!!
>
>>
>> Thanks,
>> David
>> -----
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/3973
>
More information about the hotspot-runtime-dev
mailing list