RFR: 8266557: assert(SafepointMechanism::local_poll_armed(_handshakee)) failed: Must be

David Holmes david.holmes at oracle.com
Wed May 19 05:53:01 UTC 2021


On 18/05/2021 6:25 pm, Robbin Ehn wrote:
> On Tue, 11 May 2021 11:59:23 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
>> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-runtime-dev](mailto:hotspot-runtime-dev at mail.openjdk.java.net):_
>>
>> Hi Robbin,
>>
>> On 17/05/2021 11:02 pm, Robbin Ehn wrote:
>>
>>> Please consider this change-set which address the issue on hand.
>>> I identified two problems:
>>> - is_locked() uses the _owner field which is unordered (no storestore|storeload) on store-side.
>>> Fixed by leaving the handshakee being processed in queue until completed.
>>
>> Sorry but I'm not seeing how leaving or removing the op from the queue
>> impacts things here. Which thread is reading the queue and seeing a
>> wrong value?
> 
> When poll is armed the handshakee must go to slow path and lock the HandshakeState lock if:
> - 1: There is handshake operation to be processed
> - 2: Some-else is in the middle of processing a handshake operation for this handshakee
> 
> By having having a non-empty queue if 1 or 2 is true, the handshakee can just ask if there is a non-empty queue.
> If so go to slow path and grab the HandshakeState lock.
> 
> Now 1 is checked with queue head and 2 is checked with is_locked().
> 
> But since the thread locking, storing _owner, can do this unordered, it may store the owner field after it have read our thread state, e.g. blocked.

Okay ... I'm trying to decide if we have a bug in our Mutex 
implementation, or whether the use of is_locked() is incorrect. The 
problem is that checking _owner is not sufficient to know if a Mutex is 
locked because that is a stand-alone action that happens after the 
actual locking takes place. Even if we fixed the ordering issue by 
performing a fence() after storing _owner, that would not fix the basic 
problem. So any use of Mutex::is_locked() has to be treated with 
suspicion as it can return false when the Mutex actually is locked.

So setting that aside I'm happy to see should_process() and the use of 
is_locked() disappear, and I will re-examine the PR with a fresh 
perspective. :)

Thanks,
David
-----

> What we do in three different scenarios are:
> Handshaker adds an operation:
> store queue
> store poll
> 
> Handshaker processing do:
> store _owner (lock)
> load queue
> load poll
> load thread state
> 
> Handshakee polling path do:
> store state
> load poll
> load owner
> load queue
> 
> ############################################
> With the un-ordered characteristics of _owner field, it looks like it could play out like this:
> 
> Handshakee: state blocked
> Handshaker A processor: add: store queue OP Z
> Handshaker B processor: processes OP Z and store queue => NULL
> Handshaker A processor: add: store poll
> Handshaker A processor: LOCK HandshakeState lock (_owner is still NULL)
> Handshaker C processor: add: store queue OP X
> Handshaker A processor: load queue (sees X)
> Handshaker A processor: load poll (armed)
> Handshaker A processor: load thread state // OK blocked
> Handshakee T target   : store state // unsafe
> Handshakee T target   : load poll // armed
> Handshakee T target   : load owner // NULL
> Handshaker A processor: store _owner (non-NULL) // re-ordered here !!
> Handshaker A processor: store queue (NULL)
> Handshakee T target   : load queue // NULL
> Handshakee T target   : disarm // ERROR !!!!
> 
>>
>> Thanks,
>> David
>> -----
> 
> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/3973
> 


More information about the hotspot-runtime-dev mailing list