RFR: 8166197: assert(RelaxAssert || w != Thread::current()->_MutexEvent) failed: invariant

Thu Oct 13 00:54:31 UTC 2016

Hi Carsten,

Thanks for looking at this.

On 12/10/2016 11:21 PM, Carsten Varming wrote:
> Dear David,
>
> In line 590 "Pass onDeck to w". I don't understand this part of the
> comment. Should it say something like "Pass ownership of _OnDeck and
> _EntryList to w".

First note that that part of the comment already existed in the old code:

  589     _OnDeck = w;  // pass OnDeck to w.

This is making 'w' the OnDeck thread - nothing to do with _EntryList. 
Prior to executing this line the current thread holds the "OnDeck lock" 
- which is just a logical lock obtained by CASing _OnDeck from 0 to 1. 
The current thread selects 'w' (as the head of _EntryList) to be the new 
OnDeck thread, and passes that role to it through the assignment - and 
of course by doing so drops the "OnDeck lock". I changed to read:

     // Pass OnDeck role to w, ensuring that _EntryList has been set first.

> In line 532 there is a read of _OnDeck and in line 553 there is a read
> of _EntryList. Why is it safe not to do a load_acquire of _OnDeck in
> line 532?

<sigh> I knew I should have just stuck in a storestore() then we could 
all be blissfully ignorant :) Okay here's that code fragment with 
comments elided

  514 void Monitor::IUnlock(bool RelaxAssert) {
  529   OrderAccess::release_store(&_LockWord.Bytes[_LSBINDEX], 0); // 
drop outer lock
  530
  531   OrderAccess::storeload();
  532   ParkEvent * const w = _OnDeck;
  533   assert(RelaxAssert || w != Thread::current()->_MutexEvent, 
"invariant");
  534   if (w != NULL) {
  548     if ((UNS(w) & _LBIT) == 0) w->unpark();
  549     return;
  550   }

A release-store to X is used to ensure that shared data written prior to 
the store to X actually occurs prior to that store. A load-acquire of X 
ensures that if the load sees the value written by the store-release 
then it also sees the updates to the shared data. In the current case 
the release-store to _OnDeck writes the non-NULL value 'w', and if the 
code above sees a non-NULL value (ie it sees 'w') then at most it 
unparks 'w' and returns. It never accesses the shared data ie 
_EntryList. So no load-acquire is needed as we are not trying to 
"synchronize" with the changes to the shared state made by the thread 
that did the release-store. I've added a comment as per my response to 
Dan's email

Now lets look at all the other loads of _OnDeck:

- there are a bunch of asserts such as:

   444   assert(_OnDeck != Self->_MutexEvent, "invariant");
   588     assert(UNS(_OnDeck) == _LBIT, "invariant");
   836     assert(_OnDeck != ESelf, "invariant");
   1197 
assert((UNS(_owner)|UNS(_LockWord.FullWord)|UNS(_EntryList)|UNS(_WaitSet)|UNS(_OnDeck)) 
== 0, "");

   and a load that is only used for an assert:

   1161   uintptr_t ondeck = UNS(_OnDeck);

These are all logical checks on the current state and are not attempting 
to synchronize with any other changes to shared state, so no need for 
load-acquire. I hope there is no controversy on that point.

- there are some CAS operations which already embody full bi-directional 
fences which subsume the load-acquire (whether directly needed to 
synchronize with the store-release or not)

   466   if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL, 
UNS(ESelf)) == 0) {
   573   if (CASPTR (&_OnDeck, NULL, _LBIT) != UNS(NULL)) {

  - then we have:

    843       if (_OnDeck == ESelf && TrySpin(Self)) break;

I missed this one - it needs the load-acquire for the same reason as the 
code in ILock - which should have been obvious given:

841     // The following fragment is extracted from Monitor::ILock()

:)

Webrev updated in place - also see my response to Dan.

Thanks,
David

> Carsten
>
>
> On Wed, Oct 12, 2016 at 1:08 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
>     Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
>     <https://bugs.openjdk.java.net/browse/JDK-8166197>
>     webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/
>     <http://cr.openjdk.java.net/~dholmes/8166197/webrev/>
>
>     In IUnlock we have the following succession code to wakeup the
>     "onDeck" thread:
>
>      ParkEvent * List = _EntryList;
>       if (List != NULL) {
>         // Transfer the head of the EntryList to the OnDeck position.
>         // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>         // For a given lock there is at most OnDeck thread at any one
>     instant.
>        WakeOne:
>         assert(List == _EntryList, "invariant");
>         ParkEvent * const w = List;
>         assert(RelaxAssert || w != Thread::current()->_MutexEvent,
>     "invariant");
>         _EntryList = w->ListNext;
>         // as a diagnostic measure consider setting w->_ListNext = BAD
>         assert(UNS(_OnDeck) == _LBIT, "invariant");
>         _OnDeck = w;  // pass OnDeck to w.
>
>     It is critical that the update to _EntryList happens before we set
>     _OnDeck, as as soon as _OnDeck is set the selected thread (which
>     need not yet have parked) can acquire the mutex, complete its
>     critical section and proceed to unlock the mutex, and so execute
>     IUnlock in parallel with the original thread. If the write to
>     _EntryList has not yet happened that second thread finds itself
>     still at the head of _EntryList and so the assert fires. If the
>     write to _EntryList happens after the load "List = _EntryList", then
>     the first assert can also fire.
>
>     Preferred fix today is to use OrderAccess::release_store(&_OnDeck,
>     w) with a matching load_acquire(&_OnDeck) in the ILock code:
>
>       while (_OnDeck != ESelf) {
>         ParkCommon(ESelf, 0);
>       }
>
>     and corresponding "raw" lock code. Also fixed a couple of typos.
>
>     Thanks,
>     David
>
>