RFR 8240918: [REDO] Allow direct handshakes without VMThread intervention

Patricio Chilano patricio.chilano.mateo at oracle.com
Mon Apr 6 14:57:49 UTC 2020


Hi Martin,

On 4/6/20 7:15 AM, Doerr, Martin wrote:
> Hi Patricio,
>
> I'm very sorry for the late review. I couldn't make it last week.
>
> Thanks a lot for taking care of the ordering semantics. I'm a big fan of having it explicit, precise, clear and well documented.
Yes, I also think it's clearer now with the explicit orderings. : )

> VM_HandshakeOneThread and VM_HandshakeAllThreads work without the barrier because HandshakeState::try_process with _processing_sem.signal() is always called?
> I think it would be better to have the explicit acquire barrier there, too. Do you agree?
I've been thinking about the VM_HandshakeOneThread and 
VM_HandshakeAllThreads cases since the other day, and although I don't 
think there are issues in the use cases we have today, it's probably a 
good idea to add the fence there too in case that changes. For the 
VMThread it is safe without the barrier because, upon returning from 
the do_it() method, it will just notify the Handshaker that the 
operation has been processed and then proceed to the main loop of 
VMThread::loop() to process the next operation. There could be an 
issue, though, if one of these non-direct handshakes runs inside an 
operation that allows nested operations. When the VMThread returns 
from the handshake it goes back to the first operation, where it could 
try to read data changed during the handshake. If the handshake was 
actually processed by the handshakee, we could then read stale data.

> check_state() is pointless in your new version.
Removed.

> I've also looked at your test. Shouldn't we set -XX:+UseBiasedLocking explicitly so the test still works as it should after BiasedLocking deprecation?
Yes, thanks for noticing it. Since I'm planning to push this first, I 
will change it now so I don't have to fix it later.


Here is v4, which also includes the memory ordering changes I sent before:

Full:
http://cr.openjdk.java.net/~pchilanomate/8240918/v4/webrev/
Inc:
http://cr.openjdk.java.net/~pchilanomate/8240918/v4/inc/webrev/


Thanks!

Patricio
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Montag, 6. April 2020 00:40
>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr, Martin
>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>; hotspot-
>> runtime-dev at openjdk.java.net
>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>> VMThread intervention
>>
>> On 4/04/2020 12:57 am, Patricio Chilano wrote:
>>> On 4/3/20 11:40 AM, Doerr, Martin wrote:
>>>> Hi Patricio,
>>>>
>>>> thanks for the quick reply.
>>>>
>>>> So should we also use precise barriers in
>>>> add_target_count(int count) { Atomic::add(&_pending_threads, count,
>>>> memory_order_relaxed); }
>>>> ?
>>>> Relaxed would fit to your explanation.
>>> Exactly. I was about to send you the diff with those two changes before
>>> answering David and Dan about the acquire fence.   : )
>>>
>>>
>>> diff --git a/src/hotspot/share/runtime/handshake.cpp
>>> b/src/hotspot/share/runtime/handshake.cpp
>>> --- a/src/hotspot/share/runtime/handshake.cpp
>>> +++ b/src/hotspot/share/runtime/handshake.cpp
>>> @@ -56,7 +56,7 @@
>>>        assert(val >= 0, "_pending_threads=%d cannot be negative", val);
>>>        return val == 0;
>>>      }
>>> -  void add_target_count(int count) { Atomic::add(&_pending_threads,
>>> count); }
>>> +  void add_target_count(int count) { Atomic::add(&_pending_threads,
>>> count, memory_order_relaxed); }
>>>      bool executed() const { return _executed; }
>>>      const char* name() { return _handshake_cl->name(); }
>>>
>>> @@ -229,9 +229,8 @@
>>>      // When this is executed by the Handshakee we need a release store
>>>      // here to make sure memory operations executed in the handshake
>>>      // closure are visible to the Handshaker after it reads that the
>>> -  // operation has completed. Atomic::dec() already provides a full
>>> -  // memory fence.
>>> -  Atomic::dec(&_pending_threads);
>>> +  // operation has completed.
>>> +  Atomic::dec(&_pending_threads, memory_order_release);
>>>
>>>      // It is no longer safe to refer to 'this' as the
>>> VMThread/Handshaker may have destroyed this operation
>>>    }
>> The above seems okay. We don't generally try to optimise the barriers
>> associated with these atomic rmw ops due to the complexity of the
>> reasoning around the different types of barriers. But in this case the
>> reasoning seems quite straight-forward. The increment does not indicate
>> anything to any other thread, it simply sets the right value in the
>> counter (which may already have been decremented). The decrement
>> indicates a completion token and so needs at least a release-barrier.
>> The final check that watches for the count to be zero, needs the
>> matching acquire barrier.
>>
>> Thanks,
>> David
>> -----
>>
>>> Thanks Martin!
>>>
>>> Patricio
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>> Sent: Freitag, 3. April 2020 16:33
>>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>>> <david.holmes at oracle.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>> VMThread intervention
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>> On 4/3/20 11:20 AM, Doerr, Martin wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> the acquire belongs to the load of _pending_threads in is_completed()
>>>>> (handshake.cpp:270).
>>>>>> It's moved below the loop because we don't need to repeat the
>> barrier.
>>>>>> The release barrier is integrated in Atomic::dec(&_pending_threads)
>>>>> (handshake.cpp:234).
>>>>>> Should we better use Atomic::dec(&_pending_threads,
>>>>> memory_order_release) to make it precise and explicit?
>>>>> As you stated in your other email too we can specify explicit memory
>>>>> ordering to Atomic::dec() which I forgot. I agree with explicitly
>>>>> changing it to Atomic::dec(&_pending_threads,
>> memory_order_release).
>>>>>> Release / acquire semantics are exactly what I was missing in the
>>>>>> earlier
>>>>> version.
>>>>>> I don't understand why "add_target_count" is called after the
>>>>>> handshakes
>>>>> were activated (handshake.cpp:171).
>>>>>> Couldn't it happen that the first handshake completes before that
>>>>>> (it would
>>>>> set _pending_threads to 0)?
>>>>> Yes, _pending_threads can even go negative for the
>>>>> VM_HandshakeAllThreads case. But we only check _pending_threads in
>> the
>>>>> VMThread so for the Handshakees it doesn't matter when they
>> decrement
>>>>> the counter what's the actual value.
>>>>>
>>>>> Thanks,
>>>>> Patricio
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>> Sent: Freitag, 3. April 2020 15:22
>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr,
>>>>>>> Martin
>>>>>>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>>>> hotspot-
>>>>>>> runtime-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>> VMThread intervention
>>>>>>>
>>>>>>> Hi Patricio,
>>>>>>>
>>>>>>> On 3/04/2020 6:00 pm, Patricio Chilano wrote:
>>>>>>>> Hi Martin,
>>>>>>>>
>>>>>>>> On 4/2/20 5:24 PM, Doerr, Martin wrote:
>>>>>>>>> Hi Patricio,
>>>>>>>>>
>>>>>>>>> right, you got my point. Sorry for not explaining it in greater
>>>>>>>>> detail
>>>>>>>>> at the beginning. I was busy with other things.
>>>>>>>>>
>>>>>>>>> There should be a memory barrier ordering read of
>> _pending_threads
>>>>>>>>> (which is observed as 0) and whatever gets done when returning
>> from
>>>>>>>>> execute_direct.
>>>>>>>>> I don't like relying on barriers which are hidden somewhere and
>>>>>>>>> may be
>>>>>>>>> subject to change.
>>>>>>>> Yes, I agree with that. I added an explicit acquire fence at the
>>>>>>>> end of
>>>>>>>> Handshake::execute_direct() with a comment. Thanks for noticing it.
>>>>>>> This is unclear to me:
>>>>>>>
>>>>>>> 281   // Prevent future loads from floating above the load of
>>>>>>> _pending_threads
>>>>>>>      282   // in is_completed(). This prevents reading stale data
>>>>>>> modified
>>>>>>> in the
>>>>>>>      283   // handshake closure by the Handshakee.
>>>>>>>      284   OrderAccess::acquire();
>>>>>>>      285
>>>>>>>      286   return op.executed();
>>>>>>>
>>>>>>> An acquire() by itself has no meaning, it has to be associated with a
>>>>>>> release() and both have to be associated with the load, respectively,
>>>>>>> store, of the same variable. From the comment it sounds like you just
>>>>>>> want a loadload() barrier here.
>>>>>>>
>>>>>>> Other updates seem fine.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>>> I'd also prefer explicit barriers on the writers side over the
>>>>>>>>> implicit ones from Atomic::add, because explicit barriers with
>>>>>>>>> comments help a lot understanding the code.
>>>>>>>>> (Not sure if you want to change that. Just my opinion.)
>>>>>>>> I added a comment on top of Atomic::dec() on why we need a
>> memory
>>>>>>>> barrier there and the fact that we are already using a full memory
>>>>>>>> fence.
>>>>>>>> If you still think I should add an explicit barrier for code
>>>>>>>> readability
>>>>>>>> I can add it though.
>>>>>>>>
>>>>>>>>
>>>>>>>> Here is v3 which also addresses Dan's comments.
>>>>>>>>
>>>>>>>> Full:
>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/webrev/
>>>>>>>> Inc:
>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Patricio
>>>>>>>>> Best regards,
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>>>>>>> Sent: Donnerstag, 2. April 2020 19:56
>>>>>>>>>> To: Robbin Ehn <robbin.ehn at oracle.com>; Doerr, Martin
>>>>>>>>>> <martin.doerr at sap.com>; hotspot-runtime-
>> dev at openjdk.java.net
>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
>> without
>>>>>>>>>> VMThread intervention
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 4/2/20 1:16 PM, Patricio Chilano wrote:
>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for looking at this.
>>>>>>>>>>>
>>>>>>>>>>> On 4/2/20 10:15 AM, Robbin Ehn wrote:
>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>
>>>>>>>>>>>> On 2020-04-02 14:55, Doerr, Martin wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> first of all, thanks, Patricio, for doing this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> One question:
>>>>>>>>>>>>> _pending_threads is used for synchronizing threads.
>>>>>>>>>>>>> Don't we need release / acquire semantics?
>>>>>>>>>>>>> I guess otherwise handshaker can read stale data from
>>>>> handshakee
>>>>>>>>>>>>> after observing is_completed.
>>>>>>>>>>>>> Or did I miss anything?
>>>>>>>>>>>> Reading stale should be fine in this case, since the counter goes
>>>>>>>>>>>> from n->0, thus you can only get false negatives from "bool
>>>>>>>>>>>> is_completed()" ?
>>>>>>>>>>> Right, as Robbin points out at most we will just have to loop
>>>>>>>>>>> again.
>>>>>>>>>>> Also, when the handshakee processes the handshake, right
>> after
>>>>>>>>>>> updating _pending_threads, it will either signal the
>>>>>>>>>>> _processing_sem
>>>>>>>>>>> if there are no direct operations, or it will signal the
>>>>>>>>>>> _handshake_turn_sem, so that will provide the release already.
>> The
>>>>>>>>>>> next read in the loop should read the updated value. Are you
>> okay
>>>>> with
>>>>>>>>>>> that?
>>>>>>>>>> I realize you might be thinking of a different case now. You
>>>>>>>>>> mean the
>>>>>>>>>> handshaker could execute loads from code after
>>>>>>>>>> Handshake::execute_direct() while it hasn't yet read
>>>>> _pending_threads
>>>>>>>>>> from is_completed()? Although there is a load_acquire() in
>>>>>>>>>> ~ThreadsListHandle() that will prevent those issues you might
>> argue
>>>>> that
>>>>>>>>>> we can't rely on code which might change.
>>>>>>>>>> I think (not counting that load_acquire from ~ThreadsListHandle)
>>>>>>>>>> the
>>>>>>>>>> issue could appear for example if you created a ThreadsListHandle
>>>>> first,
>>>>>>>>>> then called Handshake::execute_direct(), and when returning you
>> try
>>>>> to
>>>>>>>>>> read some field from the handshakee (you might want to assert
>>>>>>> something
>>>>>>>>>> for example). If there wasn't any ThreadsListHandle created
>> before
>>>>>>>>>> calling execute_direct() I think you are not allowed to access
>>>>>>>>>> other
>>>>>>>>>> JavaThread's data anyway since the thread might have exited
>>>>>>>>>> already,
>>>>>>>>>> unless you create a ThreadsListHandle which will already provide
>>>>>>>>>> the
>>>>>>>>>> acquire. So I think you are right and we might need a fence before
>>>>>>>>>> returning from execute_direct() or add a comment that we are
>>>>>>>>>> relying
>>>>>>> on
>>>>>>>>>> the load_acquire from ~ThreadsListHandle(). Is that what you
>> were
>>>>>>>>>> thinking?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Patricio
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>> Patricio
>>>>>>>>>>>> Thanks, Robbin
>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>>>>>>>>>>>>>> bounces at openjdk.java.net> On Behalf Of Robbin Ehn
>>>>>>>>>>>>>> Sent: Donnerstag, 2. April 2020 13:36
>>>>>>>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
>>>>>>>>>>>>>> hotspot-runtime-
>>>>>>>>>>>>>> dev at openjdk.java.net
>>>>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
>>>>> without
>>>>>>>>>>>>>> VMThread intervention
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Patricio,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for fixing, looks good.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /Robbin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2020-04-01 19:45, Patricio Chilano wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please review this redo for 8230594. Since the reason
>> behind
>>>>> the
>>>>>>>>>>>>>>> Windows
>>>>>>>>>>>>>>> timeouts turned out to be 8240902, this patch is almost
>>>>>>>>>>>>>>> identical
>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>> reverse of the backout modulo the rebase.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is only one small fix for an issue not identified in the
>>>>>>>>>>>>>>> original
>>>>>>>>>>>>>>> patch which I explained in the comments. To avoid this
>>>>>>>>>>>>>>> possible
>>>>>>>>>>>>>>> issue of
>>>>>>>>>>>>>>> deleting a JavaThread that called Thread::destroy_vm()
>> while
>>>>>>> there
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> handshakers blocked in the handshake_turn_sem
>> semaphore,
>>>>> I
>>>>>>>>>> added a
>>>>>>>>>>>>>> check
>>>>>>>>>>>>>>> to verify the JavaThread in question is not protected by
>> some
>>>>>>>>>>>>>>> ThreadsList reference before attempting the delete it. In
>> case
>>>>>>>>>>>>>>> it is
>>>>>>>>>>>>>>> protected, we need to at least clear the value for
>> _thread_key
>>>>> to
>>>>>>>>>>>>>>> avoid
>>>>>>>>>>>>>>> possible infinite loops at thread exit (this can happen in
>>>>>>>>>>>>>>> Solaris).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The only other change is that I removed the check for local
>>>>>>>>>>>>>>> polls in
>>>>>>>>>>>>>>> Handshake::execute_direct() since after the backout,
>> thread-
>>>>> local
>>>>>>>>>>>>>>> handshakes was implemented for arm32 which was the
>> only
>>>>>>>>>> remaining
>>>>>>>>>>>>>> platform.
>>>>>>>>>>>>>>> I tested it several times in mach5 tiers 1-6 and once in t7
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> saw no
>>>>>>>>>>>>>>> failures.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bug:
>>>>>>>>>>>>>>>          https://bugs.openjdk.java.net/browse/JDK-8240918
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>
>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v1/webrev/
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Patricio
