RFR 8240918: [REDO] Allow direct handshakes without VMThread intervention

Patricio Chilano patricio.chilano.mateo at oracle.com
Mon Apr 6 17:49:31 UTC 2020


Hi Dan,

On 4/6/20 2:26 PM, Daniel D. Daugherty wrote:
>> http://cr.openjdk.java.net/~pchilanomate/8240918/v4/inc/webrev/ 
>
> src/hotspot/share/runtime/handshake.cpp
>     I'm now good with the memory syncing between the release-store in
>     do_handshake() and the acquire() in either VM_HandshakeOneThread::doit()
>     or Handshake::execute_direct(). Thanks for updating the comments also.
>
> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>     No comments.
>
> Thumbs up!
Thanks for reviewing this!


Patricio
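[For readers following the memory-ordering discussion below: the counter
protocol being reviewed can be sketched in standalone C++ with std::atomic.
This is only an illustration of the pattern, not HotSpot's actual code; the
names (HandshakeOpSketch, closure_result, etc.) are invented for the sketch,
and std::atomic fetch_add/fetch_sub are assumed to correspond to what
Atomic::add/Atomic::dec provide with explicit memory_order arguments.]

```cpp
// Sketch of the _pending_threads protocol: relaxed increment on the
// handshaker side, release decrement as the completion token on the
// handshakee side, acquire load when observing completion.
#include <atomic>
#include <thread>

struct HandshakeOpSketch {
  std::atomic<int> pending_threads{0};
  int closure_result = 0;  // plain data written inside the "closure"

  // Handshaker side: no ordering needed; the add only fixes up the
  // counter, which may already have been decremented (even below zero)
  // by a fast handshakee.
  void add_target_count(int count) {
    pending_threads.fetch_add(count, std::memory_order_relaxed);
  }

  // Handshakee side: the decrement is a completion token, so the stores
  // made by the closure must be released before it.
  void do_handshake() {
    closure_result = 42;  // work done in the handshake closure
    pending_threads.fetch_sub(1, std::memory_order_release);
  }

  // Handshaker side: the load that observes completion needs the
  // matching acquire so later loads see the closure's stores.
  bool is_completed() const {
    return pending_threads.load(std::memory_order_acquire) <= 0;
  }
};

int run_one_direct_handshake() {
  HandshakeOpSketch op;
  std::thread handshakee([&] { op.do_handshake(); });
  op.add_target_count(1);         // may race with the decrement; that is fine
  while (!op.is_completed()) { }  // spin until the completion token is seen
  handshakee.join();
  return op.closure_result;       // safe: acquire paired with the release dec
}
```

Note the sketch spins for simplicity; the real code parks on semaphores, and
the acquire then has to be placed on whichever load actually observes
completion, which is the point debated below.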
> Dan
>
>
>
> On 4/6/20 10:57 AM, Patricio Chilano wrote:
>> Hi Martin,
>>
>> On 4/6/20 7:15 AM, Doerr, Martin wrote:
>>> Hi Patricio,
>>>
>>> I'm very sorry for the late review. I couldn't make it last week.
>>>
>>> Thanks a lot for taking care of the ordering semantics. I'm a big 
>>> fan of having it explicit, precise, clear and well documented.
>> Yes, I also think it's clearer now with the explicit orderings. : )
>>
>>> VM_HandshakeOneThread and VM_HandshakeAllThreads work without the 
>>> barrier because HandshakeState::try_process with 
>>> _processing_sem.signal() is always called?
>>> I think it would be better to have the explicit acquire barrier 
>>> there, too. Do you agree?
>> I've been thinking about the VM_HandshakeOneThread and
>> VM_HandshakeAllThreads cases since the other day, and although I don't
>> think there are issues in the use cases we have today, it's probably a
>> good idea to add the fence there too in case that changes. For the
>> VMThread it is safe without the barrier because, upon returning from
>> the do_it() method, it will just notify the Handshaker that the
>> operation has been processed and then proceed to the main loop of
>> VMThread::loop() to process the next operation. There could be an
>> issue, though, if we had one of these non-direct handshakes inside an
>> operation that allows nested operations: when the VMThread returns
>> from the handshake, it goes back to the first operation, where it
>> could try to read data changed during the handshake. If the handshake
>> was actually processed by the handshakee, we could then read stale data.
>>
>>> check_state() is pointless in your new version.
>> Removed.
>>
>>> I've also looked at your test. Shouldn't we set 
>>> -XX:+UseBiasedLocking explicitly so the test still works as it 
>>> should after BiasedLocking deprecation?
>> Yes, thanks for noticing it. Since I'm planning to push this first I 
>> will change it now so I don't have to fix it later.
>>
>>
>> Here is v4 which includes also the memory ordering changes I sent 
>> before:
>>
>> Full:
>> http://cr.openjdk.java.net/~pchilanomate/8240918/v4/webrev/
>> Inc:
>> http://cr.openjdk.java.net/~pchilanomate/8240918/v4/inc/webrev/
>>
>>
>> Thanks!
>>
>> Patricio
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Montag, 6. April 2020 00:40
>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr, 
>>>> Martin
>>>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>; hotspot-
>>>> runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>> VMThread intervention
>>>>
>>>> On 4/04/2020 12:57 am, Patricio Chilano wrote:
>>>>> On 4/3/20 11:40 AM, Doerr, Martin wrote:
>>>>>> Hi Patricio,
>>>>>>
>>>>>> thanks for the quick reply.
>>>>>>
>>>>>> So should we also use precise barriers in
>>>>>> add_target_count(int count) { Atomic::add(&_pending_threads, count,
>>>>>> memory_order_relaxed); }
>>>>>> ?
>>>>>> Relaxed would fit to your explanation.
>>>>> Exactly. I was about to send you the diff with those two changes 
>>>>> before
>>>>> answering David and Dan about the acquire fence.   : )
>>>>>
>>>>>
>>>>> diff --git a/src/hotspot/share/runtime/handshake.cpp
>>>>> b/src/hotspot/share/runtime/handshake.cpp
>>>>> --- a/src/hotspot/share/runtime/handshake.cpp
>>>>> +++ b/src/hotspot/share/runtime/handshake.cpp
>>>>> @@ -56,7 +56,7 @@
>>>>>        assert(val >= 0, "_pending_threads=%d cannot be negative", 
>>>>> val);
>>>>>        return val == 0;
>>>>>      }
>>>>> -  void add_target_count(int count) { Atomic::add(&_pending_threads,
>>>>> count); }
>>>>> +  void add_target_count(int count) { Atomic::add(&_pending_threads,
>>>>> count, memory_order_relaxed); }
>>>>>      bool executed() const { return _executed; }
>>>>>      const char* name() { return _handshake_cl->name(); }
>>>>>
>>>>> @@ -229,9 +229,8 @@
>>>>>      // When this is executed by the Handshakee we need a release 
>>>>> store
>>>>>      // here to make sure memory operations executed in the handshake
>>>>>      // closure are visible to the Handshaker after it reads that the
>>>>> -  // operation has completed. Atomic::dec() already provides a full
>>>>> -  // memory fence.
>>>>> -  Atomic::dec(&_pending_threads);
>>>>> +  // operation has completed.
>>>>> +  Atomic::dec(&_pending_threads, memory_order_release);
>>>>>
>>>>>      // It is no longer safe to refer to 'this' as the
>>>>> VMThread/Handshaker may have destroyed this operation
>>>>>    }
>>>> The above seems okay. We don't generally try to optimise the barriers
>>>> associated with these atomic RMW ops due to the complexity of the
>>>> reasoning around the different types of barriers. But in this case the
>>>> reasoning seems quite straightforward. The increment does not indicate
>>>> anything to any other thread; it simply sets the right value in the
>>>> counter (which may already have been decremented). The decrement
>>>> indicates a completion token and so needs at least a release barrier.
>>>> The final check that watches for the count to reach zero needs the
>>>> matching acquire barrier.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Thanks Martin!
>>>>>
>>>>> Patricio
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>>>> Sent: Freitag, 3. April 2020 16:33
>>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>>>>> <david.holmes at oracle.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>> VMThread intervention
>>>>>>>
>>>>>>> Hi Martin,
>>>>>>>
>>>>>>> On 4/3/20 11:20 AM, Doerr, Martin wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> The acquire belongs to the load of _pending_threads in
>>>>>>>> is_completed() (handshake.cpp:270). It's moved below the loop
>>>>>>>> because we don't need to repeat the barrier.
>>>>>>>> The release barrier is integrated in Atomic::dec(&_pending_threads)
>>>>>>>> (handshake.cpp:234). Would it be better to use
>>>>>>>> Atomic::dec(&_pending_threads, memory_order_release) to make it
>>>>>>>> precise and explicit?
>>>>>>> As you stated in your other email too, we can specify explicit
>>>>>>> memory ordering for Atomic::dec(), which I forgot. I agree with
>>>>>>> explicitly changing it to Atomic::dec(&_pending_threads,
>>>>>>> memory_order_release).
>>>>>>>> Release / acquire semantics are exactly what I was missing in the
>>>>>>>> earlier version.
>>>>>>>> I don't understand why "add_target_count" is called after the
>>>>>>>> handshakes were activated (handshake.cpp:171).
>>>>>>>> Couldn't it happen that the first handshake completes before that
>>>>>>>> (it would set _pending_threads to 0)?
>>>>>>> Yes, _pending_threads can even go negative for the
>>>>>>> VM_HandshakeAllThreads case. But we only check _pending_threads in
>>>>>>> the VMThread, so for the Handshakees it doesn't matter what the
>>>>>>> actual value is when they decrement the counter.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Patricio
>>>>>>>> Best regards,
>>>>>>>> Martin
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>>> Sent: Freitag, 3. April 2020 15:22
>>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr,
>>>>>>>>> Martin
>>>>>>>>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>>>>>> hotspot-
>>>>>>>>> runtime-dev at openjdk.java.net
>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>>>> VMThread intervention
>>>>>>>>>
>>>>>>>>> Hi Patricio,
>>>>>>>>>
>>>>>>>>> On 3/04/2020 6:00 pm, Patricio Chilano wrote:
>>>>>>>>>> Hi Martin,
>>>>>>>>>>
>>>>>>>>>> On 4/2/20 5:24 PM, Doerr, Martin wrote:
>>>>>>>>>>> Hi Patricio,
>>>>>>>>>>>
>>>>>>>>>>> right, you got my point. Sorry for not explaining it in greater
>>>>>>>>>>> detail
>>>>>>>>>>> at the beginning. I was busy with other things.
>>>>>>>>>>>
>>>>>>>>>>> There should be a memory barrier ordering read of
>>>> _pending_threads
>>>>>>>>>>> (which is observed as 0) and whatever gets done when returning
>>>> from
>>>>>>>>>>> execute_direct.
>>>>>>>>>>> I don't like relying on barriers which are hidden somewhere and
>>>>>>>>>>> may be
>>>>>>>>>>> subject to change.
>>>>>>>>>> Yes, I agree with that. I added an explicit acquire fence at the
>>>>>>>>>> end of
>>>>>>>>>> Handshake::execute_direct() with a comment. Thanks for 
>>>>>>>>>> noticing it.
>>>>>>>>> This is unclear to me:
>>>>>>>>>
>>>>>>>>> 281   // Prevent future loads from floating above the load of _pending_threads
>>>>>>>>> 282   // in is_completed(). This prevents reading stale data modified in the
>>>>>>>>> 283   // handshake closure by the Handshakee.
>>>>>>>>> 284   OrderAccess::acquire();
>>>>>>>>> 285
>>>>>>>>> 286   return op.executed();
>>>>>>>>>
>>>>>>>>> An acquire() by itself has no meaning, it has to be associated 
>>>>>>>>> with a
>>>>>>>>> release() and both have to be associated with the load, 
>>>>>>>>> respectively,
>>>>>>>>> store, of the same variable. From the comment it sounds like 
>>>>>>>>> you just
>>>>>>>>> want a loadload() barrier here.
>>>>>>>>>
>>>>>>>>> Other updates seem fine.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>>> I'd also prefer explicit barriers on the writers side over the
>>>>>>>>>>> implicit ones from Atomic::add, because explicit barriers with
>>>>>>>>>>> comments help a lot understanding the code.
>>>>>>>>>>> (Not sure if you want to change that. Just my opinion.)
>>>>>>>>>> I added a comment on top of Atomic::dec() on why we need a
>>>> memory
>>>>>>>>>> barrier there and the fact that we are already using a full 
>>>>>>>>>> memory
>>>>>>>>>> fence.
>>>>>>>>>> If you still think I should add an explicit barrier for code
>>>>>>>>>> readability
>>>>>>>>>> I can add it though.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is v3 which also addresses Dan's comments.
>>>>>>>>>>
>>>>>>>>>> Full:
>>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/webrev/
>>>>>>>>>> Inc:
>>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Patricio
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>>>>>>>>> Sent: Donnerstag, 2. April 2020 19:56
>>>>>>>>>>>> To: Robbin Ehn <robbin.ehn at oracle.com>; Doerr, Martin
>>>>>>>>>>>> <martin.doerr at sap.com>; hotspot-runtime-
>>>> dev at openjdk.java.net
>>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
>>>> without
>>>>>>>>>>>> VMThread intervention
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/2/20 1:16 PM, Patricio Chilano wrote:
>>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for looking at this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 4/2/20 10:15 AM, Robbin Ehn wrote:
>>>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2020-04-02 14:55, Doerr, Martin wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> first of all, thanks, Patricio, for doing this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One question:
>>>>>>>>>>>>>>> _pending_threads is used for synchronizing threads.
>>>>>>>>>>>>>>> Don't we need release / acquire semantics?
>>>>>>>>>>>>>>> I guess otherwise handshaker can read stale data from
>>>>>>> handshakee
>>>>>>>>>>>>>>> after observing is_completed.
>>>>>>>>>>>>>>> Or did I miss anything?
>>>>>>>>>>>>>> Reading stale data should be fine in this case, since the
>>>>>>>>>>>>>> counter goes from n->0; thus you can only get false
>>>>>>>>>>>>>> negatives from "bool is_completed()"?
>>>>>>>>>>>>> Right, as Robbin points out, at most we will just have to
>>>>>>>>>>>>> loop again. Also, when the handshakee processes the
>>>>>>>>>>>>> handshake, right after updating _pending_threads it will
>>>>>>>>>>>>> either signal the _processing_sem, if there are no direct
>>>>>>>>>>>>> operations, or it will signal the _handshake_turn_sem, so
>>>>>>>>>>>>> that will provide the release already. The next read in the
>>>>>>>>>>>>> loop should read the updated value. Are you okay with that?
>>>>>>>>>>>> I realize you might be thinking of a different case now. You
>>>>>>>>>>>> mean the handshaker could execute loads from code after
>>>>>>>>>>>> Handshake::execute_direct() that float above its read of
>>>>>>>>>>>> _pending_threads in is_completed()? Although there is a
>>>>>>>>>>>> load_acquire() in ~ThreadsListHandle() that would prevent
>>>>>>>>>>>> those issues, you might argue that we can't rely on code
>>>>>>>>>>>> which might change.
>>>>>>>>>>>> I think (not counting that load_acquire from
>>>>>>>>>>>> ~ThreadsListHandle) the issue could appear, for example, if
>>>>>>>>>>>> you created a ThreadsListHandle first, then called
>>>>>>>>>>>> Handshake::execute_direct(), and on returning you tried to
>>>>>>>>>>>> read some field from the handshakee (you might want to assert
>>>>>>>>>>>> something, for example). If there wasn't any ThreadsListHandle
>>>>>>>>>>>> created before calling execute_direct(), I think you are not
>>>>>>>>>>>> allowed to access another JavaThread's data anyway, since the
>>>>>>>>>>>> thread might have exited already, unless you create a
>>>>>>>>>>>> ThreadsListHandle, which will already provide the acquire. So
>>>>>>>>>>>> I think you are right, and we might need a fence before
>>>>>>>>>>>> returning from execute_direct(), or a comment saying that we
>>>>>>>>>>>> are relying on the load_acquire from ~ThreadsListHandle(). Is
>>>>>>>>>>>> that what you were thinking?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Patricio
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Patricio
>>>>>>>>>>>>>> Thanks, Robbin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>>>>>>>>>>>>>>>> bounces at openjdk.java.net> On Behalf Of Robbin Ehn
>>>>>>>>>>>>>>>> Sent: Donnerstag, 2. April 2020 13:36
>>>>>>>>>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
>>>>>>>>>>>>>>>> hotspot-runtime-
>>>>>>>>>>>>>>>> dev at openjdk.java.net
>>>>>>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
>>>>>>> without
>>>>>>>>>>>>>>>> VMThread intervention
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Patricio,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for fixing, looks good.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /Robbin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2020-04-01 19:45, Patricio Chilano wrote:
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please review this redo for 8230594. Since the reason
>>>>>>>>>>>>>>>>> behind the Windows timeouts turned out to be 8240902,
>>>>>>>>>>>>>>>>> this patch is almost identical to the reverse of the
>>>>>>>>>>>>>>>>> backout modulo the rebase.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There is only one small fix, for an issue not identified
>>>>>>>>>>>>>>>>> in the original patch, which I explained in the comments.
>>>>>>>>>>>>>>>>> To avoid the possible issue of deleting a JavaThread that
>>>>>>>>>>>>>>>>> called Thread::destroy_vm() while there are handshakers
>>>>>>>>>>>>>>>>> blocked in the handshake_turn_sem semaphore, I added a
>>>>>>>>>>>>>>>>> check to verify the JavaThread in question is not
>>>>>>>>>>>>>>>>> protected by some ThreadsList reference before attempting
>>>>>>>>>>>>>>>>> to delete it. In case it is protected, we need to at
>>>>>>>>>>>>>>>>> least clear the value for _thread_key to avoid possible
>>>>>>>>>>>>>>>>> infinite loops at thread exit (this can happen on
>>>>>>>>>>>>>>>>> Solaris).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The only other change is that I removed the check for
>>>>>>>>>>>>>>>>> local polls in Handshake::execute_direct(), since after
>>>>>>>>>>>>>>>>> the backout, thread-local handshakes were implemented for
>>>>>>>>>>>>>>>>> arm32, which was the only remaining platform.
>>>>>>>>>>>>>>>>> I tested it several times in mach5 tiers 1-6 and once in
>>>>>>>>>>>>>>>>> t7 and saw no failures.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Bug:
>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8240918
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v1/webrev/
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Patricio
>>
>


