RFR 8240918: [REDO] Allow direct handshakes without VMThread intervention

Patricio Chilano patricio.chilano.mateo at oracle.com
Mon Apr 6 15:14:26 UTC 2020


Hi David,

On 4/5/20 7:39 PM, David Holmes wrote:
> On 4/04/2020 12:57 am, Patricio Chilano wrote:
>>
>> On 4/3/20 11:40 AM, Doerr, Martin wrote:
>>> Hi Patricio,
>>>
>>> thanks for the quick reply.
>>>
>>> So should we also use precise barriers in
>>> add_target_count(int count) { Atomic::add(&_pending_threads, count, 
>>> memory_order_relaxed); }
>>> ?
>>> Relaxed would fit to your explanation.
>> Exactly. I was about to send you the diff with those two changes 
>> before answering David and Dan about the acquire fence.   : )
>>
>>
>> diff --git a/src/hotspot/share/runtime/handshake.cpp 
>> b/src/hotspot/share/runtime/handshake.cpp
>> --- a/src/hotspot/share/runtime/handshake.cpp
>> +++ b/src/hotspot/share/runtime/handshake.cpp
>> @@ -56,7 +56,7 @@
>>       assert(val >= 0, "_pending_threads=%d cannot be negative", val);
>>       return val == 0;
>>     }
>> -  void add_target_count(int count) { Atomic::add(&_pending_threads, 
>> count); }
>> +  void add_target_count(int count) { Atomic::add(&_pending_threads, 
>> count, memory_order_relaxed); }
>>     bool executed() const { return _executed; }
>>     const char* name() { return _handshake_cl->name(); }
>>
>> @@ -229,9 +229,8 @@
>>     // When this is executed by the Handshakee we need a release store
>>     // here to make sure memory operations executed in the handshake
>>     // closure are visible to the Handshaker after it reads that the
>> -  // operation has completed. Atomic::dec() already provides a full
>> -  // memory fence.
>> -  Atomic::dec(&_pending_threads);
>> +  // operation has completed.
>> +  Atomic::dec(&_pending_threads, memory_order_release);
>>
>>     // It is no longer safe to refer to 'this' as the 
>> VMThread/Handshaker may have destroyed this operation
>>   }
>
> The above seems okay. We don't generally try to optimise the barriers 
> associated with these atomic rmw ops due to the complexity of the 
> reasoning around the different types of barriers. But in this case the 
> reasoning seems quite straight-forward. The increment does not 
> indicate anything to any other thread, it simply sets the right value 
> in the counter (which may already have been decremented). The 
> decrement indicates a completion token and so needs at least a 
> release-barrier. The final check that watches for the count to be 
> zero, needs the matching acquire barrier.
Right, that's exactly how the barriers work here. I sent v4 with 
additional changes.

Patricio
> Thanks,
> David
> -----
>
>> Thanks Martin!
>>
>> Patricio
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>> Sent: Freitag, 3. April 2020 16:33
>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>> <david.holmes at oracle.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>>> hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>> VMThread intervention
>>>>
>>>> Hi Martin,
>>>>
>>>> On 4/3/20 11:20 AM, Doerr, Martin wrote:
>>>>> Hi David,
>>>>>
>>>>> the acquire belongs to the load of _pending_threads in is_completed()
>>>> (handshake.cpp:270).
>>>>> It's moved below the loop because we don't need to repeat the 
>>>>> barrier.
>>>>>
>>>>> The release barrier is integrated in Atomic::dec(&_pending_threads)
>>>> (handshake.cpp:234).
>>>>> Should we better use Atomic::dec(&_pending_threads,
>>>> memory_order_release) to make it precise and explicit?
>>>> As you stated in your other email too we can specify explicit memory
>>>> ordering to Atomic::dec() which I forgot. I agree with explicitly
>>>> changing it to Atomic::dec(&_pending_threads, memory_order_release).
>>>>
>>>>> Release / acquire semantics are exactly what I was missing in the 
>>>>> earlier
>>>> version.
>>>>>
>>>>> I don't understand why "add_target_count" is called after the 
>>>>> handshakes
>>>> were activated (handshake.cpp:171).
>>>>> Couldn't it happen that the first handshake completes before that 
>>>>> (it would
>>>> set _pending_threads to 0)?
>>>> Yes, _pending_threads can even go negative for the
>>>> VM_HandshakeAllThreads case. But we only check _pending_threads in the
>>>> VMThread so for the Handshakees it doesn't matter when they decrement
>>>> the counter what's the actual value.
>>>>
>>>> Thanks,
>>>> Patricio
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>> Sent: Freitag, 3. April 2020 15:22
>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr, 
>>>>>> Martin
>>>>>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>>> hotspot-
>>>>>> runtime-dev at openjdk.java.net
>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>> VMThread intervention
>>>>>>
>>>>>> Hi Patricio,
>>>>>>
>>>>>> On 3/04/2020 6:00 pm, Patricio Chilano wrote:
>>>>>>> Hi Martin,
>>>>>>>
>>>>>>> On 4/2/20 5:24 PM, Doerr, Martin wrote:
>>>>>>>> Hi Patricio,
>>>>>>>>
>>>>>>>> right, you got my point. Sorry for not explaining it in greater 
>>>>>>>> detail
>>>>>>>> at the beginning. I was busy with other things.
>>>>>>>>
>>>>>>>> There should be a memory barrier ordering read of _pending_threads
>>>>>>>> (which is observed as 0) and whatever gets done when returning 
>>>>>>>> from
>>>>>>>> execute_direct.
>>>>>>>> I don't like relying on barriers which are hidden somewhere and 
>>>>>>>> may be
>>>>>>>> subject to change.
>>>>>>> Yes, I agree with that. I added an explicit acquire fence at the 
>>>>>>> end of
>>>>>>> Handshake::execute_direct() with a comment. Thanks for noticing it.
>>>>>> This is unclear to me:
>>>>>>
>>>>>> 281   // Prevent future loads from floating above the load of
>>>>>> _pending_threads
>>>>>>     282   // in is_completed(). This prevents reading stale data 
>>>>>> modified
>>>>>> in the
>>>>>>     283   // handshake closure by the Handshakee.
>>>>>>     284   OrderAccess::acquire();
>>>>>>     285
>>>>>>     286   return op.executed();
>>>>>>
>>>>>> An acquire() by itself has no meaning, it has to be associated 
>>>>>> with a
>>>>>> release() and both have to be associated with the load, 
>>>>>> respectively,
>>>>>> store, of the same variable. From the comment it sounds like you 
>>>>>> just
>>>>>> want a loadload() barrier here.
>>>>>>
>>>>>> Other updates seem fine.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>>> I'd also prefer explicit barriers on the writers side over the
>>>>>>>> implicit ones from Atomic::add, because explicit barriers with
>>>>>>>> comments help a lot understanding the code.
>>>>>>>> (Not sure if you want to change that. Just my opinion.)
>>>>>>> I added a comment on top of Atomic::dec() on why we need a memory
>>>>>>> barrier there and the fact that we are already using a full 
>>>>>>> memory fence.
>>>>>>> If you still think I should add an explicit barrier for code 
>>>>>>> readability
>>>>>>> I can add it though.
>>>>>>>
>>>>>>>
>>>>>>> Here is v3 which also addresses Dan's comments.
>>>>>>>
>>>>>>> Full:
>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/webrev/
>>>>>>> Inc:
>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/
>>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Patricio
>>>>>>>> Best regards,
>>>>>>>> Martin
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>>>>>> Sent: Donnerstag, 2. April 2020 19:56
>>>>>>>>> To: Robbin Ehn <robbin.ehn at oracle.com>; Doerr, Martin
>>>>>>>>> <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>>>> VMThread intervention
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 4/2/20 1:16 PM, Patricio Chilano wrote:
>>>>>>>>>> Hi Martin,
>>>>>>>>>>
>>>>>>>>>> Thanks for looking at this.
>>>>>>>>>>
>>>>>>>>>> On 4/2/20 10:15 AM, Robbin Ehn wrote:
>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>
>>>>>>>>>>> On 2020-04-02 14:55, Doerr, Martin wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> first of all, thanks, Patricio, for doing this.
>>>>>>>>>>>>
>>>>>>>>>>>> One question:
>>>>>>>>>>>> _pending_threads is used for synchronizing threads.
>>>>>>>>>>>> Don't we need release / acquire semantics?
>>>>>>>>>>>> I guess otherwise handshaker can read stale data from
>>>> handshakee
>>>>>>>>>>>> after observing is_completed.
>>>>>>>>>>>> Or did I miss anything?
>>>>>>>>>>> Reading stale should be fine in this case, since the counter 
>>>>>>>>>>> goes
>>>>>>>>>>> from n->0, thus you can only get false negatives from "bool
>>>>>>>>>>> is_completed()" ?
>>>>>>>>>> Right, as Robbin points out at most we will just have to loop 
>>>>>>>>>> again.
>>>>>>>>>> Also, when the handshakee processes the handshake, right after
>>>>>>>>>> updating _pending_threads, it will either signal the 
>>>>>>>>>> _processing_sem
>>>>>>>>>> if there are no direct operations, or it will signal the
>>>>>>>>>> _handshake_turn_sem, so that will provide the release 
>>>>>>>>>> already. The
>>>>>>>>>> next read in the loop should read the updated value. Are you 
>>>>>>>>>> okay
>>>> with
>>>>>>>>>> that?
>>>>>>>>> I realize you might be thinking of a different case now. You 
>>>>>>>>> mean the
>>>>>>>>> handshaker could execute loads from code after
>>>>>>>>> Handshake::execute_direct() while it hasn't yet read
>>>> _pending_threads
>>>>>>>>> from is_completed()? Although there is a load_acquire() in
>>>>>>>>> ~ThreadsListHandle() that will prevent those issues you might 
>>>>>>>>> argue
>>>> that
>>>>>>>>> we can't rely on code which might change.
>>>>>>>>> I think (not counting that load_acquire from 
>>>>>>>>> ~ThreadsListHandle) the
>>>>>>>>> issue could appear for example if you created a ThreadsListHandle
>>>> first,
>>>>>>>>> then called Handshake::execute_direct(), and when returning 
>>>>>>>>> you try
>>>> to
>>>>>>>>> read some field from the handshakee (you might want to assert
>>>>>> something
>>>>>>>>> for example). If there wasn't any ThreadsListHandle created 
>>>>>>>>> before
>>>>>>>>> calling execute_direct() I think you are not allowed to access 
>>>>>>>>> other
>>>>>>>>> JavaThread's data anyway since the thread might have exited 
>>>>>>>>> already,
>>>>>>>>> unless you create a ThreadsListHandle which will already 
>>>>>>>>> provide the
>>>>>>>>> acquire. So I think you are right and we might need a fence 
>>>>>>>>> before
>>>>>>>>> returning from execute_direct() or add a comment that we are 
>>>>>>>>> relying
>>>>>> on
>>>>>>>>> the load_acquire from ~ThreadsListHandle(). Is that what you were
>>>>>>>>> thinking?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Patricio
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> Patricio
>>>>>>>>>>> Thanks, Robbin
>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>>>>>>>>>>>>> bounces at openjdk.java.net> On Behalf Of Robbin Ehn
>>>>>>>>>>>>> Sent: Donnerstag, 2. April 2020 13:36
>>>>>>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
>>>>>>>>>>>>> hotspot-runtime-
>>>>>>>>>>>>> dev at openjdk.java.net
>>>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
>>>> without
>>>>>>>>>>>>> VMThread intervention
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Patricio,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for fixing, looks good.
>>>>>>>>>>>>>
>>>>>>>>>>>>> /Robbin
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2020-04-01 19:45, Patricio Chilano wrote:
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please review this redo for 8230594. Since the reason behind
>>>> the
>>>>>>>>>>>>>> Windows
>>>>>>>>>>>>>> timeouts turned out to be 8240902, this patch is almost 
>>>>>>>>>>>>>> identical
>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>> reverse of the backout modulo the rebase.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is only one small fix for an issue not identified 
>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>> original
>>>>>>>>>>>>>> patch which I explained in the comments. To avoid this 
>>>>>>>>>>>>>> possible
>>>>>>>>>>>>>> issue of
>>>>>>>>>>>>>> deleting a JavaThread that called Thread::destroy_vm() while
>>>>>> there
>>>>>>>>>>>>>> are
>>>>>>>>>>>>>> handshakers blocked in the handshake_turn_sem semaphore,
>>>> I
>>>>>>>>> added a
>>>>>>>>>>>>> check
>>>>>>>>>>>>>> to verify the JavaThread in question is not protected by 
>>>>>>>>>>>>>> some
>>>>>>>>>>>>>> ThreadsList reference before attempting the delete it. In 
>>>>>>>>>>>>>> case
>>>>>>>>>>>>>> it is
>>>>>>>>>>>>>> protected, we need to at least clear the value for 
>>>>>>>>>>>>>> _thread_key
>>>> to
>>>>>>>>>>>>>> avoid
>>>>>>>>>>>>>> possible infinite loops at thread exit (this can happen in
>>>>>>>>>>>>>> Solaris).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The only other change is that I removed the check for local
>>>>>>>>>>>>>> polls in
>>>>>>>>>>>>>> Handshake::execute_direct() since after the backout, thread-
>>>> local
>>>>>>>>>>>>>> handshakes was implemented for arm32 which was the only
>>>>>>>>> remaining
>>>>>>>>>>>>> platform.
>>>>>>>>>>>>>> I tested it several times in mach5 tiers 1-6 and once in 
>>>>>>>>>>>>>> t7 and
>>>>>>>>>>>>>> saw no
>>>>>>>>>>>>>> failures.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bug:
>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8240918
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>
>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v1/webrev/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Patricio
>>



More information about the hotspot-runtime-dev mailing list