RFR 8240918: [REDO] Allow direct handshakes without VMThread intervention

David Holmes david.holmes at oracle.com
Sun Apr 5 22:39:31 UTC 2020


On 4/04/2020 12:57 am, Patricio Chilano wrote:
> 
> On 4/3/20 11:40 AM, Doerr, Martin wrote:
>> Hi Patricio,
>>
>> thanks for the quick reply.
>>
>> So should we also use precise barriers in
>> add_target_count(int count) { Atomic::add(&_pending_threads, count, memory_order_relaxed); }
>> ?
>> Relaxed would fit your explanation.
> Exactly. I was about to send you the diff with those two changes before 
> answering David and Dan about the acquire fence.   : )
> 
> 
> diff --git a/src/hotspot/share/runtime/handshake.cpp b/src/hotspot/share/runtime/handshake.cpp
> --- a/src/hotspot/share/runtime/handshake.cpp
> +++ b/src/hotspot/share/runtime/handshake.cpp
> @@ -56,7 +56,7 @@
>       assert(val >= 0, "_pending_threads=%d cannot be negative", val);
>       return val == 0;
>     }
> -  void add_target_count(int count) { Atomic::add(&_pending_threads, count); }
> +  void add_target_count(int count) { Atomic::add(&_pending_threads, count, memory_order_relaxed); }
>     bool executed() const { return _executed; }
>     const char* name() { return _handshake_cl->name(); }
> 
> @@ -229,9 +229,8 @@
>     // When this is executed by the Handshakee we need a release store
>     // here to make sure memory operations executed in the handshake
>     // closure are visible to the Handshaker after it reads that the
> -  // operation has completed. Atomic::dec() already provides a full
> -  // memory fence.
> -  Atomic::dec(&_pending_threads);
> +  // operation has completed.
> +  Atomic::dec(&_pending_threads, memory_order_release);
> 
>     // It is no longer safe to refer to 'this' as the VMThread/Handshaker may have destroyed this operation
>   }

The above seems okay. We don't generally try to optimise the barriers 
associated with these atomic RMW ops due to the complexity of reasoning 
about the different types of barriers. But in this case the reasoning 
seems quite straightforward. The increment does not indicate anything to 
any other thread; it simply sets the right value in the counter (which 
may already have been decremented). The decrement acts as a completion 
token and so needs at least a release barrier. The final check, which 
watches for the count to reach zero, needs the matching acquire barrier.
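
As a rough standalone sketch of that pattern (plain std::atomic rather than 
HotSpot's Atomic/OrderAccess, names purely illustrative), the relaxed add, 
release decrement and acquiring zero-check pair up like this:

  #include <atomic>
  #include <cassert>
  #include <thread>

  std::atomic<int> pending_threads{0};
  int closure_data = 0;  // stands in for state written by the closure

  void handshakee() {
    closure_data = 42;  // work done in the handshake closure
    // Completion token: needs at least release ordering.
    pending_threads.fetch_sub(1, std::memory_order_release);
  }

  int main() {
    // The increment only establishes the expected count and publishes
    // nothing to other threads, so relaxed ordering is enough.
    pending_threads.fetch_add(1, std::memory_order_relaxed);
    std::thread t(handshakee);
    // The zero-check carries the matching acquire, so once completion is
    // observed the closure's writes are visible to the waiter.
    while (pending_threads.load(std::memory_order_acquire) != 0) { }
    assert(closure_data == 42);
    t.join();
    return 0;
  }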

Thanks,
David
-----

> Thanks Martin!
> 
> Patricio
>> Best regards,
>> Martin
>>
>>
>>> -----Original Message-----
>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>> Sent: Freitag, 3. April 2020 16:33
>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>> <david.holmes at oracle.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>> hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>> VMThread intervention
>>>
>>> Hi Martin,
>>>
>>> On 4/3/20 11:20 AM, Doerr, Martin wrote:
>>>> Hi David,
>>>>
>>>> the acquire belongs to the load of _pending_threads in is_completed()
>>>> (handshake.cpp:270).
>>>> It's moved below the loop because we don't need to repeat the barrier.
>>>>
>>>> The release barrier is integrated in Atomic::dec(&_pending_threads)
>>>> (handshake.cpp:234).
>>>> Should we better use Atomic::dec(&_pending_threads, memory_order_release)
>>>> to make it precise and explicit?
>>> As you stated in your other email too, we can specify explicit memory
>>> ordering for Atomic::dec(), which I forgot. I agree with explicitly
>>> changing it to Atomic::dec(&_pending_threads, memory_order_release).
>>>
>>>> Release / acquire semantics are exactly what I was missing in the
>>>> earlier version.
>>>>
>>>> I don't understand why "add_target_count" is called after the handshakes
>>>> were activated (handshake.cpp:171).
>>>> Couldn't it happen that the first handshake completes before that
>>>> (it would set _pending_threads to 0)?
>>> Yes, _pending_threads can even go negative for the
>>> VM_HandshakeAllThreads case. But we only check _pending_threads in the
>>> VMThread, so for the Handshakees it doesn't matter what the actual value
>>> is when they decrement the counter.
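>>>
>>> (As a rough standalone sketch of that interleaving, with std::atomic
>>> standing in for the HotSpot Atomic class and illustrative values, the
>>> transiently negative count is never acted upon:)
>>>
>>>   #include <atomic>
>>>   #include <cassert>
>>>
>>>   int main() {
>>>     std::atomic<int> pending{0};
>>>     // One target finishes its handshake before the coordinator has
>>>     // registered the target count: completion token with release.
>>>     pending.fetch_sub(1, std::memory_order_release);
>>>     assert(pending.load() == -1);  // transiently negative
>>>     // The coordinator now adds the count for the two targets it handed
>>>     // the operation to.
>>>     pending.fetch_add(2, std::memory_order_relaxed);
>>>     // The second target completes.
>>>     pending.fetch_sub(1, std::memory_order_release);
>>>     // Only this final zero is ever tested (with acquire) by the waiter.
>>>     assert(pending.load(std::memory_order_acquire) == 0);
>>>     return 0;
>>>   }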
>>>
>>> Thanks,
>>> Patricio
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>> Sent: Freitag, 3. April 2020 15:22
>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr, Martin
>>>>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>> VMThread intervention
>>>>>
>>>>> Hi Patricio,
>>>>>
>>>>> On 3/04/2020 6:00 pm, Patricio Chilano wrote:
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 4/2/20 5:24 PM, Doerr, Martin wrote:
>>>>>>> Hi Patricio,
>>>>>>>
>>>>>>> right, you got my point. Sorry for not explaining it in greater 
>>>>>>> detail
>>>>>>> at the beginning. I was busy with other things.
>>>>>>>
>>>>>>> There should be a memory barrier ordering the read of _pending_threads
>>>>>>> (which is observed as 0) against whatever gets done after returning
>>>>>>> from execute_direct.
>>>>>>> I don't like relying on barriers which are hidden somewhere and may be
>>>>>>> subject to change.
>>>>>> Yes, I agree with that. I added an explicit acquire fence at the 
>>>>>> end of
>>>>>> Handshake::execute_direct() with a comment. Thanks for noticing it.
>>>>> This is unclear to me:
>>>>>
>>>>>     281   // Prevent future loads from floating above the load of _pending_threads
>>>>>     282   // in is_completed(). This prevents reading stale data modified in the
>>>>>     283   // handshake closure by the Handshakee.
>>>>>     284   OrderAccess::acquire();
>>>>>     285
>>>>>     286   return op.executed();
>>>>>
>>>>> An acquire() by itself has no meaning; it has to be associated with a
>>>>> release(), and both have to be associated with the load and store,
>>>>> respectively, of the same variable. From the comment it sounds like you
>>>>> just want a loadload() barrier here.
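>>>>>
>>>>> (A rough standalone illustration, with std::atomic standing in for
>>>>> OrderAccess and purely illustrative names; the fence only gains meaning
>>>>> from the release it pairs with:)
>>>>>
>>>>>   #include <atomic>
>>>>>   #include <cassert>
>>>>>   #include <thread>
>>>>>
>>>>>   std::atomic<int> pending_threads{1};
>>>>>   int closure_result = 0;  // written inside the handshake closure
>>>>>
>>>>>   void handshakee() {
>>>>>     closure_result = 1;  // closure work
>>>>>     // Completion token: the paired release.
>>>>>     pending_threads.fetch_sub(1, std::memory_order_release);
>>>>>   }
>>>>>
>>>>>   int main() {
>>>>>     std::thread t(handshakee);
>>>>>     // Relaxed loads while spinning ...
>>>>>     while (pending_threads.load(std::memory_order_relaxed) != 0) { }
>>>>>     // ... then one acquire after the loop. It takes its meaning from
>>>>>     // the release decrement observed by the last load above; a bare
>>>>>     // acquire with no associated load/release pair guarantees nothing,
>>>>>     // and if only later loads need ordering, load-load also suffices.
>>>>>     std::atomic_thread_fence(std::memory_order_acquire);
>>>>>     assert(closure_result == 1);  // the closure's write is now visible
>>>>>     t.join();
>>>>>     return 0;
>>>>>   }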
>>>>>
>>>>> Other updates seem fine.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>>> I'd also prefer explicit barriers on the writer's side over the
>>>>>>> implicit ones from Atomic::add, because explicit barriers with
>>>>>>> comments help a lot in understanding the code.
>>>>>>> (Not sure if you want to change that. Just my opinion.)
>>>>>> I added a comment on top of Atomic::dec() about why we need a memory
>>>>>> barrier there and the fact that we are already using a full memory fence.
>>>>>> If you still think I should add an explicit barrier for code readability,
>>>>>> I can add it though.
>>>>>>
>>>>>>
>>>>>> Here is v3 which also addresses Dan's comments.
>>>>>>
>>>>>> Full:
>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/webrev/
>>>>>> Inc:
>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>> Patricio
>>>>>>> Best regards,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>>>>> Sent: Donnerstag, 2. April 2020 19:56
>>>>>>>> To: Robbin Ehn <robbin.ehn at oracle.com>; Doerr, Martin
>>>>>>>> <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>>> VMThread intervention
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/2/20 1:16 PM, Patricio Chilano wrote:
>>>>>>>>> Hi Martin,
>>>>>>>>>
>>>>>>>>> Thanks for looking at this.
>>>>>>>>>
>>>>>>>>> On 4/2/20 10:15 AM, Robbin Ehn wrote:
>>>>>>>>>> Hi Martin,
>>>>>>>>>>
>>>>>>>>>> On 2020-04-02 14:55, Doerr, Martin wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> first of all, thanks, Patricio, for doing this.
>>>>>>>>>>>
>>>>>>>>>>> One question:
>>>>>>>>>>> _pending_threads is used for synchronizing threads.
>>>>>>>>>>> Don't we need release / acquire semantics?
>>>>>>>>>>> I guess otherwise handshaker can read stale data from handshakee
>>>>>>>>>>> after observing is_completed.
>>>>>>>>>>> Or did I miss anything?
>>>>>>>>>> Reading stale data should be fine in this case, since the counter
>>>>>>>>>> goes from n->0; thus you can only get false negatives from
>>>>>>>>>> "bool is_completed()"?
>>>>>>>>> Right, as Robbin points out, at most we will just have to loop again.
>>>>>>>>> Also, when the handshakee processes the handshake, right after
>>>>>>>>> updating _pending_threads, it will either signal the _processing_sem
>>>>>>>>> if there are no direct operations, or it will signal the
>>>>>>>>> _handshake_turn_sem, so that will provide the release already. The
>>>>>>>>> next read in the loop should read the updated value. Are you okay
>>>>>>>>> with that?
>>>>>>>> I realize you might be thinking of a different case now. You mean
>>>>>>>> the handshaker could execute loads from code after
>>>>>>>> Handshake::execute_direct() while it hasn't yet read
>>>>>>>> _pending_threads from is_completed()? Although there is a
>>>>>>>> load_acquire() in ~ThreadsListHandle() that will prevent those
>>>>>>>> issues, you might argue that we can't rely on code which might
>>>>>>>> change.
>>>>>>>> I think (not counting that load_acquire from ~ThreadsListHandle) the
>>>>>>>> issue could appear, for example, if you created a ThreadsListHandle
>>>>>>>> first, then called Handshake::execute_direct(), and when returning
>>>>>>>> you try to read some field from the handshakee (you might want to
>>>>>>>> assert something, for example). If there wasn't any ThreadsListHandle
>>>>>>>> created before calling execute_direct(), I think you are not allowed
>>>>>>>> to access another JavaThread's data anyway since the thread might
>>>>>>>> have exited already, unless you create a ThreadsListHandle, which
>>>>>>>> will already provide the acquire. So I think you are right and we
>>>>>>>> might need a fence before returning from execute_direct(), or add a
>>>>>>>> comment that we are relying on the load_acquire from
>>>>>>>> ~ThreadsListHandle(). Is that what you were thinking?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Patricio
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Patricio
>>>>>>>>>> Thanks, Robbin
>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>>>>>>>>>>>> bounces at openjdk.java.net> On Behalf Of Robbin Ehn
>>>>>>>>>>>> Sent: Donnerstag, 2. April 2020 13:36
>>>>>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
>>>>>>>>>>>> hotspot-runtime-
>>>>>>>>>>>> dev at openjdk.java.net
>>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>>>>>>> VMThread intervention
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Patricio,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for fixing, looks good.
>>>>>>>>>>>>
>>>>>>>>>>>> /Robbin
>>>>>>>>>>>>
>>>>>>>>>>>> On 2020-04-01 19:45, Patricio Chilano wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please review this redo for 8230594. Since the reason behind the
>>>>>>>>>>>>> Windows timeouts turned out to be 8240902, this patch is almost
>>>>>>>>>>>>> identical to the reverse of the backout modulo the rebase.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is only one small fix for an issue not identified in the
>>>>>>>>>>>>> original patch which I explained in the comments. To avoid this
>>>>>>>>>>>>> possible issue of deleting a JavaThread that called
>>>>>>>>>>>>> Thread::destroy_vm() while there are handshakers blocked in the
>>>>>>>>>>>>> handshake_turn_sem semaphore, I added a check to verify the
>>>>>>>>>>>>> JavaThread in question is not protected by some ThreadsList
>>>>>>>>>>>>> reference before attempting to delete it. In case it is
>>>>>>>>>>>>> protected, we need to at least clear the value for _thread_key
>>>>>>>>>>>>> to avoid possible infinite loops at thread exit (this can happen
>>>>>>>>>>>>> on Solaris).
>>>>>>>>>>>>>
>>>>>>>>>>>>> The only other change is that I removed the check for local
>>>>>>>>>>>>> polls in Handshake::execute_direct() since, after the backout,
>>>>>>>>>>>>> thread-local handshakes were implemented for arm32, which was
>>>>>>>>>>>>> the only remaining platform.
>>>>>>>>>>>>> I tested it several times in mach5 tiers 1-6 and once in t7 and
>>>>>>>>>>>>> saw no failures.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bug:
>>>>>>>>>>>>>         https://bugs.openjdk.java.net/browse/JDK-8240918
>>>>>>>>>>>>>
>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>         http://cr.openjdk.java.net/~pchilanomate/8240918/v1/webrev/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Patricio
> 

