RFR 8240918: [REDO] Allow direct handshakes without VMThread intervention

Fri Apr 3 14:22:36 UTC 2020

Hi Dan,

On 4/3/20 11:12 AM, Daniel D. Daugherty wrote:
>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/ 
>
> src/hotspot/share/runtime/biasedLocking.cpp
> src/hotspot/share/runtime/handshake.cpp
> src/hotspot/share/runtime/handshake.hpp
> src/hotspot/share/runtime/mutexLocker.cpp
> src/hotspot/share/runtime/thread.cpp
> src/hotspot/share/runtime/thread.hpp
> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>     I'm good with the editorial changes to the above.
>
> src/hotspot/share/runtime/handshake.cpp
>     I'm having a bit of a problem with this pair of changes:
>
>         L228:   // Inform VMThread/Handshaker that we have completed 
> the operation.
>         L229:   // When this is executed by the Handshakee we need a 
> release store
>         L230:   // here to make sure memory operations executed in the 
> handshake
>         L231:   // closure are visible to the Handshaker after it 
> reads that the
>         L232:   // operation has completed. Atomic::dec() already 
> provides a full
>         L233:   // memory fence.
>         L234:   Atomic::dec(&_pending_threads);
>
>     and:
>
>         L281:   // Prevent future loads from floating above the load 
> of _pending_threads
>         L282:   // in is_completed(). This prevents reading stale data 
> modified in the
>         L283:   // handshake closure by the Handshakee.
>         L284:   OrderAccess::acquire();
>
>     Usually you pair a release_store() with a load_acquire(). While I 
> agree that
>     atomicity is required for the _pending_threads field, I don't 
> think that gives
>     you the "release_store" side of a {release_store, load_acquire} pair. 
The default memory ordering when using Atomic::dec() is 
memory_order_conservative which provides the release already. I might 
change it explicitly to memory_order_release as per Martin comments.

> Also,
>     as David already said, that "OrderAccess::acquire()" without an 
> associated
>     load() doesn't really mean anything.
We don't need to load _pending_threads with acquire each time. Doing a 
simple load will at most make us loop again in the handshake loop. When 
we read the value of _pending_threads is zero and we are about to exit 
execute_direct() that's when we need to prevent future loads from 
floating above.

Patricio
>     In an earlier version of my Async Monitor Deflation project, I paired
>     Atomic::dec() and Atomic::inc() calls with 
> OrderAccess::load_acquire()
>     calls and both Robbin and David H. convinced me that wasn't necessary
>     (or correct). I have since switch to OrderAccess::load() for fetching
>     my counters.
>
> So thumbs up on the editorial changes, but not the memory order stuff.
>
> Dan
>
> On 4/3/20 4:00 AM, Patricio Chilano wrote:
>> Hi Martin,
>>
>> On 4/2/20 5:24 PM, Doerr, Martin wrote:
>>> Hi Patricio,
>>>
>>> right, you got my point. Sorry for not explaining it in greater 
>>> detail at the beginning. I was busy with other things.
>>>
>>> There should be a memory barrier ordering read of _pending_threads 
>>> (which is observed as 0) and whatever gets done when returning from 
>>> execute_direct.
>>> I don't like relying on barriers which are hidden somewhere and may 
>>> be subject to change.
>> Yes, I agree with that. I added an explicit acquire fence at the end 
>> of Handshake::execute_direct() with a comment. Thanks for noticing it.
>>
>>> I'd also prefer explicit barriers on the writers side over the 
>>> implicit ones from Atomic::add, because explicit barriers with 
>>> comments help a lot understanding the code.
>>> (Not sure if you want to change that. Just my opinion.)
>> I added a comment on top of Atomic::dec() on why we need a memory 
>> barrier there and the fact that we are already using a full memory 
>> fence.
>> If you still think I should add an explicit barrier for code 
>> readability I can add it though.
>>
>>
>> Here is v3 which also addresses Dan's comments.
>>
>> Full:
>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/webrev/
>> Inc:
>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/
>>
>>
>> Thanks!
>> Patricio
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>> Sent: Donnerstag, 2. April 2020 19:56
>>>> To: Robbin Ehn <robbin.ehn at oracle.com>; Doerr, Martin
>>>> <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>> VMThread intervention
>>>>
>>>>
>>>> On 4/2/20 1:16 PM, Patricio Chilano wrote:
>>>>> Hi Martin,
>>>>>
>>>>> Thanks for looking at this.
>>>>>
>>>>> On 4/2/20 10:15 AM, Robbin Ehn wrote:
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 2020-04-02 14:55, Doerr, Martin wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> first of all, thanks, Patricio, for doing this.
>>>>>>>
>>>>>>> One question:
>>>>>>> _pending_threads is used for synchronizing threads.
>>>>>>> Don't we need release / acquire semantics?
>>>>>>> I guess otherwise handshaker can read stale data from handshakee
>>>>>>> after observing is_completed.
>>>>>>> Or did I miss anything?
>>>>>> Reading stale should be fine in this case, since the counter goes
>>>>>> from n->0, thus you can only get false negatives from "bool
>>>>>> is_completed()" ?
>>>>> Right, as Robbin points out at most we will just have to loop again.
>>>>> Also, when the handshakee processes the handshake, right after
>>>>> updating _pending_threads, it will either signal the _processing_sem
>>>>> if there are no direct operations, or it will signal the
>>>>> _handshake_turn_sem, so that will provide the release already. The
>>>>> next read in the loop should read the updated value. Are you okay 
>>>>> with
>>>>> that?
>>>> I realize you might be thinking of a different case now. You mean the
>>>> handshaker could execute loads from code after
>>>> Handshake::execute_direct() while it hasn't yet read _pending_threads
>>>> from is_completed()? Although there is a load_acquire() in
>>>> ~ThreadsListHandle() that will prevent those issues you might argue 
>>>> that
>>>> we can't rely on code which might change.
>>>> I think (not counting that load_acquire from ~ThreadsListHandle) the
>>>> issue could appear for example if you created a ThreadsListHandle 
>>>> first,
>>>> then called Handshake::execute_direct(), and when returning you try to
>>>> read some field from the handshakee (you might want to assert 
>>>> something
>>>> for example). If there wasn't any ThreadsListHandle created before
>>>> calling execute_direct() I think you are not allowed to access other
>>>> JavaThread's data anyway since the thread might have exited already,
>>>> unless you create a ThreadsListHandle which will already provide the
>>>> acquire. So I think you are right and we might need a fence before
>>>> returning from execute_direct() or add a comment that we are 
>>>> relying on
>>>> the load_acquire from ~ThreadsListHandle(). Is that what you were 
>>>> thinking?
>>>>
>>>> Thanks,
>>>> Patricio
>>>>> Thanks!
>>>>>
>>>>> Patricio
>>>>>> Thanks, Robbin
>>>>>>
>>>>>>> Best regards,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>>>>>>>> bounces at openjdk.java.net> On Behalf Of Robbin Ehn
>>>>>>>> Sent: Donnerstag, 2. April 2020 13:36
>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
>>>>>>>> hotspot-runtime-
>>>>>>>> dev at openjdk.java.net
>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>>> VMThread intervention
>>>>>>>>
>>>>>>>> Hi Patricio,
>>>>>>>>
>>>>>>>> Thanks for fixing, looks good.
>>>>>>>>
>>>>>>>> /Robbin
>>>>>>>>
>>>>>>>> On 2020-04-01 19:45, Patricio Chilano wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Please review this redo for 8230594. Since the reason behind the
>>>>>>>>> Windows
>>>>>>>>> timeouts turned out to be 8240902, this patch is almost identical
>>>>>>>>> to the
>>>>>>>>> reverse of the backout modulo the rebase.
>>>>>>>>>
>>>>>>>>> There is only one small fix for an issue not identified in the
>>>>>>>>> original
>>>>>>>>> patch which I explained in the comments. To avoid this possible
>>>>>>>>> issue of
>>>>>>>>> deleting a JavaThread that called Thread::destroy_vm() while 
>>>>>>>>> there
>>>>>>>>> are
>>>>>>>>> handshakers blocked in the handshake_turn_sem semaphore, I
>>>> added a
>>>>>>>> check
>>>>>>>>> to verify the JavaThread in question is not protected by some
>>>>>>>>> ThreadsList reference before attempting the delete it. In case 
>>>>>>>>> it is
>>>>>>>>> protected, we need to at least clear the value for _thread_key to
>>>>>>>>> avoid
>>>>>>>>> possible infinite loops at thread exit (this can happen in 
>>>>>>>>> Solaris).
>>>>>>>>>
>>>>>>>>> The only other change is that I removed the check for local 
>>>>>>>>> polls in
>>>>>>>>> Handshake::execute_direct() since after the backout, thread-local
>>>>>>>>> handshakes was implemented for arm32 which was the only
>>>> remaining
>>>>>>>> platform.
>>>>>>>>> I tested it several times in mach5 tiers 1-6 and once in t7 and
>>>>>>>>> saw no
>>>>>>>>> failures.
>>>>>>>>>
>>>>>>>>> Bug:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8240918
>>>>>>>>>
>>>>>>>>> Webrev:
>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v1/webrev/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Patricio
>>
>