RFR 8240918: [REDO] Allow direct handshakes without VMThread intervention

Fri Apr 3 14:12:24 UTC 2020

> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/ 

src/hotspot/share/runtime/biasedLocking.cpp
src/hotspot/share/runtime/handshake.cpp
src/hotspot/share/runtime/handshake.hpp
src/hotspot/share/runtime/mutexLocker.cpp
src/hotspot/share/runtime/thread.cpp
src/hotspot/share/runtime/thread.hpp
test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
     I'm good with the editorial changes to the above.

src/hotspot/share/runtime/handshake.cpp
     I'm having a bit of a problem with this pair of changes:

         L228:   // Inform VMThread/Handshaker that we have completed 
the operation.
         L229:   // When this is executed by the Handshakee we need a 
release store
         L230:   // here to make sure memory operations executed in the 
handshake
         L231:   // closure are visible to the Handshaker after it reads 
that the
         L232:   // operation has completed. Atomic::dec() already 
provides a full
         L233:   // memory fence.
         L234:   Atomic::dec(&_pending_threads);

     and:

         L281:   // Prevent future loads from floating above the load of 
_pending_threads
         L282:   // in is_completed(). This prevents reading stale data 
modified in the
         L283:   // handshake closure by the Handshakee.
         L284:   OrderAccess::acquire();

     Usually you pair a release_store() with a load_acquire(). While I 
agree that
     atomicity is required for the _pending_threads field, I don't think 
that gives
     you the "release_store" side of a {release_store, load_acquire} 
pair. Also,
     as David already said, that "OrderAccess::acquire()" without an 
associated
     load() doesn't really mean anything.

     In an earlier version of my Async Monitor Deflation project, I paired
     Atomic::dec() and Atomic::inc() calls with OrderAccess::load_acquire()
     calls and both Robbin and David H. convinced me that wasn't necessary
     (or correct). I have since switch to OrderAccess::load() for fetching
     my counters.

So thumbs up on the editorial changes, but not the memory order stuff.

Dan

On 4/3/20 4:00 AM, Patricio Chilano wrote:
> Hi Martin,
>
> On 4/2/20 5:24 PM, Doerr, Martin wrote:
>> Hi Patricio,
>>
>> right, you got my point. Sorry for not explaining it in greater 
>> detail at the beginning. I was busy with other things.
>>
>> There should be a memory barrier ordering read of _pending_threads 
>> (which is observed as 0) and whatever gets done when returning from 
>> execute_direct.
>> I don't like relying on barriers which are hidden somewhere and may 
>> be subject to change.
> Yes, I agree with that. I added an explicit acquire fence at the end 
> of Handshake::execute_direct() with a comment. Thanks for noticing it.
>
>> I'd also prefer explicit barriers on the writers side over the 
>> implicit ones from Atomic::add, because explicit barriers with 
>> comments help a lot understanding the code.
>> (Not sure if you want to change that. Just my opinion.)
> I added a comment on top of Atomic::dec() on why we need a memory 
> barrier there and the fact that we are already using a full memory fence.
> If you still think I should add an explicit barrier for code 
> readability I can add it though.
>
>
> Here is v3 which also addresses Dan's comments.
>
> Full:
> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/webrev/
> Inc:
> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/
>
>
> Thanks!
> Patricio
>> Best regards,
>> Martin
>>
>>
>>> -----Original Message-----
>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>> Sent: Donnerstag, 2. April 2020 19:56
>>> To: Robbin Ehn <robbin.ehn at oracle.com>; Doerr, Martin
>>> <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>> VMThread intervention
>>>
>>>
>>> On 4/2/20 1:16 PM, Patricio Chilano wrote:
>>>> Hi Martin,
>>>>
>>>> Thanks for looking at this.
>>>>
>>>> On 4/2/20 10:15 AM, Robbin Ehn wrote:
>>>>> Hi Martin,
>>>>>
>>>>> On 2020-04-02 14:55, Doerr, Martin wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> first of all, thanks, Patricio, for doing this.
>>>>>>
>>>>>> One question:
>>>>>> _pending_threads is used for synchronizing threads.
>>>>>> Don't we need release / acquire semantics?
>>>>>> I guess otherwise handshaker can read stale data from handshakee
>>>>>> after observing is_completed.
>>>>>> Or did I miss anything?
>>>>> Reading stale should be fine in this case, since the counter goes
>>>>> from n->0, thus you can only get false negatives from "bool
>>>>> is_completed()" ?
>>>> Right, as Robbin points out at most we will just have to loop again.
>>>> Also, when the handshakee processes the handshake, right after
>>>> updating _pending_threads, it will either signal the _processing_sem
>>>> if there are no direct operations, or it will signal the
>>>> _handshake_turn_sem, so that will provide the release already. The
>>>> next read in the loop should read the updated value. Are you okay with
>>>> that?
>>> I realize you might be thinking of a different case now. You mean the
>>> handshaker could execute loads from code after
>>> Handshake::execute_direct() while it hasn't yet read _pending_threads
>>> from is_completed()? Although there is a load_acquire() in
>>> ~ThreadsListHandle() that will prevent those issues you might argue 
>>> that
>>> we can't rely on code which might change.
>>> I think (not counting that load_acquire from ~ThreadsListHandle) the
>>> issue could appear for example if you created a ThreadsListHandle 
>>> first,
>>> then called Handshake::execute_direct(), and when returning you try to
>>> read some field from the handshakee (you might want to assert something
>>> for example). If there wasn't any ThreadsListHandle created before
>>> calling execute_direct() I think you are not allowed to access other
>>> JavaThread's data anyway since the thread might have exited already,
>>> unless you create a ThreadsListHandle which will already provide the
>>> acquire. So I think you are right and we might need a fence before
>>> returning from execute_direct() or add a comment that we are relying on
>>> the load_acquire from ~ThreadsListHandle(). Is that what you were 
>>> thinking?
>>>
>>> Thanks,
>>> Patricio
>>>> Thanks!
>>>>
>>>> Patricio
>>>>> Thanks, Robbin
>>>>>
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>>>>>>> bounces at openjdk.java.net> On Behalf Of Robbin Ehn
>>>>>>> Sent: Donnerstag, 2. April 2020 13:36
>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
>>>>>>> hotspot-runtime-
>>>>>>> dev at openjdk.java.net
>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>> VMThread intervention
>>>>>>>
>>>>>>> Hi Patricio,
>>>>>>>
>>>>>>> Thanks for fixing, looks good.
>>>>>>>
>>>>>>> /Robbin
>>>>>>>
>>>>>>> On 2020-04-01 19:45, Patricio Chilano wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Please review this redo for 8230594. Since the reason behind the
>>>>>>>> Windows
>>>>>>>> timeouts turned out to be 8240902, this patch is almost identical
>>>>>>>> to the
>>>>>>>> reverse of the backout modulo the rebase.
>>>>>>>>
>>>>>>>> There is only one small fix for an issue not identified in the
>>>>>>>> original
>>>>>>>> patch which I explained in the comments. To avoid this possible
>>>>>>>> issue of
>>>>>>>> deleting a JavaThread that called Thread::destroy_vm() while there
>>>>>>>> are
>>>>>>>> handshakers blocked in the handshake_turn_sem semaphore, I
>>> added a
>>>>>>> check
>>>>>>>> to verify the JavaThread in question is not protected by some
>>>>>>>> ThreadsList reference before attempting the delete it. In case 
>>>>>>>> it is
>>>>>>>> protected, we need to at least clear the value for _thread_key to
>>>>>>>> avoid
>>>>>>>> possible infinite loops at thread exit (this can happen in 
>>>>>>>> Solaris).
>>>>>>>>
>>>>>>>> The only other change is that I removed the check for local 
>>>>>>>> polls in
>>>>>>>> Handshake::execute_direct() since after the backout, thread-local
>>>>>>>> handshakes was implemented for arm32 which was the only
>>> remaining
>>>>>>> platform.
>>>>>>>> I tested it several times in mach5 tiers 1-6 and once in t7 and
>>>>>>>> saw no
>>>>>>>> failures.
>>>>>>>>
>>>>>>>> Bug:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8240918
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v1/webrev/
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Patricio
>