RFR 8240918: [REDO] Allow direct handshakes without VMThread intervention

David Holmes david.holmes at oracle.com
Tue Apr 7 02:37:13 UTC 2020


Update looks good.

I'm not 100% sure that the low-level semaphore usage doesn't already 
provide the necessary barriers, but the very fact that it is hard to 
determine makes it a good idea to have the explicit barriers.

Thanks,
David

On 7/04/2020 12:57 am, Patricio Chilano wrote:
> Hi Martin,
> 
> On 4/6/20 7:15 AM, Doerr, Martin wrote:
>> Hi Patricio,
>>
>> I'm very sorry for the late review. I couldn't make it last week.
>>
>> Thanks a lot for taking care of the ordering semantics. I'm a big fan 
>> of having it explicit, precise, clear and well documented.
> Yes, I also think it's clearer now with the explicit orderings. : )
> 
>> VM_HandshakeOneThread and VM_HandshakeAllThreads work without the 
>> barrier because HandshakeState::try_process always calls 
>> _processing_sem.signal()?
>> I think it would be better to have the explicit acquire barrier there, 
>> too. Do you agree?
> I've been thinking about the VM_HandshakeOneThread and 
> VM_HandshakeAllThreads cases since the other day, and although I don't 
> think there are issues in the use cases we have today, it's probably a 
> good idea to add the fence there too in case that changes. For the 
> VMThread it is safe without the barrier because, upon returning from 
> the do_it() method, it just notifies the Handshaker that the operation 
> has been processed and proceeds to the main loop of VMThread::loop() to 
> process the next operation. There could be an issue, though, if one of 
> these non-direct handshakes ran inside an operation that allows nested 
> operations: when the VMThread returns from the handshake, it goes back 
> to the enclosing operation, where it could try to read data changed 
> during the handshake. If the handshake was actually processed by the 
> handshakee, we could then read stale data.
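> 
> As a minimal standalone sketch of that hazard (plain C++ atomics; the 
> names are illustrative, not the actual VMThread code):
> 
>   #include <atomic>
> 
>   std::atomic<int> pending_threads{1};
>   int data = 0;                       // written by the handshake closure
> 
>   void handshakee() {
>     data = 42;                        // closure work
>     pending_threads.fetch_sub(1, std::memory_order_release);
>   }
> 
>   void vmthread_after_nested_handshake() {
>     while (pending_threads.load(std::memory_order_acquire) != 0) { }
>     int v = data;                     // safe only because of the acquire:
>     (void)v;                          // a relaxed load could see stale 0
>   }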
> 
>> check_state() is pointless in your new version.
> Removed.
> 
>> I've also looked at your test. Shouldn't we set -XX:+UseBiasedLocking 
>> explicitly so the test still works as it should after BiasedLocking 
>> deprecation?
> Yes, thanks for noticing it. Since I'm planning to push this first I 
> will change it now so I don't have to fix it later.
> 
> 
> Here is v4, which also includes the memory ordering changes I sent before:
> 
> Full:
> http://cr.openjdk.java.net/~pchilanomate/8240918/v4/webrev/
> Inc:
> http://cr.openjdk.java.net/~pchilanomate/8240918/v4/inc/webrev/
> 
> 
> Thanks!
> 
> Patricio
>> Best regards,
>> Martin
>>
>>
>>> -----Original Message-----
>>> From: David Holmes <david.holmes at oracle.com>
>>> Sent: Montag, 6. April 2020 00:40
>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr, Martin
>>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>> hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>> VMThread intervention
>>>
>>> On 4/04/2020 12:57 am, Patricio Chilano wrote:
>>>> On 4/3/20 11:40 AM, Doerr, Martin wrote:
>>>>> Hi Patricio,
>>>>>
>>>>> thanks for the quick reply.
>>>>>
>>>>> So should we also use a precise barrier in
>>>>>   add_target_count(int count) { Atomic::add(&_pending_threads, count, memory_order_relaxed); }
>>>>> ?
>>>>> Relaxed would fit your explanation.
>>>> Exactly. I was about to send you the diff with those two changes before
>>>> answering David and Dan about the acquire fence.   : )
>>>>
>>>>
>>>> diff --git a/src/hotspot/share/runtime/handshake.cpp b/src/hotspot/share/runtime/handshake.cpp
>>>> --- a/src/hotspot/share/runtime/handshake.cpp
>>>> +++ b/src/hotspot/share/runtime/handshake.cpp
>>>> @@ -56,7 +56,7 @@
>>>>        assert(val >= 0, "_pending_threads=%d cannot be negative", val);
>>>>        return val == 0;
>>>>      }
>>>> -  void add_target_count(int count) { Atomic::add(&_pending_threads, count); }
>>>> +  void add_target_count(int count) { Atomic::add(&_pending_threads, count, memory_order_relaxed); }
>>>>      bool executed() const { return _executed; }
>>>>      const char* name() { return _handshake_cl->name(); }
>>>>
>>>> @@ -229,9 +229,8 @@
>>>>      // When this is executed by the Handshakee we need a release store
>>>>      // here to make sure memory operations executed in the handshake
>>>>      // closure are visible to the Handshaker after it reads that the
>>>> -  // operation has completed. Atomic::dec() already provides a full
>>>> -  // memory fence.
>>>> -  Atomic::dec(&_pending_threads);
>>>> +  // operation has completed.
>>>> +  Atomic::dec(&_pending_threads, memory_order_release);
>>>>
>>>>      // It is no longer safe to refer to 'this' as the VMThread/Handshaker may have destroyed this operation
>>>>    }
>>> The above seems okay. We don't generally try to optimise the barriers
>>> associated with these atomic rmw ops due to the complexity of the
>>> reasoning around the different types of barriers. But in this case the
>>> reasoning seems quite straightforward. The increment does not indicate
>>> anything to any other thread; it simply sets the right value in the
>>> counter (which may already have been decremented). The decrement
>>> indicates a completion token and so needs at least a release barrier.
>>> The final check, which watches for the count to reach zero, needs the
>>> matching acquire barrier.
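>>>
>>> As a standalone sketch of that protocol (plain C++11 atomics rather
>>> than the HotSpot Atomic class; names are illustrative):
>>>
>>>   #include <atomic>
>>>
>>>   std::atomic<int> pending_threads{0};
>>>
>>>   // Handshaker: fix up the counter after arming the operation.
>>>   // Relaxed suffices; the add publishes nothing to other threads.
>>>   void add_target_count(int n) {
>>>     pending_threads.fetch_add(n, std::memory_order_relaxed);
>>>   }
>>>
>>>   // Handshakee: hand back a completion token. The release makes the
>>>   // closure's writes visible to whoever observes the decrement.
>>>   void complete_one() {
>>>     pending_threads.fetch_sub(1, std::memory_order_release);
>>>   }
>>>
>>>   // Handshaker: the acquire pairs with the release decrement, so
>>>   // reads after this loop see the closures' effects.
>>>   void wait_until_completed() {
>>>     while (pending_threads.load(std::memory_order_acquire) != 0) { }
>>>   }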
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Thanks Martin!
>>>>
>>>> Patricio
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>>> Sent: Freitag, 3. April 2020 16:33
>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>>>> <david.holmes at oracle.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>> VMThread intervention
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 4/3/20 11:20 AM, Doerr, Martin wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> the acquire belongs to the load of _pending_threads in
>>>>>>> is_completed() (handshake.cpp:270).
>>>>>>> It's moved below the loop because we don't need to repeat the
>>>>>>> barrier.
>>>>>>> The release barrier is integrated in Atomic::dec(&_pending_threads)
>>>>>>> (handshake.cpp:234).
>>>>>>> Should we rather use Atomic::dec(&_pending_threads,
>>>>>>> memory_order_release) to make it precise and explicit?
>>>>>> As you stated in your other email, we can pass explicit memory
>>>>>> ordering to Atomic::dec(), which I had forgotten. I agree with
>>>>>> explicitly changing it to Atomic::dec(&_pending_threads,
>>>>>> memory_order_release).
>>>>>>> Release / acquire semantics are exactly what I was missing in the
>>>>>>> earlier version.
>>>>>>> I don't understand why "add_target_count" is called after the
>>>>>>> handshakes were activated (handshake.cpp:171).
>>>>>>> Couldn't it happen that the first handshake completes before that
>>>>>>> (it would set _pending_threads to 0)?
>>>>>> Yes, _pending_threads can even go negative in the
>>>>>> VM_HandshakeAllThreads case. But we only check _pending_threads in
>>>>>> the VMThread, so it doesn't matter what the actual value is when
>>>>>> the Handshakees decrement the counter.
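>>>>>>
>>>>>> For example, a simplified trace (assuming, for illustration, that
>>>>>> the counter starts at 0 and two targets are added after activation):
>>>>>>
>>>>>>   handshakee A: dec             -> _pending_threads == -1
>>>>>>   handshakee B: dec             -> _pending_threads == -2
>>>>>>   VMThread: add_target_count(2) -> _pending_threads ==  0
>>>>>>   VMThread: is_completed()      -> reads 0, operation done
>>>>>>
>>>>>> The transient negative values are harmless because only the VMThread
>>>>>> tests the counter, and only after the add.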
>>>>>>
>>>>>> Thanks,
>>>>>> Patricio
>>>>>>> Best regards,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>> Sent: Freitag, 3. April 2020 15:22
>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
>>>>>>>> Doerr, Martin <martin.doerr at sap.com>; Robbin Ehn
>>>>>>>> <robbin.ehn at oracle.com>; hotspot-runtime-dev at openjdk.java.net
>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>>> VMThread intervention
>>>>>>>>
>>>>>>>> Hi Patricio,
>>>>>>>>
>>>>>>>> On 3/04/2020 6:00 pm, Patricio Chilano wrote:
>>>>>>>>> Hi Martin,
>>>>>>>>>
>>>>>>>>> On 4/2/20 5:24 PM, Doerr, Martin wrote:
>>>>>>>>>> Hi Patricio,
>>>>>>>>>>
>>>>>>>>>> right, you got my point. Sorry for not explaining it in greater
>>>>>>>>>> detail
>>>>>>>>>> at the beginning. I was busy with other things.
>>>>>>>>>>
>>>>>>>>>> There should be a memory barrier ordering the read of
>>>>>>>>>> _pending_threads (which is observed as 0) and whatever gets done
>>>>>>>>>> when returning from execute_direct.
>>>>>>>>>> I don't like relying on barriers which are hidden somewhere and
>>>>>>>>>> may be subject to change.
>>>>>>>>> Yes, I agree with that. I added an explicit acquire fence at the
>>>>>>>>> end of Handshake::execute_direct() with a comment. Thanks for
>>>>>>>>> noticing it.
>>>>>>>> This is unclear to me:
>>>>>>>>
>>>>>>>> 281   // Prevent future loads from floating above the load of _pending_threads
>>>>>>>> 282   // in is_completed(). This prevents reading stale data modified in the
>>>>>>>> 283   // handshake closure by the Handshakee.
>>>>>>>> 284   OrderAccess::acquire();
>>>>>>>> 285
>>>>>>>> 286   return op.executed();
>>>>>>>> An acquire() by itself has no meaning; it has to be associated
>>>>>>>> with a release(), and both have to be associated with the load,
>>>>>>>> respectively store, of the same variable. From the comment it
>>>>>>>> sounds like you just want a loadload() barrier here.
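>>>>>>>>
>>>>>>>> To make the distinction concrete, a sketch (not the actual patch):
>>>>>>>>
>>>>>>>>   while (!op.is_completed()) { ... }  // ends with a load of _pending_threads
>>>>>>>>   OrderAccess::loadload();   // orders later loads after that load
>>>>>>>>   // vs.
>>>>>>>>   OrderAccess::acquire();    // additionally keeps later stores from
>>>>>>>>                              // floating above the load
>>>>>>>>   return op.executed();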
>>>>>>>>
>>>>>>>> Other updates seem fine.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>>> I'd also prefer explicit barriers on the writer's side over the
>>>>>>>>>> implicit ones from Atomic::add, because explicit barriers with
>>>>>>>>>> comments help a lot in understanding the code.
>>>>>>>>>> (Not sure if you want to change that. Just my opinion.)
>>>>>>>>> I added a comment on top of Atomic::dec() on why we need a memory
>>>>>>>>> barrier there and the fact that we are already using a full memory
>>>>>>>>> fence. If you still think I should add an explicit barrier for
>>>>>>>>> code readability, I can add it though.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is v3 which also addresses Dan's comments.
>>>>>>>>>
>>>>>>>>> Full:
>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/webrev/
>>>>>>>>> Inc:
>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Patricio
>>>>>>>>>> Best regards,
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>>>>>>>> Sent: Donnerstag, 2. April 2020 19:56
>>>>>>>>>>> To: Robbin Ehn <robbin.ehn at oracle.com>; Doerr, Martin
>>>>>>>>>>> <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>>>>>> VMThread intervention
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 4/2/20 1:16 PM, Patricio Chilano wrote:
>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for looking at this.
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/2/20 10:15 AM, Robbin Ehn wrote:
>>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2020-04-02 14:55, Doerr, Martin wrote:
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> first of all, thanks, Patricio, for doing this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One question:
>>>>>>>>>>>>>> _pending_threads is used for synchronizing threads.
>>>>>>>>>>>>>> Don't we need release / acquire semantics?
>>>>>>>>>>>>>> I guess otherwise the handshaker can read stale data from the
>>>>>>>>>>>>>> handshakee after observing is_completed.
>>>>>>>>>>>>>> Or did I miss anything?
>>>>>>>>>>>>> Reading stale data should be fine in this case, since the
>>>>>>>>>>>>> counter goes from n->0; thus you can only get false negatives
>>>>>>>>>>>>> from "bool is_completed()"?
>>>>>>>>>>>> Right, as Robbin points out, at most we will just have to loop
>>>>>>>>>>>> again. Also, when the handshakee processes the handshake, right
>>>>>>>>>>>> after updating _pending_threads it will either signal the
>>>>>>>>>>>> _processing_sem if there are no direct operations, or it will
>>>>>>>>>>>> signal the _handshake_turn_sem, so that will provide the release
>>>>>>>>>>>> already. The next read in the loop should read the updated value.
>>>>>>>>>>>> Are you okay with that?
>>>>>>>>>>> I realize you might be thinking of a different case now. You
>>>>>>>>>>> mean the handshaker could execute loads from code after
>>>>>>>>>>> Handshake::execute_direct() while it hasn't yet read
>>>>>>>>>>> _pending_threads from is_completed()? Although there is a
>>>>>>>>>>> load_acquire() in ~ThreadsListHandle() that will prevent those
>>>>>>>>>>> issues, you might argue that we can't rely on code which might
>>>>>>>>>>> change.
>>>>>>>>>>> I think (not counting that load_acquire from ~ThreadsListHandle())
>>>>>>>>>>> the issue could appear, for example, if you created a
>>>>>>>>>>> ThreadsListHandle first, then called Handshake::execute_direct(),
>>>>>>>>>>> and when returning you tried to read some field from the
>>>>>>>>>>> handshakee (you might want to assert something, for example). If
>>>>>>>>>>> there wasn't any ThreadsListHandle created before calling
>>>>>>>>>>> execute_direct(), I think you are not allowed to access the other
>>>>>>>>>>> JavaThread's data anyway, since the thread might have exited
>>>>>>>>>>> already, unless you create a ThreadsListHandle, which will
>>>>>>>>>>> already provide the acquire. So I think you are right and we
>>>>>>>>>>> might need a fence before returning from execute_direct(), or add
>>>>>>>>>>> a comment that we are relying on the load_acquire from
>>>>>>>>>>> ~ThreadsListHandle(). Is that what you were thinking?
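>>>>>>>>>>>
>>>>>>>>>>> A hypothetical caller showing that pattern (names illustrative,
>>>>>>>>>>> not from the patch):
>>>>>>>>>>>
>>>>>>>>>>>   ThreadsListHandle tlh;          // keeps 'target' from exiting
>>>>>>>>>>>   JavaThread* target = /* resolved via tlh */;
>>>>>>>>>>>   Handshake::execute_direct(&cl, target);
>>>>>>>>>>>   // Without a fence at the end of execute_direct(), this read
>>>>>>>>>>>   // could see stale data, unless we rely on the load_acquire
>>>>>>>>>>>   // that happens later in ~ThreadsListHandle().
>>>>>>>>>>>   assert(field_updated_by_closure(target), "must see closure's writes");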
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Patricio
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>> Patricio
>>>>>>>>>>>>> Thanks, Robbin
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-bounces at openjdk.java.net>
>>>>>>>>>>>>>>> On Behalf Of Robbin Ehn
>>>>>>>>>>>>>>> Sent: Donnerstag, 2. April 2020 13:36
>>>>>>>>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
>>>>>>>>>>>>>>> without VMThread intervention
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Patricio,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for fixing, looks good.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /Robbin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2020-04-01 19:45, Patricio Chilano wrote:
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please review this redo for 8230594. Since the reason behind
>>>>>>>>>>>>>>>> the Windows timeouts turned out to be 8240902, this patch is
>>>>>>>>>>>>>>>> almost identical to the reverse of the backout, modulo the
>>>>>>>>>>>>>>>> rebase.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is only one small fix for an issue not identified in
>>>>>>>>>>>>>>>> the original patch, which I explained in the comments. To
>>>>>>>>>>>>>>>> avoid the possible issue of deleting a JavaThread that called
>>>>>>>>>>>>>>>> Thread::destroy_vm() while there are handshakers blocked in
>>>>>>>>>>>>>>>> the handshake_turn_sem semaphore, I added a check to verify
>>>>>>>>>>>>>>>> that the JavaThread in question is not protected by some
>>>>>>>>>>>>>>>> ThreadsList reference before attempting to delete it. In case
>>>>>>>>>>>>>>>> it is protected, we need to at least clear the value for
>>>>>>>>>>>>>>>> _thread_key to avoid possible infinite loops at thread exit
>>>>>>>>>>>>>>>> (this can happen on Solaris).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The only other change is that I removed the check for local
>>>>>>>>>>>>>>>> polls in Handshake::execute_direct(), since after the backout
>>>>>>>>>>>>>>>> thread-local handshakes were implemented for arm32, which was
>>>>>>>>>>>>>>>> the only remaining platform.
>>>>>>>>>>>>>>>> I tested it several times in mach5 tiers 1-6 and once in t7
>>>>>>>>>>>>>>>> and saw no failures.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bug:
>>>>>>>>>>>>>>>>          https://bugs.openjdk.java.net/browse/JDK-8240918
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>>          http://cr.openjdk.java.net/~pchilanomate/8240918/v1/webrev/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Patricio
> 

