RFR 8240918: [REDO] Allow direct handshakes without VMThread intervention

Patricio Chilano patricio.chilano.mateo at oracle.com
Tue Apr 7 05:44:32 UTC 2020


Hi David,

On 4/6/20 11:37 PM, David Holmes wrote:
> Update looks good.
Great, thanks for reviewing it!

> I'm not 100% sure that the low-level semaphore usage doesn't provide 
> the necessary barriers, but the very fact it is hard to determine that 
> makes it a good idea to have the explicit barriers.
So, for the direct-handshake case, if we don't add the barrier it could 
happen, for example, that the load of _executed in "return op.executed()" 
floats above the load of _pending_threads in op.is_completed(). So we 
first read _executed as false, then the handshakee executes the release 
store in do_handshake() after executing the closure and setting 
_executed to true, then we read _pending_threads as zero, and so we 
break out of the while loop but return that the handshake was not 
executed. The same reordering could also affect loads issued by the 
handshaker after the execute_direct() call. For example, after returning 
from execute_direct(), the handshaker could try to load some field from 
the handshakee that was supposed to be modified during the execution of 
the handshake closure.
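
To make the interleaving concrete, here is a minimal sketch of the 
handshaker's side with the explicit barrier (simplified; the semaphore 
handling and the processing loop of execute_direct() are elided):

  // End of Handshake::execute_direct(), handshaker side (sketch):
  while (!op.is_completed()) {
    // try to process the operation ourselves or wait for the handshakee
  }
  // Pairs with the release in Atomic::dec(&_pending_threads,
  // memory_order_release) performed by the handshakee in do_handshake().
  // Without it, the load of _executed below could be reordered before the
  // load of _pending_threads in is_completed() and return a stale false
  // even though the closure was executed.
  OrderAccess::acquire();
  return op.executed();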

For the non-direct cases, I think we are probably okay without that 
barrier for all the current usages of handshakes, since the VMThread, 
upon returning from the handshake, will just wake up the handshaker and 
proceed to the next operation. Also, the _executed field is not even 
read by the VMThread, so we don't have that problem either. But we might 
have the same issue for that nested handshake case I described. In the 
current baseline, we have the static _done semaphore instead of the 
_pending_threads counter, which already provides the acquire semantics 
when breaking out of the while loop. But again, I don't think we have 
such problematic nested cases anyway.
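
For reference, the matching release half is the decrement the handshakee 
performs after running the closure; roughly (a sketch based on the v4 
diff quoted below, surrounding code elided):

  // End of HandshakeOperation::do_handshake(), handshakee side (sketch):
  _handshake_cl->do_thread(thread);  // run the closure; sets _executed = true
  // Release store: everything the closure wrote, including _executed,
  // becomes visible to a thread that later observes _pending_threads == 0
  // with a matching acquire. After this decrement the VMThread/Handshaker
  // may destroy this operation, so 'this' must not be touched afterwards.
  Atomic::dec(&_pending_threads, memory_order_release);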

Also, I just want to add that ~ThreadsListHandle() already has an 
acquire load, which would save us from all the above cases even without 
the explicit barrier (except maybe the issue with the return value for 
the direct-handshake case). But as we discussed, I think it is probably 
better to have an explicit barrier, in case that code changes and also 
just for readability purposes.
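
To see why ~ThreadsListHandle() doesn't cover the return value, note 
where the destructor runs relative to the return expression (a sketch 
only; the exact scoping, names and signature are as in the webrev):

  bool Handshake::execute_direct(HandshakeClosure* thread_cl, JavaThread* target) {
    ThreadsListHandle tlh;      // protects 'target' from exiting
    HandshakeOperation op(thread_cl, /* is_direct */ true);
    // ... queue the operation on 'target' and run the wait loop ...
    OrderAccess::acquire();     // the explicit barrier discussed above
    return op.executed();       // evaluated *before* ~ThreadsListHandle()
                                // runs, so the acquire load in the
                                // destructor comes too late to order
                                // this load
  }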


Patricio
> Thanks,
> David
>
> On 7/04/2020 12:57 am, Patricio Chilano wrote:
>> Hi Martin,
>>
>> On 4/6/20 7:15 AM, Doerr, Martin wrote:
>>> Hi Patricio,
>>>
>>> I'm very sorry for the late review. I couldn't make it last week.
>>>
>>> Thanks a lot for taking care of the ordering semantics. I'm a big 
>>> fan of having it explicit, precise, clear and well documented.
>> Yes, I also think it's clearer now with the explicit orderings. : )
>>
>>> VM_HandshakeOneThread and VM_HandshakeAllThreads work without the 
>>> barrier because HandshakeState::try_process with 
>>> _processing_sem.signal() is always called?
>>> I think it would be better to have the explicit acquire barrier 
>>> there, too. Do you agree?
>> I've been thinking about the VM_HandshakeOneThread and 
>> VM_HandshakeAllThreads cases since the other day, and although I don't 
>> think there are issues in the use cases we have today, it's probably a 
>> good idea to add the fence there too, in case that changes. For the 
>> VMThread it is safe without the barrier because, upon returning from 
>> the do_it() method, it will just notify the Handshaker that the 
>> operation has been processed and proceed to the main loop of 
>> VMThread::loop() to process the next operation. There could be an 
>> issue, though, if we have one of these non-direct handshakes inside an 
>> operation that allows nested operations. When the VMThread returns 
>> from the handshake, it will go back to the first operation, where it 
>> could try to read some data changed during the handshake. If the 
>> handshake was actually processed by the handshakee, then we could have 
>> this issue of reading stale data.
>>
>>> check_state() is pointless in your new version.
>> Removed.
>>
>>> I've also looked at your test. Shouldn't we set 
>>> -XX:+UseBiasedLocking explicitly so the test still works as it 
>>> should after BiasedLocking deprecation?
>> Yes, thanks for noticing it. Since I'm planning to push this first I 
>> will change it now so I don't have to fix it later.
>>
>>
>> Here is v4, which also includes the memory ordering changes I sent 
>> before:
>>
>> Full:
>> http://cr.openjdk.java.net/~pchilanomate/8240918/v4/webrev/
>> Inc:
>> http://cr.openjdk.java.net/~pchilanomate/8240918/v4/inc/webrev/
>>
>>
>> Thanks!
>>
>> Patricio
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Montag, 6. April 2020 00:40
>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr, 
>>>> Martin
>>>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>; hotspot-
>>>> runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>> VMThread intervention
>>>>
>>>> On 4/04/2020 12:57 am, Patricio Chilano wrote:
>>>>> On 4/3/20 11:40 AM, Doerr, Martin wrote:
>>>>>> Hi Patricio,
>>>>>>
>>>>>> thanks for the quick reply.
>>>>>>
>>>>>> So should we also use precise barriers in
>>>>>> add_target_count(int count) { Atomic::add(&_pending_threads, count,
>>>>>> memory_order_relaxed); }
>>>>>> ?
>>>>>> Relaxed would fit your explanation.
>>>>> Exactly. I was about to send you the diff with those two changes 
>>>>> before answering David and Dan about the acquire fence.   : )
>>>>>
>>>>>
>>>>> diff --git a/src/hotspot/share/runtime/handshake.cpp b/src/hotspot/share/runtime/handshake.cpp
>>>>> --- a/src/hotspot/share/runtime/handshake.cpp
>>>>> +++ b/src/hotspot/share/runtime/handshake.cpp
>>>>> @@ -56,7 +56,7 @@
>>>>>        assert(val >= 0, "_pending_threads=%d cannot be negative", val);
>>>>>        return val == 0;
>>>>>      }
>>>>> -  void add_target_count(int count) { Atomic::add(&_pending_threads, count); }
>>>>> +  void add_target_count(int count) { Atomic::add(&_pending_threads, count, memory_order_relaxed); }
>>>>>      bool executed() const { return _executed; }
>>>>>      const char* name() { return _handshake_cl->name(); }
>>>>>
>>>>> @@ -229,9 +229,8 @@
>>>>>      // When this is executed by the Handshakee we need a release store
>>>>>      // here to make sure memory operations executed in the handshake
>>>>>      // closure are visible to the Handshaker after it reads that the
>>>>> -  // operation has completed. Atomic::dec() already provides a full
>>>>> -  // memory fence.
>>>>> -  Atomic::dec(&_pending_threads);
>>>>> +  // operation has completed.
>>>>> +  Atomic::dec(&_pending_threads, memory_order_release);
>>>>>
>>>>>      // It is no longer safe to refer to 'this' as the VMThread/Handshaker may have destroyed this operation
>>>>>    }
>>>> The above seems okay. We don't generally try to optimise the barriers
>>>> associated with these atomic rmw ops due to the complexity of the
>>>> reasoning around the different types of barriers. But in this case the
>>>> reasoning seems quite straightforward. The increment does not indicate
>>>> anything to any other thread; it simply sets the right value in the
>>>> counter (which may already have been decremented). The decrement
>>>> indicates a completion token and so needs at least a release barrier.
>>>> The final check that watches for the count to be zero needs the
>>>> matching acquire barrier.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Thanks Martin!
>>>>>
>>>>> Patricio
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>>>> Sent: Freitag, 3. April 2020 16:33
>>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>>>>> <david.holmes at oracle.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>> VMThread intervention
>>>>>>>
>>>>>>> Hi Martin,
>>>>>>>
>>>>>>> On 4/3/20 11:20 AM, Doerr, Martin wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> the acquire belongs to the load of _pending_threads in is_completed()
>>>>>>>> (handshake.cpp:270).
>>>>>>>> It's moved below the loop because we don't need to repeat the
>>>>>>>> barrier.
>>>>>>>> The release barrier is integrated in Atomic::dec(&_pending_threads)
>>>>>>>> (handshake.cpp:234).
>>>>>>>> Would it be better to use Atomic::dec(&_pending_threads,
>>>>>>>> memory_order_release) to make it precise and explicit?
>>>>>>> As you stated in your other email too, we can specify explicit
>>>>>>> memory ordering to Atomic::dec(), which I forgot. I agree with
>>>>>>> explicitly changing it to
>>>>>>> Atomic::dec(&_pending_threads, memory_order_release).
>>>>>>>> Release / acquire semantics are exactly what I was missing in the
>>>>>>>> earlier version.
>>>>>>>> I don't understand why "add_target_count" is called after the
>>>>>>>> handshakes were activated (handshake.cpp:171).
>>>>>>>> Couldn't it happen that the first handshake completes before that
>>>>>>>> (it would set _pending_threads to 0)?
>>>>>>> Yes, _pending_threads can even go negative for the
>>>>>>> VM_HandshakeAllThreads case. But we only check _pending_threads in
>>>>>>> the VMThread, so for the Handshakees it doesn't matter what the
>>>>>>> actual value is when they decrement the counter.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Patricio
>>>>>>>> Best regards,
>>>>>>>> Martin
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>>> Sent: Freitag, 3. April 2020 15:22
>>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr,
>>>>>>>>> Martin
>>>>>>>>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>;
>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>>>> VMThread intervention
>>>>>>>>>
>>>>>>>>> Hi Patricio,
>>>>>>>>>
>>>>>>>>> On 3/04/2020 6:00 pm, Patricio Chilano wrote:
>>>>>>>>>> Hi Martin,
>>>>>>>>>>
>>>>>>>>>> On 4/2/20 5:24 PM, Doerr, Martin wrote:
>>>>>>>>>>> Hi Patricio,
>>>>>>>>>>>
>>>>>>>>>>> right, you got my point. Sorry for not explaining it in greater
>>>>>>>>>>> detail at the beginning. I was busy with other things.
>>>>>>>>>>>
>>>>>>>>>>> There should be a memory barrier ordering the read of
>>>>>>>>>>> _pending_threads (which is observed as 0) and whatever gets done
>>>>>>>>>>> when returning from execute_direct.
>>>>>>>>>>> I don't like relying on barriers which are hidden somewhere and
>>>>>>>>>>> may be subject to change.
>>>>>>>>>> Yes, I agree with that. I added an explicit acquire fence at the
>>>>>>>>>> end of Handshake::execute_direct() with a comment. Thanks for
>>>>>>>>>> noticing it.
>>>>>>>>> This is unclear to me:
>>>>>>>>>
>>>>>>>>> 281   // Prevent future loads from floating above the load of _pending_threads
>>>>>>>>> 282   // in is_completed(). This prevents reading stale data modified in the
>>>>>>>>> 283   // handshake closure by the Handshakee.
>>>>>>>>> 284   OrderAccess::acquire();
>>>>>>>>> 285
>>>>>>>>> 286   return op.executed();
>>>>>>>>>
>>>>>>>>> An acquire() by itself has no meaning; it has to be associated with
>>>>>>>>> a release(), and both have to be associated with the load and store,
>>>>>>>>> respectively, of the same variable. From the comment it sounds like
>>>>>>>>> you just want a loadload() barrier here.
>>>>>>>>>
>>>>>>>>> Other updates seem fine.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>>> I'd also prefer explicit barriers on the writer's side over the
>>>>>>>>>>> implicit ones from Atomic::add, because explicit barriers with
>>>>>>>>>>> comments help a lot in understanding the code.
>>>>>>>>>>> (Not sure if you want to change that. Just my opinion.)
>>>>>>>>>> I added a comment on top of Atomic::dec() on why we need a memory
>>>>>>>>>> barrier there and the fact that we are already using a full memory
>>>>>>>>>> fence. If you still think I should add an explicit barrier for code
>>>>>>>>>> readability, I can add it though.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is v3 which also addresses Dan's comments.
>>>>>>>>>>
>>>>>>>>>> Full:
>>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/webrev/
>>>>>>>>>> Inc:
>>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Patricio
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>>>>>>>>> Sent: Donnerstag, 2. April 2020 19:56
>>>>>>>>>>>> To: Robbin Ehn <robbin.ehn at oracle.com>; Doerr, Martin
>>>>>>>>>>>> <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
>>>>>>>>>>>> VMThread intervention
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/2/20 1:16 PM, Patricio Chilano wrote:
>>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for looking at this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 4/2/20 10:15 AM, Robbin Ehn wrote:
>>>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2020-04-02 14:55, Doerr, Martin wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> first of all, thanks, Patricio, for doing this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One question:
>>>>>>>>>>>>>>> _pending_threads is used for synchronizing threads.
>>>>>>>>>>>>>>> Don't we need release / acquire semantics?
>>>>>>>>>>>>>> I guess otherwise the handshaker can read stale data from the
>>>>>>>>>>>>>> handshakee after observing is_completed.
>>>>>>>>>>>>>>> Or did I miss anything?
>>>>>>>>>>>>>> Reading stale data should be fine in this case, since the
>>>>>>>>>>>>>> counter goes from n->0; thus you can only get false negatives
>>>>>>>>>>>>>> from "bool is_completed()"?
>>>>>>>>>>>>> Right, as Robbin points out, at most we will just have to loop
>>>>>>>>>>>>> again. Also, when the handshakee processes the handshake, right
>>>>>>>>>>>>> after updating _pending_threads, it will either signal the
>>>>>>>>>>>>> _processing_sem if there are no direct operations, or it will
>>>>>>>>>>>>> signal the _handshake_turn_sem, so that will provide the
>>>>>>>>>>>>> release already. The next read in the loop should read the
>>>>>>>>>>>>> updated value. Are you okay with that?
>>>>>>>>>>>> I realize you might be thinking of a different case now. You
>>>>>>>>>>>> mean the handshaker could execute loads from code after
>>>>>>>>>>>> Handshake::execute_direct() while it hasn't yet read
>>>>>>>>>>>> _pending_threads from is_completed()? Although there is a
>>>>>>>>>>>> load_acquire() in ~ThreadsListHandle() that will prevent those
>>>>>>>>>>>> issues, you might argue that we can't rely on code which might
>>>>>>>>>>>> change.
>>>>>>>>>>>> I think (not counting that load_acquire from ~ThreadsListHandle)
>>>>>>>>>>>> the issue could appear, for example, if you created a
>>>>>>>>>>>> ThreadsListHandle first, then called Handshake::execute_direct(),
>>>>>>>>>>>> and when returning you try to read some field from the
>>>>>>>>>>>> handshakee (you might want to assert something, for example). If
>>>>>>>>>>>> there wasn't any ThreadsListHandle created before calling
>>>>>>>>>>>> execute_direct(), I think you are not allowed to access another
>>>>>>>>>>>> JavaThread's data anyway, since the thread might have exited
>>>>>>>>>>>> already, unless you create a ThreadsListHandle, which will
>>>>>>>>>>>> already provide the acquire. So I think you are right and we
>>>>>>>>>>>> might need a fence before returning from execute_direct(), or
>>>>>>>>>>>> add a comment that we are relying on the load_acquire from
>>>>>>>>>>>> ~ThreadsListHandle(). Is that what you were thinking?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Patricio
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Patricio
>>>>>>>>>>>>>> Thanks, Robbin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>>>>>>>>>>>>>>>> bounces at openjdk.java.net> On Behalf Of Robbin Ehn
>>>>>>>>>>>>>>>> Sent: Donnerstag, 2. April 2020 13:36
>>>>>>>>>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
>>>>>>>>>>>>>>>> without VMThread intervention
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Patricio,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for fixing, looks good.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /Robbin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2020-04-01 19:45, Patricio Chilano wrote:
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please review this redo for 8230594. Since the reason behind
>>>>>>>>>>>>>>>>> the Windows timeouts turned out to be 8240902, this patch is
>>>>>>>>>>>>>>>>> almost identical to the reverse of the backout modulo the
>>>>>>>>>>>>>>>>> rebase.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There is only one small fix for an issue not identified in
>>>>>>>>>>>>>>>>> the original patch, which I explained in the comments. To
>>>>>>>>>>>>>>>>> avoid the possible issue of deleting a JavaThread that
>>>>>>>>>>>>>>>>> called Thread::destroy_vm() while there are handshakers
>>>>>>>>>>>>>>>>> blocked in the handshake_turn_sem semaphore, I added a check
>>>>>>>>>>>>>>>>> to verify that the JavaThread in question is not protected
>>>>>>>>>>>>>>>>> by some ThreadsList reference before attempting to delete
>>>>>>>>>>>>>>>>> it. In case it is protected, we need to at least clear the
>>>>>>>>>>>>>>>>> value for _thread_key to avoid possible infinite loops at
>>>>>>>>>>>>>>>>> thread exit (this can happen on Solaris).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The only other change is that I removed the check for local
>>>>>>>>>>>>>>>>> polls in Handshake::execute_direct(), since after the
>>>>>>>>>>>>>>>>> backout thread-local handshakes were implemented for arm32,
>>>>>>>>>>>>>>>>> which was the only remaining platform.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I tested it several times in mach5 tiers 1-6 and once in t7
>>>>>>>>>>>>>>>>> and saw no failures.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Bug:
>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8240918
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v1/webrev/
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Patricio
>>


