RFR(m): 8214271: Fast primitive to wake many threads
David Holmes
david.holmes at oracle.com
Wed Jan 9 03:40:42 UTC 2019
Hi Robbin,
No further significant comments, lets just see how this plays out.
Some minor nits:
src/hotspot/share/utilities/waitBarrier.hpp
! // A primary goal of the WaitBarrier implementation is to disarm all
waiting
s/disarm/wake/
That was one place a global replace shouldn't have been applied. :)
! // - Calling disarm() guarantees any thread calling or called
wait(tag) will
"or called" is not grammatically correct. Perhaps:
// - Calling disarm() guarantees any thread now calling or that has
called wait(tag) will
// Guarantees any thread called wait() will be awake when it returns.
s/called/that called/
---
src/hotspot/share/utilities/waitBarrier_generic.cpp
! // disarm store must not float below.
s/float/sink/
74 // API specifies wake() must provides a trailing fence.
s/wake/disarm/
s/provides/provide/
81 // API specifies wait() must provides a trailing fence.
s/provides/provide/
Thanks,
David
On 8/01/2019 8:42 pm, Robbin Ehn wrote:
> Hi David,
>
> On 1/2/19 12:35 AM, David Holmes wrote:
>>>> Further this sounds like a race that could lead to bugs if not used
>>>> very carefully ie. you can't assume between disarm() and wake() that
>>>> all threads are blocked.
>>>
>>> I didn't realize how subtle this is. I think your original comment that
>>> disarm/wake should be one operation was spot on.
>>> Investigating... thinking... testing... yes I think this will work,
>>> fixed!
>>> Sorry for not looking more into this before.
>>
>> I'm now curious how this will actually work in the context of the
>> safepoint changes?
>
> Since code already handle this 'invariant' with threads not being block
> between disarm() and wake(), doing it one operation just very slightly
> increases the chance that a thread will be blocked when we actually can
> handle it to be running, but reduces the chance to hit a false positive
> TLH poll.
> (with TLH we have a two-step un-synchronizing out of safepoints where we
> must change global safepoint state before changing the thread polling
> state)
>
> (I have some thoughts on simplifying TLH/safepoint states)
>
>> Nit: I would have kept disarm() rather than wake() as I like the
>> arm/disarm duality.
>
> Yes, me too. Not sure why I did the opposite, fixed!
>
>>
>> void GenericWaitBarrier::wait(int barrier_tag) {
>> assert(barrier_tag != 0, "Trying to wait on disarmed value");
>> + if (barrier_tag == 0 && barrier_tag != _barrier_tag) {
>> + OrderAccess::fence();
>> + return;
>> + }
>>
>> I don't understand what the above is doing. A barrier_tag of 0 is a
>> programming error caught during testing in debug builds. You don't
>> need to account for it being 0 in product because this isn't something
>> that can come in from an external source - we have full code control
>> here. And even if you want to be this paranoid why would you need the
>> fence?
>
> Fixed, but kept the fence, since we say we are providing a trailing fence.
> Otherwise I would like to add that exception to the description of wait().
>
> Including Dan's comments:
> Full: http://cr.openjdk.java.net/~rehn/8214271/6/full/webrev/
> Inc : http://cr.openjdk.java.net/~rehn/8214271/6/inc/webrev/
>
>
> Thanks, Robbin
>
>>
>> Thanks,
>> David
>> -----
>>
>>> Full:
>>> http://cr.openjdk.java.net/~rehn/8214271/5/full/webrev/
>>>
>>> gtest passes thousands of loops locally and hundreds in mach5.
>>>
>>> Thanks, Robbin
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>>>
>>>>>> s/Implementation/Implementations/
>>>>>
>>>>> Fixed
>>>>>
>>>>>>
>>>>>> The fourth line is no longer needed.
>>>>>
>>>>> Above is the reason I would like to keep the fourth line, since
>>>>> only if you call
>>>>> both disarm() and wake() you have that guarantee that waiter
>>>>> threads will
>>>>> return.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>>> Inc:
>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/4/inc/webrev/
>>>>>>>
>>>>>>> Full:
>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/4/full/webrev/
>>>>>>>
>>>>>>> /Robbin
>>>>>>>
>>>>>>>>
>>>>>>>> Otherwise this all looks good!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>
>>>>>>>>> Full:
>>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/3/full/webrev/
>>>>>>>>>
>>>>>>>>> Thanks, Robbin
>>>>>>>>>
>>>>>>>>> On 11/23/18 5:55 PM, Robbin Ehn wrote:
>>>>>>>>>> Forgot RFR in subject.
>>>>>>>>>>
>>>>>>>>>> /Robbin
>>>>>>>>>>
>>>>>>>>>> On 2018-11-23 17:51, Robbin Ehn wrote:
>>>>>>>>>>> Hi all, please review.
>>>>>>>>>>>
>>>>>>>>>>> When a safepoint is ended we need a way to get back to 100%
>>>>>>>>>>> utilization as fast
>>>>>>>>>>> as possible. 100% utilization means no idle cpu in the system
>>>>>>>>>>> if there is a
>>>>>>>>>>> JavaThread that could be executed. The traditional ways to
>>>>>>>>>>> wake many, e.g.
>>>>>>>>>>> semaphore, pthread_cond, is not implemented with a single
>>>>>>>>>>> syscall instead they
>>>>>>>>>>> typical do one syscall per thread to wake.
>>>>>>>>>>>
>>>>>>>>>>> This change-set contains that primitive, the WaitBarrier, and
>>>>>>>>>>> a gtest for it.
>>>>>>>>>>> No actual users, which is in coming patches.
>>>>>>>>>>>
>>>>>>>>>>> The WaitBarrier solves by doing a cooperative semaphore
>>>>>>>>>>> posting, threads woken
>>>>>>>>>>> will also post. On Linux we can instead directly use a futex
>>>>>>>>>>> and with one
>>>>>>>>>>> syscall wake all. Depending on how many threads and cpus the
>>>>>>>>>>> performance vary,
>>>>>>>>>>> but a good utilization of the machine, just on the edge of
>>>>>>>>>>> saturated, the time to reach 100% utilization is around 3
>>>>>>>>>>> times faster with the WaitBarrier (where futex is faster than
>>>>>>>>>>> semaphore).
>>>>>>>>>>>
>>>>>>>>>>> Webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>>>>>>>>>>
>>>>>>>>>>> CR:
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>>>>>>>>>>
>>>>>>>>>>> Passes 100 iterations of gtest on our platforms, both
>>>>>>>>>>> fastdebug and release.
>>>>>>>>>>> And have been stable when used in safepoints (t1-8) (coming
>>>>>>>>>>> patches).
>>>>>>>>>>>
>>>>>>>>>>> Thanks, Robbin
More information about the hotspot-dev
mailing list