RFR(m): 8214271: Fast primitive to wake many threads

Robbin Ehn robbin.ehn at oracle.com
Wed Jan 9 12:57:13 UTC 2019


Hi David,

On 1/9/19 4:40 AM, David Holmes wrote:
> Hi Robbin,
> 
> No further significant comments, lets just see how this plays out.

Yes, thanks!

Fixed nits.

If Dan have something more non-trivial I'll publish a v7.

/Robbin

> 
> Some minor nits:
> 
> src/hotspot/share/utilities/waitBarrier.hpp
> 
> ! // A primary goal of the WaitBarrier implementation is to disarm all waiting
> 
> s/disarm/wake/
> 
> That was one place a global replace shouldn't have been applied. :)

Fixed!

> 
> ! //    - Calling disarm() guarantees any thread calling or called wait(tag) will
> 
> "or called" is not grammatically correct. Perhaps:
> 
> //    - Calling disarm() guarantees any thread now calling or that has called 
> wait(tag) will
> 

Fixed!

> 
>      // Guarantees any thread called wait() will be awake when it returns.
> 
> s/called/that called/
> 

Fixed!

> ---
> 
> src/hotspot/share/utilities/waitBarrier_generic.cpp
> 
> !   // disarm store must not float below.
> 
> s/float/sink/
> 

Fixed!

> 74   // API specifies wake() must provides a trailing fence.
> 
> s/wake/disarm/
> s/provides/provide/

Fixed

> 
>   81     // API specifies wait() must provides a trailing fence.
> 
> s/provides/provide/

Fixed

> 
> Thanks,
> David
> 
> On 8/01/2019 8:42 pm, Robbin Ehn wrote:
>> Hi David,
>>
>> On 1/2/19 12:35 AM, David Holmes wrote:
>>>>> Further this sounds like a race that could lead to bugs if not used very 
>>>>> carefully ie. you can't assume between disarm() and wake() that all threads 
>>>>> are blocked.
>>>>
>>>> I didn't realize how subtle this is. I think your original comment that
>>>> disarm/wake should be one operation was spot on.
>>>> Investigating... thinking... testing... yes I think this will work, fixed!
>>>> Sorry for not looking more into this before.
>>>
>>> I'm now curious how this will actually work in the context of the safepoint 
>>> changes?
>>
>> Since code already handle this 'invariant' with threads not being block 
>> between disarm() and wake(), doing it one operation just very slightly 
>> increases the chance that a thread will be blocked when we actually can handle 
>> it to be running, but reduces the chance to hit a false positive TLH poll.
>> (with TLH we have a two-step un-synchronizing out of safepoints where we must 
>> change global safepoint state before changing the thread polling state)
>>
>> (I have some thoughts on simplifying TLH/safepoint states)
>>
>>> Nit: I would have kept disarm() rather than wake() as I like the arm/disarm 
>>> duality.
>>
>> Yes, me too. Not sure why I did the opposite, fixed!
>>
>>>
>>>    void GenericWaitBarrier::wait(int barrier_tag) {
>>>      assert(barrier_tag != 0, "Trying to wait on disarmed value");
>>> +   if (barrier_tag == 0 && barrier_tag != _barrier_tag) {
>>> +     OrderAccess::fence();
>>> +     return;
>>> +   }
>>>
>>> I don't understand what the above is doing. A barrier_tag of 0 is a 
>>> programming error caught during testing in debug builds. You don't need to 
>>> account for it being 0 in product because this isn't something that can come 
>>> in from an external source - we have full code control here. And even if you 
>>> want to be this paranoid why would you need the fence?
>>
>> Fixed, but kept the fence, since we say we are providing a trailing fence.
>> Otherwise I would like to add that exception to the description of wait().
>>
>> Including Dan's comments:
>> Full: http://cr.openjdk.java.net/~rehn/8214271/6/full/webrev/
>> Inc : http://cr.openjdk.java.net/~rehn/8214271/6/inc/webrev/
>>
>>
>> Thanks, Robbin
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Full:
>>>> http://cr.openjdk.java.net/~rehn/8214271/5/full/webrev/
>>>>
>>>> gtest passes thousands of loops locally and hundreds in mach5.
>>>>
>>>> Thanks, Robbin
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>>>
>>>>>>> s/Implementation/Implementations/
>>>>>>
>>>>>> Fixed
>>>>>>
>>>>>>>
>>>>>>> The fourth line is no longer needed.
>>>>>>
>>>>>> Above is the reason I would like to keep the fourth line, since only if 
>>>>>> you call
>>>>>> both disarm() and wake() you have that guarantee that waiter threads will
>>>>>> return.
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>>> Inc:
>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/4/inc/webrev/
>>>>>>>>
>>>>>>>> Full:
>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/4/full/webrev/
>>>>>>>>
>>>>>>>> /Robbin
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Otherwise this all looks good!
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Full:
>>>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/3/full/webrev/
>>>>>>>>>>
>>>>>>>>>> Thanks, Robbin
>>>>>>>>>>
>>>>>>>>>> On 11/23/18 5:55 PM, Robbin Ehn wrote:
>>>>>>>>>>> Forgot RFR in subject.
>>>>>>>>>>>
>>>>>>>>>>> /Robbin
>>>>>>>>>>>
>>>>>>>>>>> On 2018-11-23 17:51, Robbin Ehn wrote:
>>>>>>>>>>>> Hi all, please review.
>>>>>>>>>>>>
>>>>>>>>>>>> When a safepoint is ended we need a way to get back to 100% 
>>>>>>>>>>>> utilization as fast
>>>>>>>>>>>> as possible. 100% utilization means no idle cpu in the system if 
>>>>>>>>>>>> there is a
>>>>>>>>>>>> JavaThread that could be executed. The traditional ways to wake 
>>>>>>>>>>>> many, e.g.
>>>>>>>>>>>> semaphore, pthread_cond, is not implemented with a single syscall 
>>>>>>>>>>>> instead they
>>>>>>>>>>>> typical do one syscall per thread to wake.
>>>>>>>>>>>>
>>>>>>>>>>>> This change-set contains that primitive, the WaitBarrier, and a 
>>>>>>>>>>>> gtest for it.
>>>>>>>>>>>> No actual users, which is in coming patches.
>>>>>>>>>>>>
>>>>>>>>>>>> The WaitBarrier solves by doing a cooperative semaphore posting, 
>>>>>>>>>>>> threads woken
>>>>>>>>>>>> will also post. On Linux we can instead directly use a futex and 
>>>>>>>>>>>> with one
>>>>>>>>>>>> syscall wake all. Depending on how many threads and cpus the 
>>>>>>>>>>>> performance vary,
>>>>>>>>>>>> but a good utilization of the machine, just on the edge of 
>>>>>>>>>>>> saturated, the time to reach 100% utilization is around 3 times 
>>>>>>>>>>>> faster with the WaitBarrier (where futex is faster than semaphore).
>>>>>>>>>>>>
>>>>>>>>>>>> Webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>>>>>>>>>>>
>>>>>>>>>>>> CR:
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>>>>>>>>>>>
>>>>>>>>>>>> Passes 100 iterations of gtest on our platforms, both fastdebug and 
>>>>>>>>>>>> release.
>>>>>>>>>>>> And have been stable when used in safepoints (t1-8) (coming patches).
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, Robbin


More information about the hotspot-dev mailing list