RFR(m): 8214271: Fast primitive to wake many threads

Tue Jan 8 10:42:53 UTC 2019

Hi David,

On 1/2/19 12:35 AM, David Holmes wrote:
>>> Further this sounds like a race that could lead to bugs if not used very 
>>> carefully ie. you can't assume between disarm() and wake() that all threads 
>>> are blocked.
>>
>> I didn't realize how subtle this is. I think your original comment that
>> disarm/wake should be one operation was spot on.
>> Investigating... thinking... testing... yes I think this will work, fixed!
>> Sorry for not looking more into this before.
> 
> I'm now curious how this will actually work in the context of the safepoint 
> changes?

Since code already handle this 'invariant' with threads not being block between 
disarm() and wake(), doing it one operation just very slightly increases the 
chance that a thread will be blocked when we actually can handle it to be 
running, but reduces the chance to hit a false positive TLH poll.
(with TLH we have a two-step un-synchronizing out of safepoints where we must 
change global safepoint state before changing the thread polling state)

(I have some thoughts on simplifying TLH/safepoint states)

> Nit: I would have kept disarm() rather than wake() as I like the arm/disarm 
> duality.

Yes, me too. Not sure why I did the opposite, fixed!

> 
>    void GenericWaitBarrier::wait(int barrier_tag) {
>      assert(barrier_tag != 0, "Trying to wait on disarmed value");
> +   if (barrier_tag == 0 && barrier_tag != _barrier_tag) {
> +     OrderAccess::fence();
> +     return;
> +   }
> 
> I don't understand what the above is doing. A barrier_tag of 0 is a programming 
> error caught during testing in debug builds. You don't need to account for it 
> being 0 in product because this isn't something that can come in from an 
> external source - we have full code control here. And even if you want to be 
> this paranoid why would you need the fence?

Fixed, but kept the fence, since we say we are providing a trailing fence.
Otherwise I would like to add that exception to the description of wait().

Including Dan's comments:
Full: http://cr.openjdk.java.net/~rehn/8214271/6/full/webrev/
Inc : http://cr.openjdk.java.net/~rehn/8214271/6/inc/webrev/

Thanks, Robbin

> 
> Thanks,
> David
> -----
> 
>> Full:
>> http://cr.openjdk.java.net/~rehn/8214271/5/full/webrev/
>>
>> gtest passes thousands of loops locally and hundreds in mach5.
>>
>> Thanks, Robbin
>>
>>>
>>> Thanks,
>>> David
>>>
>>>>>
>>>>> s/Implementation/Implementations/
>>>>
>>>> Fixed
>>>>
>>>>>
>>>>> The fourth line is no longer needed.
>>>>
>>>> Above is the reason I would like to keep the fourth line, since only if you 
>>>> call
>>>> both disarm() and wake() you have that guarantee that waiter threads will
>>>> return.
>>>>
>>>> Thanks, Robbin
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>>> Inc:
>>>>>> http://cr.openjdk.java.net/~rehn/8214271/4/inc/webrev/
>>>>>>
>>>>>> Full:
>>>>>> http://cr.openjdk.java.net/~rehn/8214271/4/full/webrev/
>>>>>>
>>>>>> /Robbin
>>>>>>
>>>>>>>
>>>>>>> Otherwise this all looks good!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>
>>>>>>>> Full:
>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/3/full/webrev/
>>>>>>>>
>>>>>>>> Thanks, Robbin
>>>>>>>>
>>>>>>>> On 11/23/18 5:55 PM, Robbin Ehn wrote:
>>>>>>>>> Forgot RFR in subject.
>>>>>>>>>
>>>>>>>>> /Robbin
>>>>>>>>>
>>>>>>>>> On 2018-11-23 17:51, Robbin Ehn wrote:
>>>>>>>>>> Hi all, please review.
>>>>>>>>>>
>>>>>>>>>> When a safepoint is ended we need a way to get back to 100% 
>>>>>>>>>> utilization as fast
>>>>>>>>>> as possible. 100% utilization means no idle cpu in the system if there 
>>>>>>>>>> is a
>>>>>>>>>> JavaThread that could be executed. The traditional ways to wake many, 
>>>>>>>>>> e.g.
>>>>>>>>>> semaphore, pthread_cond, is not implemented with a single syscall 
>>>>>>>>>> instead they
>>>>>>>>>> typical do one syscall per thread to wake.
>>>>>>>>>>
>>>>>>>>>> This change-set contains that primitive, the WaitBarrier, and a gtest 
>>>>>>>>>> for it.
>>>>>>>>>> No actual users, which is in coming patches.
>>>>>>>>>>
>>>>>>>>>> The WaitBarrier solves by doing a cooperative semaphore posting, 
>>>>>>>>>> threads woken
>>>>>>>>>> will also post. On Linux we can instead directly use a futex and with one
>>>>>>>>>> syscall wake all. Depending on how many threads and cpus the 
>>>>>>>>>> performance vary,
>>>>>>>>>> but a good utilization of the machine, just on the edge of saturated, 
>>>>>>>>>> the time to reach 100% utilization is around 3 times faster with the 
>>>>>>>>>> WaitBarrier (where futex is faster than semaphore).
>>>>>>>>>>
>>>>>>>>>> Webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>>>>>>>>>
>>>>>>>>>> CR:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>>>>>>>>>
>>>>>>>>>> Passes 100 iterations of gtest on our platforms, both fastdebug and 
>>>>>>>>>> release.
>>>>>>>>>> And have been stable when used in safepoints (t1-8) (coming patches).
>>>>>>>>>>
>>>>>>>>>> Thanks, Robbin