RFR(m): 8214271: Fast primitive to wake many threads

Robbin Ehn robbin.ehn at oracle.com
Fri Dec 21 09:45:54 UTC 2018


Hi David,

On 2018-12-21 04:17, David Holmes wrote:
> 
> Got it - subtle.
> 
> Further, this sounds like a race that could lead to bugs if not used very 
> carefully, i.e. you can't assume that all threads are blocked between 
> disarm() and wake().

I didn't realize how subtle this is. I think your original comment that
disarm/wake should be one operation was spot on.
Investigating... thinking... testing... yes I think this will work, fixed!
Sorry for not looking more into this before.
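
To illustrate the semantics under discussion, here is a minimal portable
sketch, assuming disarm() also performs the wake. It is not the code in the
webrev: ModelWaitBarrier, run_model_demo and the use of 0 as the disarmed
value are names and choices made up for this example, and a mutex plus
condition variable only models the blocking behaviour, not the speed.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of the barrier semantics; not the HotSpot implementation.
class ModelWaitBarrier {
  std::mutex _lock;
  std::condition_variable _cv;
  int _armed_tag = 0;  // assumption of this sketch: 0 means disarmed

public:
  // Arm the barrier so that wait(tag) blocks.
  void arm(int tag) {
    assert(tag != 0);
    std::lock_guard<std::mutex> guard(_lock);
    _armed_tag = tag;
  }

  // Disarm and wake as one operation, so the caller cannot do anything
  // between the two steps.
  void disarm() {
    {
      std::lock_guard<std::mutex> guard(_lock);
      _armed_tag = 0;
    }
    _cv.notify_all();
  }

  // Block while the barrier is armed with 'tag'; otherwise return at once.
  void wait(int tag) {
    std::unique_lock<std::mutex> guard(_lock);
    _cv.wait(guard, [&] { return _armed_tag != tag; });
  }
};

// Arms the barrier, starts 'nthreads' waiters, disarms, and returns how many
// waiters got past wait().
int run_model_demo(int nthreads) {
  ModelWaitBarrier barrier;
  std::atomic<int> passed{0};
  barrier.arm(1);
  std::vector<std::thread> threads;
  for (int i = 0; i < nthreads; i++) {
    threads.emplace_back([&] {
      barrier.wait(1);
      passed.fetch_add(1);
    });
  }
  barrier.disarm();
  for (std::thread& t : threads) t.join();
  return passed.load();
}
```

A waiter that reaches wait() only after disarm() sees a different tag and
returns immediately, so every thread in the demo gets through regardless of
scheduling.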

> 
> I think perhaps this needs to be expanded to make this more obvious:
> 
>    68 //    - A call to wait(tag) will block if the barrier is armed with the value
>    69 //      'tag'; else it will return immediately.
>    70 //    - A blocked thread is eligible to execute again once the barrier is
>    71 //      disarmed and wake() has been called.
> +          - A call to wait(tag) that would block if it continued, but instead
> +            is descheduled, may return immediately if scheduled after a
> +            call to disarm(), but before the call to wake().
> 
> It also made me realize that in the general case (not when used with safepoints 
> I think due to other state checks) a wake() may stall due to threads with a 
> previous tag entering the wait() late.

I added a double check in the semaphore version, which means both
implementations should have a progress guarantee.
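
As a rough sketch of what cooperative semaphore posting with a re-check of
the tag could look like (an illustration only, not the webrev code): the
names Sem, CoopBarrier, _entered, _waiters and run_coop_demo are invented
here, 0 is assumed to mean disarmed, and the re-check shown is this sketch's
own guard, which may differ from the double check mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Tiny counting semaphore (C++11); stands in for an OS semaphore.
class Sem {
  std::mutex _m;
  std::condition_variable _cv;
  int _count = 0;
public:
  void post() {
    { std::lock_guard<std::mutex> g(_m); ++_count; }
    _cv.notify_one();
  }
  void wait() {
    std::unique_lock<std::mutex> g(_m);
    _cv.wait(g, [&] { return _count > 0; });
    --_count;
  }
};

// Hypothetical cooperative barrier; assumes 0 means disarmed.
class CoopBarrier {
  std::atomic<int> _tag{0};
  std::atomic<int> _entered{0};  // threads currently inside wait()
  std::atomic<int> _waiters{0};  // threads that still need a post
  Sem _sem;

  // Claim one waiter and post for it; returns how many were left unclaimed.
  int wake_one_if_needed() {
    int w = _waiters.load();
    if (w == 0) return 0;
    int expected = w;
    // Post only if we won the decrement, so posts never exceed waiters.
    if (_waiters.compare_exchange_strong(expected, w - 1)) _sem.post();
    return w - 1;
  }

public:
  void arm(int tag) { _tag.store(tag); }

  // Disarm and wake: loop until no thread is blocked or about to block.
  void disarm() {
    _tag.store(0);
    while (wake_one_if_needed() > 0 || _entered.load() > 0) {
      std::this_thread::yield();
    }
  }

  void wait(int tag) {
    if (_tag.load() != tag) return;
    _entered.fetch_add(1);
    // Re-check after announcing ourselves, so a late arrival does not block
    // on a barrier that has already been disarmed.
    if (_tag.load() == tag) {
      _waiters.fetch_add(1);
      _sem.wait();
      wake_one_if_needed();  // cooperative step: a woken thread also posts
    }
    _entered.fetch_sub(1);
  }
};

// Arms, starts 'nthreads' waiters, disarms, returns how many got through.
int run_coop_demo(int nthreads) {
  CoopBarrier barrier;
  std::atomic<int> passed{0};
  barrier.arm(1);
  std::vector<std::thread> threads;
  for (int i = 0; i < nthreads; i++) {
    threads.emplace_back([&] { barrier.wait(1); passed.fetch_add(1); });
  }
  barrier.disarm();
  for (std::thread& t : threads) t.join();
  return passed.load();
}
```

Because every post is paired with a successful decrement of _waiters, the
semaphore should not be left with a stale permit for the next arm().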

This makes v5 a bit large, since a lot of comments were changed.

Inc:
http://cr.openjdk.java.net/~rehn/8214271/5/inc/webrev/
Full:
http://cr.openjdk.java.net/~rehn/8214271/5/full/webrev/

The gtest passes thousands of iterations locally and hundreds in mach5.

Thanks, Robbin

> 
> Thanks,
> David
> 
>>>
>>> s/Implementation/Implementations/
>>
>> Fixed
>>
>>>
>>> The fourth line is no longer needed.
>>
>> The above is the reason I would like to keep the fourth line: only if you
>> call both disarm() and wake() do you have the guarantee that waiter threads
>> will return.
>>
>> Thanks, Robbin
>>
>>>
>>> Thanks,
>>> David
>>>
>>>
>>>> Inc:
>>>> http://cr.openjdk.java.net/~rehn/8214271/4/inc/webrev/
>>>>
>>>> Full:
>>>> http://cr.openjdk.java.net/~rehn/8214271/4/full/webrev/
>>>>
>>>> /Robbin
>>>>
>>>>>
>>>>> Otherwise this all looks good!
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>
>>>>>> Full:
>>>>>> http://cr.openjdk.java.net/~rehn/8214271/3/full/webrev/
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>> On 11/23/18 5:55 PM, Robbin Ehn wrote:
>>>>>>> Forgot RFR in subject.
>>>>>>>
>>>>>>> /Robbin
>>>>>>>
>>>>>>> On 2018-11-23 17:51, Robbin Ehn wrote:
>>>>>>>> Hi all, please review.
>>>>>>>>
>>>>>>>> When a safepoint ends we need a way to get back to 100% utilization as
>>>>>>>> fast as possible. 100% utilization means no idle CPU in the system if
>>>>>>>> there is a JavaThread that could be executed. The traditional ways to
>>>>>>>> wake many threads, e.g. semaphore or pthread_cond, are not implemented
>>>>>>>> with a single syscall; instead they typically do one syscall per thread
>>>>>>>> to wake.
>>>>>>>>
>>>>>>>> This change-set contains that primitive, the WaitBarrier, and a gtest
>>>>>>>> for it. There are no actual users yet; those are in coming patches.
>>>>>>>>
>>>>>>>> The WaitBarrier solves this by doing cooperative semaphore posting:
>>>>>>>> threads that are woken will also post. On Linux we can instead use a
>>>>>>>> futex directly and wake all threads with one syscall. The performance
>>>>>>>> varies depending on how many threads and CPUs there are, but with good
>>>>>>>> utilization of the machine, just on the edge of saturation, the
>>>>>>>> WaitBarrier reaches 100% utilization around 3 times faster (where futex
>>>>>>>> is faster than semaphore).
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>>>>>>>
>>>>>>>> CR:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>>>>>>>
>>>>>>>> Passes 100 iterations of gtest on our platforms, both fastdebug and
>>>>>>>> release. It has also been stable when used in safepoints (t1-8) (coming
>>>>>>>> patches).
>>>>>>>>
>>>>>>>> Thanks, Robbin
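
For the Linux path described in the original message, waking all waiters
with one futex syscall might look roughly like the sketch below. It is
Linux-only and is not the webrev code: futex_call, futex_wait_while_armed,
futex_disarm_and_wake_all and run_futex_demo are hypothetical helper names,
and 0 is assumed to mean disarmed.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <cstdint>
#include <thread>
#include <vector>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static_assert(sizeof(std::atomic<int32_t>) == sizeof(int32_t),
              "the futex word must be a plain 32-bit integer");

// Thin wrapper: glibc has no futex() function, so go through syscall(2).
static long futex_call(std::atomic<int32_t>* word, int op, int32_t val) {
  return syscall(SYS_futex, reinterpret_cast<int32_t*>(word), op, val,
                 nullptr, nullptr, 0);
}

// Block while *word still equals 'tag' (the armed value).
static void futex_wait_while_armed(std::atomic<int32_t>* word, int32_t tag) {
  while (word->load() == tag) {
    // The kernel re-checks *word == tag before sleeping, so a disarm that
    // lands between the load above and this call is not missed.
    futex_call(word, FUTEX_WAIT_PRIVATE, tag);
  }
}

// Disarm, then wake every thread sleeping on the word with one syscall.
static void futex_disarm_and_wake_all(std::atomic<int32_t>* word) {
  word->store(0);
  futex_call(word, FUTEX_WAKE_PRIVATE, INT_MAX);
}

// Arms the word, starts 'nthreads' waiters, disarms, and returns how many
// waiters got through.
int run_futex_demo(int nthreads) {
  std::atomic<int32_t> word{1};  // armed with tag 1
  std::atomic<int> passed{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < nthreads; i++) {
    threads.emplace_back([&] {
      futex_wait_while_armed(&word, 1);
      passed.fetch_add(1);
    });
  }
  futex_disarm_and_wake_all(&word);
  for (std::thread& t : threads) t.join();
  return passed.load();
}
```

The waiter loops around FUTEX_WAIT because the call may also return on a
signal or spuriously; only a changed word value lets it proceed.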

