RFR(m): 8214271: Fast primitive to wake many threads
David Holmes
david.holmes at oracle.com
Fri Dec 21 03:17:17 UTC 2018
On 21/12/2018 12:52 am, Robbin Ehn wrote:
> Hi David,
>
> On 12/20/18 1:26 PM, David Holmes wrote:
>> On 20/12/2018 10:10 pm, Robbin Ehn wrote:
>>> Hi David,
>>>
>>> On 12/20/18 7:08 AM, David Holmes wrote:
>>>> Hi Robbin,
>>>>
>>>> Looks good, small doc follow up below ...
>>>
>>> Thanks!
>>>
>>>> Sound reasonable?
>>>
>>> Yes, I also added the comment from my other mail, let me know what
>>> you think.
>>
>> + // Guarantees not to return until disarm() is called,
>> + // if called with currently armed tag (otherwise returns immediately).
>> + // Implementation must guarantee no spurious wakeups.
>> // Guarantees to return if disarm() and wake() is called.
>>
>> Should the first line say "disarm() and wake()" or just "wake()"?
>
> A waiter thread calling wait() that gets context switched immediately after
> entering wait() and stays off proc until after disarm() is called, but comes
> back before wake() is called, may see the waitbarrier as disarmed.
> So disarm() can wake a waiter thread (returning from wait()).
> That was what I tried to say there.
Got it - subtle.
Further, this sounds like a race that could lead to bugs if not used very
carefully, i.e. you can't assume that all threads are blocked between
disarm() and wake().
I think perhaps this needs to be expanded to make this more obvious:
  // - A call to wait(tag) will block if the barrier is armed with the value
  //   'tag'; else it will return immediately.
  // - A blocked thread is eligible to execute again once the barrier is
  //   disarmed and wake() has been called.
+ // - A call to wait(tag) that would block if it continued, but instead
+ //   is descheduled, may return immediately if scheduled after a
+ //   call to disarm(), but before the call to wake().
It also made me realize that in the general case (though not, I think, when
used with safepoints, due to other state checks) a wake() may stall due to
threads with a previous tag entering wait() late.
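
To make the late-arrival case concrete, here is a minimal sketch, assuming a
portable mutex/condition-variable stand-in rather than the futex or semaphore
implementations in the webrev. The class SketchWaitBarrier, the bool returned
from wait(), the use of tag 0 for "disarmed", and the two demo functions are
all hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for illustration only; this is NOT the WaitBarrier
// from the webrev. Using tag 0 to mean "disarmed" is an assumption here.
class SketchWaitBarrier {
  std::mutex _lock;
  std::condition_variable _cv;
  int _armed_tag = 0;  // 0 == disarmed

 public:
  void arm(int tag) {
    assert(tag != 0);
    std::lock_guard<std::mutex> guard(_lock);
    _armed_tag = tag;
  }

  // Only changes state; waiters are notified later by wake().
  void disarm() {
    std::lock_guard<std::mutex> guard(_lock);
    _armed_tag = 0;
  }

  // Notifies every thread currently blocked in wait().
  void wake() {
    std::lock_guard<std::mutex> guard(_lock);
    _cv.notify_all();
  }

  // Returns true if the caller had to block, false if it returned immediately.
  bool wait(int tag) {
    std::unique_lock<std::mutex> guard(_lock);
    if (_armed_tag != tag) {
      return false;  // disarmed, or armed with a different tag
    }
    // The predicate filters condition-variable spurious wakeups: the waiter
    // only leaves once the barrier is no longer armed with its tag.
    _cv.wait(guard, [&] { return _armed_tag != tag; });
    return true;
  }
};

// Late arrival: wait(tag) entered after disarm() but before wake().
bool late_waiter_returns_without_blocking() {
  SketchWaitBarrier barrier;
  barrier.arm(1);
  barrier.disarm();
  bool blocked = barrier.wait(1);  // wake() has not been called yet
  barrier.wake();
  return !blocked;
}

// Whether or not the waiter blocks first, disarm() followed by wake()
// guarantees that it returns, so join() completes.
bool waiter_returns_after_disarm_and_wake() {
  SketchWaitBarrier barrier;
  barrier.arm(1);
  std::thread waiter([&] { barrier.wait(1); });
  barrier.disarm();
  barrier.wake();
  waiter.join();
  return true;
}
```

In the second function the waiter may block before disarm(), or may enter
wait() late and return immediately; the caller cannot tell which happened,
which is why code between disarm() and wake() must not assume that all
threads are blocked.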
Thanks,
David
>>
>> s/Implementation/Implementations/
>
> Fixed
>
>>
>> The fourth line is no longer needed.
>
> Above is the reason I would like to keep the fourth line, since only if
> you call
> both disarm() and wake() you have that guarantee that waiter threads will
> return.
>
> Thanks, Robbin
>
>>
>> Thanks,
>> David
>>
>>
>>> Inc:
>>> http://cr.openjdk.java.net/~rehn/8214271/4/inc/webrev/
>>>
>>> Full:
>>> http://cr.openjdk.java.net/~rehn/8214271/4/full/webrev/
>>>
>>> /Robbin
>>>
>>>>
>>>> Otherwise this all looks good!
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>
>>>>> Full:
>>>>> http://cr.openjdk.java.net/~rehn/8214271/3/full/webrev/
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>> On 11/23/18 5:55 PM, Robbin Ehn wrote:
>>>>>> Forgot RFR in subject.
>>>>>>
>>>>>> /Robbin
>>>>>>
>>>>>> On 2018-11-23 17:51, Robbin Ehn wrote:
>>>>>>> Hi all, please review.
>>>>>>>
>>>>>>> When a safepoint is ended we need a way to get back to 100%
>>>>>>> utilization as fast
>>>>>>> as possible. 100% utilization means no idle cpu in the system if
>>>>>>> there is a
>>>>>>> JavaThread that could be executed. The traditional ways to wake
>>>>>>> many threads, e.g. semaphore or pthread_cond, are not implemented
>>>>>>> with a single syscall; instead they typically do one syscall per
>>>>>>> thread to wake.
>>>>>>>
>>>>>>> This change-set contains that primitive, the WaitBarrier, and a
>>>>>>> gtest for it.
>>>>>>> There are no actual users yet; those are in coming patches.
>>>>>>>
>>>>>>> The WaitBarrier solves this by doing cooperative semaphore posting:
>>>>>>> threads that are woken will also post. On Linux we can instead
>>>>>>> directly use a futex and wake all with one syscall. The performance
>>>>>>> varies depending on how many threads and cpus there are, but with
>>>>>>> good utilization of the machine, just on the edge of saturated,
>>>>>>> the time to reach 100% utilization is around 3 times
>>>>>>> faster with the WaitBarrier (where futex is faster than semaphore).
>>>>>>>
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>>>>>>
>>>>>>> CR:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>>>>>>
>>>>>>> Passes 100 iterations of gtest on our platforms, both fastdebug
>>>>>>> and release.
>>>>>>> And it has been stable when used in safepoints (t1-8) (coming
>>>>>>> patches).
>>>>>>>
>>>>>>> Thanks, Robbin