RFR(m): 8214271: Fast primitive to wake many threads

David Holmes david.holmes at oracle.com
Fri Dec 21 03:17:17 UTC 2018


On 21/12/2018 12:52 am, Robbin Ehn wrote:
> Hi David,
> 
> On 12/20/18 1:26 PM, David Holmes wrote:
>> On 20/12/2018 10:10 pm, Robbin Ehn wrote:
>>> Hi David,
>>>
>>> On 12/20/18 7:08 AM, David Holmes wrote:
>>>> Hi Robbin,
>>>>
>>>> Looks good, small doc follow up below ...
>>>
>>> Thanks!
>>>
>>>> Sound reasonable?
>>>
>>> Yes, I also added the comment from my other mail, let me know what 
>>> you think.
>>
>> +   // Guarantees not to return until disarm() is called,
>> +   // if called with currently armed tag (otherwise returns immediately).
>> +   // Implementation must guarantee no spurious wakeups.
>>      // Guarantees to return if disarm() and wake() is called.
>>
>> Should the first line say "disarm() and wake()" or just "wake()"?
> 
> A waiter thread that calls wait(), gets context switched immediately after
> entering wait(), and stays off-CPU until after disarm() is called, but comes
> back before wake() is called, may see the WaitBarrier as disarmed.
> So disarm() by itself can let a waiter thread return from wait().
> That was what I tried to say there.

Got it - subtle.

Further, this sounds like a race that could lead to bugs if not used very
carefully, i.e. you can't assume that all threads are blocked between
disarm() and wake().

I think perhaps this needs to be expanded to make this more obvious:

   //    - A call to wait(tag) will block if the barrier is armed with the value
   //      'tag'; else it will return immediately.
   //    - A blocked thread is eligible to execute again once the barrier is
   //      disarmed and wake() has been called.
+  //    - A call to wait(tag) that would block if it continued, but instead
+  //      is descheduled, may return immediately if scheduled after a
+  //      call to disarm(), but before the call to wake().
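
To make that window concrete, here is a minimal sketch. It is a toy
barrier built on a mutex and condition variable, not the code in the
webrev; the class name ToyWaitBarrier, the "0 means disarmed" convention,
the wake epoch counter, and the bool return value of wait() are all
assumptions made for illustration. It only shows that a waiter arriving
after disarm() does not block, even though wake() has not been called yet.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical toy barrier for illustration only; NOT the HotSpot WaitBarrier.
class ToyWaitBarrier {
  std::mutex _lock;
  std::condition_variable _cv;
  int _armed_tag = 0;    // 0 means disarmed (assumption of this sketch)
  long _wake_epoch = 0;  // bumped by every wake()

 public:
  void arm(int tag) {
    std::lock_guard<std::mutex> g(_lock);
    _armed_tag = tag;
  }

  void disarm() {
    std::lock_guard<std::mutex> g(_lock);
    _armed_tag = 0;
  }

  void wake() {
    {
      std::lock_guard<std::mutex> g(_lock);
      ++_wake_epoch;
    }
    _cv.notify_all();
  }

  // Returns true if the caller actually blocked, false if it returned
  // immediately because the barrier was not armed with 'tag'.
  bool wait(int tag) {
    std::unique_lock<std::mutex> l(_lock);
    if (_armed_tag != tag) {
      return false;
    }
    const long epoch = _wake_epoch;
    // A blocked waiter is released only after disarm() and a later wake().
    _cv.wait(l, [&] { return _armed_tag != tag && _wake_epoch != epoch; });
    return true;
  }
};

// Models a waiter that was descheduled before entering wait(1) and only
// runs once the waker is between disarm() and wake().
bool late_waiter_blocked() {
  ToyWaitBarrier b;
  b.arm(1);
  b.disarm();                // waker has disarmed but not yet called wake()
  bool blocked = b.wait(1);  // the "late" waiter runs here
  b.wake();
  return blocked;
}
```

In this sketch late_waiter_blocked() returns false, so code running
between disarm() and wake() cannot assume every waiter is still blocked.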

It also made me realize that in the general case (though not, I think, when
used with safepoints, due to other state checks) a wake() may stall because
threads with a previous tag enter wait() late.

Thanks,
David

>>
>> s/Implementation/Implementations/
> 
> Fixed
> 
>>
>> The fourth line is no longer needed.
> 
> The above is the reason I would like to keep the fourth line: only if you
> call both disarm() and wake() do you have the guarantee that waiter threads
> will return.
> 
> Thanks, Robbin
> 
>>
>> Thanks,
>> David
>>
>>
>>> Inc:
>>> http://cr.openjdk.java.net/~rehn/8214271/4/inc/webrev/
>>>
>>> Full:
>>> http://cr.openjdk.java.net/~rehn/8214271/4/full/webrev/
>>>
>>> /Robbin
>>>
>>>>
>>>> Otherwise this all looks good!
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>
>>>>> Full:
>>>>> http://cr.openjdk.java.net/~rehn/8214271/3/full/webrev/
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>> On 11/23/18 5:55 PM, Robbin Ehn wrote:
>>>>>> Forgot RFR in subject.
>>>>>>
>>>>>> /Robbin
>>>>>>
>>>>>> On 2018-11-23 17:51, Robbin Ehn wrote:
>>>>>>> Hi all, please review.
>>>>>>>
>>>>>>> When a safepoint ends we need a way to get back to 100% utilization as
>>>>>>> fast as possible. 100% utilization means no idle CPU in the system if
>>>>>>> there is a JavaThread that could be executed. The traditional ways to
>>>>>>> wake many threads, e.g. semaphore or pthread_cond, are not implemented
>>>>>>> with a single syscall; instead they typically do one syscall per thread
>>>>>>> to wake.
>>>>>>>
>>>>>>> This change-set contains that primitive, the WaitBarrier, and a gtest
>>>>>>> for it. There are no actual users yet; those are in coming patches.
>>>>>>>
>>>>>>> The WaitBarrier solves this by doing cooperative semaphore posting:
>>>>>>> threads that are woken also post. On Linux we can instead use a futex
>>>>>>> directly and wake all with one syscall. The performance varies
>>>>>>> depending on how many threads and CPUs there are, but with good
>>>>>>> utilization of the machine, just on the edge of saturation, the time to
>>>>>>> reach 100% utilization is around 3 times faster with the WaitBarrier
>>>>>>> (where futex is faster than semaphore).
>>>>>>>
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>>>>>>
>>>>>>> CR:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>>>>>>
>>>>>>> Passes 100 iterations of gtest on our platforms, both fastdebug and
>>>>>>> release. It has also been stable when used in safepoints (t1-8)
>>>>>>> (coming patches).
>>>>>>>
>>>>>>> Thanks, Robbin
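
For readers unfamiliar with the "cooperative semaphore posting" idea
described in the original mail, the following is one possible reading of
it, sketched in portable C++11. It is not taken from the webrev: the
classes ToySemaphore and CooperativeWaker, the counters, and run_demo()
are hypothetical names invented here. The waker posts once, and each
woken thread posts again while more waiters remain, so the wake-ups fan
out across the woken threads instead of being issued one by one by the
waker.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore; hypothetical helper for this sketch.
class ToySemaphore {
  std::mutex _lock;
  std::condition_variable _cv;
  int _count = 0;

 public:
  void post() {
    {
      std::lock_guard<std::mutex> g(_lock);
      ++_count;
    }
    _cv.notify_one();
  }

  void wait() {
    std::unique_lock<std::mutex> l(_lock);
    _cv.wait(l, [&] { return _count > 0; });
    --_count;
  }
};

// Hypothetical cooperative waker; NOT the HotSpot implementation.
class CooperativeWaker {
  ToySemaphore _sem;
  std::atomic<int> _waiters{0};  // threads that have registered as waiters
  std::atomic<int> _to_wake{0};  // threads still to be released

 public:
  int waiters() const { return _waiters.load(); }

  void wait() {
    ++_waiters;
    _sem.wait();
    // Woken: pass the wake-up on if more threads remain to be released.
    if (_to_wake.fetch_sub(1) > 1) {
      _sem.post();
    }
  }

  // Returns the number of posts made by the waker itself (0 or 1).
  int wake_all() {
    const int n = _waiters.exchange(0);
    if (n == 0) {
      return 0;
    }
    _to_wake.store(n);
    _sem.post();  // single post; the woken threads do the rest
    return 1;
  }
};

// Starts n waiters, wakes them all, and returns how many posts the waker
// thread itself had to make.
int run_demo(int n) {
  CooperativeWaker w;
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([&w] { w.wait(); });
  }
  // Wait until every thread has registered (it may not be blocked yet).
  while (w.waiters() < n) {
    std::this_thread::yield();
  }
  const int waker_posts = w.wake_all();
  for (auto& t : threads) {
    t.join();
  }
  return waker_posts;
}
```

The sketch ignores waiters that arrive after wake_all() has sampled the
count; handling that correctly is exactly the kind of subtlety discussed
earlier in this thread.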

