RFR(m): 8214271: Fast primitive to wake many threads
David Holmes
david.holmes at oracle.com
Tue Jan 1 23:35:13 UTC 2019
Hi Robbin,
On 21/12/2018 7:45 pm, Robbin Ehn wrote:
> Hi David,
>
> On 2018-12-21 04:17, David Holmes wrote:
>>
>> Got it - subtle.
>>
>> Further this sounds like a race that could lead to bugs if not used
>> very carefully ie. you can't assume between disarm() and wake() that
>> all threads are blocked.
>
> I didn't realize how subtle this is. I think your original comment that
> disarm/wake should be one operation was spot on.
> Investigating... thinking... testing... yes I think this will work, fixed!
> Sorry for not looking more into this before.
I'm now curious how this will actually work in the context of the
safepoint changes?
>>
>> I think perhaps this needs to be expanded to make this more obvious:
>>
>> 68 // - A call to wait(tag) will block if the barrier is armed
>> with the value
>> 69 // 'tag'; else it will return immediately.
>> 70 // - A blocked thread is eligible to execute again once the
>> barrier is
>> 71 // disarmed and wake() has been called.
>> + - A call to wait(tag) that would block if it continued, but
>> instead
>> + is descheduled, may return immediately if scheduled after a
>> + call to disarm(), but before the call to wake().
>>
>> It also made me realize that in the general case (not when used with
>> safepoints I think due to other state checks) a wake() may stall due
>> to threads with a previous tag entering the wait() late.
>
> I added a double checking in the semaphore version, this means both
> implementation should have progress guarantee.
>
> Making this v5 a bit large due to a lot of comments being changed.
>
> Inc:
> http://cr.openjdk.java.net/~rehn/8214271/5/inc/webrev/
Nit: I would have kept disarm() rather than wake() as I like the
arm/disarm duality.
void GenericWaitBarrier::wait(int barrier_tag) {
assert(barrier_tag != 0, "Trying to wait on disarmed value");
+ if (barrier_tag == 0 && barrier_tag != _barrier_tag) {
+ OrderAccess::fence();
+ return;
+ }
I don't understand what the above is doing. A barrier_tag of 0 is a
programming error caught during testing in debug builds. You don't need
to account for it being 0 in product because this isn't something that
can come in from an external source - we have full code control here.
And even if you want to be this paranoid why would you need the fence?
Thanks,
David
-----
> Full:
> http://cr.openjdk.java.net/~rehn/8214271/5/full/webrev/
>
> gtest passes thousands of loops locally and hundreds in mach5.
>
> Thanks, Robbin
>
>>
>> Thanks,
>> David
>>
>>>>
>>>> s/Implementation/Implementations/
>>>
>>> Fixed
>>>
>>>>
>>>> The fourth line is no longer needed.
>>>
>>> Above is the reason I would like to keep the fourth line, since only
>>> if you call
>>> both disarm() and wake() you have that guarantee that waiter threads
>>> will
>>> return.
>>>
>>> Thanks, Robbin
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>
>>>>> Inc:
>>>>> http://cr.openjdk.java.net/~rehn/8214271/4/inc/webrev/
>>>>>
>>>>> Full:
>>>>> http://cr.openjdk.java.net/~rehn/8214271/4/full/webrev/
>>>>>
>>>>> /Robbin
>>>>>
>>>>>>
>>>>>> Otherwise this all looks good!
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>
>>>>>>> Full:
>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/3/full/webrev/
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>>
>>>>>>> On 11/23/18 5:55 PM, Robbin Ehn wrote:
>>>>>>>> Forgot RFR in subject.
>>>>>>>>
>>>>>>>> /Robbin
>>>>>>>>
>>>>>>>> On 2018-11-23 17:51, Robbin Ehn wrote:
>>>>>>>>> Hi all, please review.
>>>>>>>>>
>>>>>>>>> When a safepoint is ended we need a way to get back to 100%
>>>>>>>>> utilization as fast
>>>>>>>>> as possible. 100% utilization means no idle cpu in the system
>>>>>>>>> if there is a
>>>>>>>>> JavaThread that could be executed. The traditional ways to wake
>>>>>>>>> many, e.g.
>>>>>>>>> semaphore, pthread_cond, is not implemented with a single
>>>>>>>>> syscall instead they
>>>>>>>>> typical do one syscall per thread to wake.
>>>>>>>>>
>>>>>>>>> This change-set contains that primitive, the WaitBarrier, and a
>>>>>>>>> gtest for it.
>>>>>>>>> No actual users, which is in coming patches.
>>>>>>>>>
>>>>>>>>> The WaitBarrier solves by doing a cooperative semaphore
>>>>>>>>> posting, threads woken
>>>>>>>>> will also post. On Linux we can instead directly use a futex
>>>>>>>>> and with one
>>>>>>>>> syscall wake all. Depending on how many threads and cpus the
>>>>>>>>> performance vary,
>>>>>>>>> but a good utilization of the machine, just on the edge of
>>>>>>>>> saturated, the time to reach 100% utilization is around 3 times
>>>>>>>>> faster with the WaitBarrier (where futex is faster than
>>>>>>>>> semaphore).
>>>>>>>>>
>>>>>>>>> Webrev:
>>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>>>>>>>>
>>>>>>>>> CR:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>>>>>>>>
>>>>>>>>> Passes 100 iterations of gtest on our platforms, both fastdebug
>>>>>>>>> and release.
>>>>>>>>> And have been stable when used in safepoints (t1-8) (coming
>>>>>>>>> patches).
>>>>>>>>>
>>>>>>>>> Thanks, Robbin
More information about the hotspot-dev
mailing list