RFR(m): 8214271: Fast primitive to wake many threads
Daniel D. Daugherty
daniel.daugherty at oracle.com
Thu Jan 10 15:57:11 UTC 2019
> Including Dan's comments:
> Full: http://cr.openjdk.java.net/~rehn/8214271/6/full/webrev/
> Inc : http://cr.openjdk.java.net/~rehn/8214271/6/inc/webrev/
src/hotspot/os/linux/waitBarrier_linux.cpp
No comments.
src/hotspot/os/linux/waitBarrier_linux.hpp
No comments.
src/hotspot/share/utilities/waitBarrier.hpp
No comments beyond David's.
src/hotspot/share/utilities/waitBarrier_generic.cpp
No comments beyond David's.
src/hotspot/share/utilities/waitBarrier_generic.hpp
No comments.
test/hotspot/gtest/utilities/test_waitBarrier.cpp
No comments.
Thumbs up!
Dan
On 1/9/19 7:57 AM, Robbin Ehn wrote:
> Hi David,
>
> On 1/9/19 4:40 AM, David Holmes wrote:
>> Hi Robbin,
>>
>> No further significant comments, lets just see how this plays out.
>
> Yes, thanks!
>
> Fixed nits.
>
> If Dan have something more non-trivial I'll publish a v7.
>
> /Robbin
>
>>
>> Some minor nits:
>>
>> src/hotspot/share/utilities/waitBarrier.hpp
>>
>> ! // A primary goal of the WaitBarrier implementation is to disarm
>> all waiting
>>
>> s/disarm/wake/
>>
>> That was one place a global replace shouldn't have been applied. :)
>
> Fixed!
>
>>
>> ! // - Calling disarm() guarantees any thread calling or called
>> wait(tag) will
>>
>> "or called" is not grammatically correct. Perhaps:
>>
>> // - Calling disarm() guarantees any thread now calling or that
>> has called wait(tag) will
>>
>
> Fixed!
>
>>
>> // Guarantees any thread called wait() will be awake when it
>> returns.
>>
>> s/called/that called/
>>
>
> Fixed!
>
>> ---
>>
>> src/hotspot/share/utilities/waitBarrier_generic.cpp
>>
>> ! // disarm store must not float below.
>>
>> s/float/sink/
>>
>
> Fixed!
>
>> 74 // API specifies wake() must provides a trailing fence.
>>
>> s/wake/disarm/
>> s/provides/provide/
>
> Fixed
>
>>
>> 81 // API specifies wait() must provides a trailing fence.
>>
>> s/provides/provide/
>
> Fixed
>
>>
>> Thanks,
>> David
>>
>> On 8/01/2019 8:42 pm, Robbin Ehn wrote:
>>> Hi David,
>>>
>>> On 1/2/19 12:35 AM, David Holmes wrote:
>>>>>> Further this sounds like a race that could lead to bugs if not
>>>>>> used very carefully ie. you can't assume between disarm() and
>>>>>> wake() that all threads are blocked.
>>>>>
>>>>> I didn't realize how subtle this is. I think your original comment
>>>>> that
>>>>> disarm/wake should be one operation was spot on.
>>>>> Investigating... thinking... testing... yes I think this will
>>>>> work, fixed!
>>>>> Sorry for not looking more into this before.
>>>>
>>>> I'm now curious how this will actually work in the context of the
>>>> safepoint changes?
>>>
>>> Since code already handle this 'invariant' with threads not being
>>> block between disarm() and wake(), doing it one operation just very
>>> slightly increases the chance that a thread will be blocked when we
>>> actually can handle it to be running, but reduces the chance to hit
>>> a false positive TLH poll.
>>> (with TLH we have a two-step un-synchronizing out of safepoints
>>> where we must change global safepoint state before changing the
>>> thread polling state)
>>>
>>> (I have some thoughts on simplifying TLH/safepoint states)
>>>
>>>> Nit: I would have kept disarm() rather than wake() as I like the
>>>> arm/disarm duality.
>>>
>>> Yes, me too. Not sure why I did the opposite, fixed!
>>>
>>>>
>>>> void GenericWaitBarrier::wait(int barrier_tag) {
>>>> assert(barrier_tag != 0, "Trying to wait on disarmed value");
>>>> + if (barrier_tag == 0 && barrier_tag != _barrier_tag) {
>>>> + OrderAccess::fence();
>>>> + return;
>>>> + }
>>>>
>>>> I don't understand what the above is doing. A barrier_tag of 0 is a
>>>> programming error caught during testing in debug builds. You don't
>>>> need to account for it being 0 in product because this isn't
>>>> something that can come in from an external source - we have full
>>>> code control here. And even if you want to be this paranoid why
>>>> would you need the fence?
>>>
>>> Fixed, but kept the fence, since we say we are providing a trailing
>>> fence.
>>> Otherwise I would like to add that exception to the description of
>>> wait().
>>>
>>> Including Dan's comments:
>>> Full: http://cr.openjdk.java.net/~rehn/8214271/6/full/webrev/
>>> Inc : http://cr.openjdk.java.net/~rehn/8214271/6/inc/webrev/
>>>
>>>
>>> Thanks, Robbin
>>>
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Full:
>>>>> http://cr.openjdk.java.net/~rehn/8214271/5/full/webrev/
>>>>>
>>>>> gtest passes thousands of loops locally and hundreds in mach5.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>>>
>>>>>>>> s/Implementation/Implementations/
>>>>>>>
>>>>>>> Fixed
>>>>>>>
>>>>>>>>
>>>>>>>> The fourth line is no longer needed.
>>>>>>>
>>>>>>> Above is the reason I would like to keep the fourth line, since
>>>>>>> only if you call
>>>>>>> both disarm() and wake() you have that guarantee that waiter
>>>>>>> threads will
>>>>>>> return.
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>> Inc:
>>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/4/inc/webrev/
>>>>>>>>>
>>>>>>>>> Full:
>>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/4/full/webrev/
>>>>>>>>>
>>>>>>>>> /Robbin
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Otherwise this all looks good!
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Full:
>>>>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/3/full/webrev/
>>>>>>>>>>>
>>>>>>>>>>> Thanks, Robbin
>>>>>>>>>>>
>>>>>>>>>>> On 11/23/18 5:55 PM, Robbin Ehn wrote:
>>>>>>>>>>>> Forgot RFR in subject.
>>>>>>>>>>>>
>>>>>>>>>>>> /Robbin
>>>>>>>>>>>>
>>>>>>>>>>>> On 2018-11-23 17:51, Robbin Ehn wrote:
>>>>>>>>>>>>> Hi all, please review.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When a safepoint is ended we need a way to get back to
>>>>>>>>>>>>> 100% utilization as fast
>>>>>>>>>>>>> as possible. 100% utilization means no idle cpu in the
>>>>>>>>>>>>> system if there is a
>>>>>>>>>>>>> JavaThread that could be executed. The traditional ways to
>>>>>>>>>>>>> wake many, e.g.
>>>>>>>>>>>>> semaphore, pthread_cond, is not implemented with a single
>>>>>>>>>>>>> syscall instead they
>>>>>>>>>>>>> typical do one syscall per thread to wake.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This change-set contains that primitive, the WaitBarrier,
>>>>>>>>>>>>> and a gtest for it.
>>>>>>>>>>>>> No actual users, which is in coming patches.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The WaitBarrier solves by doing a cooperative semaphore
>>>>>>>>>>>>> posting, threads woken
>>>>>>>>>>>>> will also post. On Linux we can instead directly use a
>>>>>>>>>>>>> futex and with one
>>>>>>>>>>>>> syscall wake all. Depending on how many threads and cpus
>>>>>>>>>>>>> the performance vary,
>>>>>>>>>>>>> but a good utilization of the machine, just on the edge of
>>>>>>>>>>>>> saturated, the time to reach 100% utilization is around 3
>>>>>>>>>>>>> times faster with the WaitBarrier (where futex is faster
>>>>>>>>>>>>> than semaphore).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>>>>>>>>>>>>
>>>>>>>>>>>>> CR:
>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>>>>>>>>>>>>
>>>>>>>>>>>>> Passes 100 iterations of gtest on our platforms, both
>>>>>>>>>>>>> fastdebug and release.
>>>>>>>>>>>>> And have been stable when used in safepoints (t1-8)
>>>>>>>>>>>>> (coming patches).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, Robbin
More information about the hotspot-dev
mailing list