RFR(m): 8214271: Fast primitive to wake many threads
David Holmes
david.holmes at oracle.com
Wed Dec 19 04:51:01 UTC 2018
Hi Robbin,
Note this was focusing on the design of the API to ensure the semantics
are clear and well-defined so that future users of the API know how to
use it correctly and safely. It also then allows a review of the
implementation to check back to the spec to see if it behaves as
intended and checks all necessary constraints That's why I wanted it to
be as complete as possible (spurious wakeups allowed? No - then lets
document that) and now allow implementation specific choices unless
unavoidable.
I'll get to updated webrevs before I finish for the year on Friday.
Thanks,
David
On 17/12/2018 8:39 pm, Robbin Ehn wrote:
> Hi David,
>
> On 11/26/18 7:12 AM, David Holmes wrote:
>> Hi Robbin,
>>
>> On 24/11/2018 2:55 am, Robbin Ehn wrote:
>>> Forgot RFR in subject.
>>
>> Yep and now you have two different review threads happening in
>> parallel unfortunately :(
>>
>> - src/hotspot/share/utilities/waitBarrier.hpp
>>
>> I'm studying just the WaitBarrierType API. Is this inherently tied to
>> safepoint usage or intended as a general synchronization tool? As a
>> general tool the API does not have clear semantics on how it should be
>> used:
>
> There is two potential uses-cases in G1, which might require some changes.
>
>>
>> - How do you communicate the current tag between the arming thread and
>> the waiting threads? There would seem to be inherent races between
>> arm(tag) and wait(tag) unless access to the tag itself is synchronized
>> via another mechanism.
>
> As seen in gtest:
> - arm tag
> - publish tag
>
>>
>> - What happens if wake() is called before disarm()? Should it be
>> disallowed? (For other readers disarm() and wake() are distinct
>> operations so that they fit better into the existing safepoint
>> protocol which disarms the safepoint polling page at one spot and
>> wakes blocked threads at another.)
>
> Yes, wake() have no effect if you have not disarmed.
> There are asserts checking this.
>
>>
>> - Should there be constraints that the same thread must
>> arm/disarm/wake? It doesn't really make sense to allow these
>> operations to happen in arbitrary order from multiple threads.
>
> Maybe, *thinking* if there is nice way to get the arming/waking thread
> into an
> argument.
>
>>
>> The semantics for re-arming with the same tag should be clearly set
>> out not "implementation-defined". This should probably be a usage
>> error IMHO - but it comes back to how the tag is expected to be used.
>
> If you have a way to detect that all threads have left the waitbarrier,
> there is
> no problem with re-using the same tag. So yes, re-arming before that
> with the
> same tag should be an error. But since the waitbarrier can't tell the
> difference
> and do not know which tag it was last armed for I have no assert.
>
>>
>> As Andrew mentioned there needs to be documentation regarding spurious
>> wakeups or other "interruptions" at the API level. And I assume a
>> blocked wait() only ever returns in response to a wake().
>
> Yes, if it's disarmed before. Since these are handled internally there
> are no
> spurious wakeups, not sure why we want to document that, except inside the
> implementation. E.g. semaphore.hpp does not document that it do not have
> it.
>
>>
>> Nothwithstanding clarification of the above may I suggest the
>> following rewrite of the API documentation for further clarity:
>>
>> /* Platform independent WaitBarrier API.
>> An armed WaitBarrier prevents threads from advancing until the
>> barrier is disarmed and the waiting threads woken. The barrier is
>> armed by setting a non-zero value - the tag.
>>
>> Expected Usage:
>> - Arming thread:
>> tag = ...; // non-zero value
>> barrier.arm(tag);
>> <work>
>> barrier.disarm();
>> barrier.wake();
>>
>> - After arm(tag) returns any thread calling wait(tag) will block
>> - After disarm() returns any subsequent calls to wait(tag)
>> will not block
>> - After wake() returns all blocked threads are unblocked and
>> eligible to execute again
>> - After calling disarm() and wake() the barrier is ready to be
>> re-armed with a new tag
>>
>> - Waiting threads
>> wait(tag); // don't execute following code unless 'safe'
>> <work>
>>
>> - A call to wait(tag) will block if the barrier is armed with
>> the value 'tag'; else it will return immediately.
>> - A blocked thread is eligible to execute again once the
>> barrier is disarmed and wake() has been called.
>>
>> A primary goal of the WaitBarrier implementation is to wake all waiting
>> threads as fast, and as concurrently, as possible.
>>
>> */
>
> Updated, added <publish tag> after barrier.arm(tag);
>
>>
>> Looking at the "implementation" in this file I'm unclear on the way
>> the Linux specialization is being handled here. Why do we need the
>> template? Can't this just be done the same way we do Semaphore?
>
> Because the gtest can test all implementations available on that platform.
> Meaning I can test both implementation locally and compare performance
> etc..
>
> I'll post one update after I sorted out the other review parts also.
>
> Thanks, Robbin
>
>>
>> More to follow.
>>
>> Thanks,
>> David
>> -----
>>
>>> /Robbin
>>>
>>> On 2018-11-23 17:51, Robbin Ehn wrote:
>>>> Hi all, please review.
>>>>
>>>> When a safepoint is ended we need a way to get back to 100%
>>>> utilization as fast
>>>> as possible. 100% utilization means no idle cpu in the system if
>>>> there is a
>>>> JavaThread that could be executed. The traditional ways to wake
>>>> many, e.g.
>>>> semaphore, pthread_cond, is not implemented with a single syscall
>>>> instead they
>>>> typical do one syscall per thread to wake.
>>>>
>>>> This change-set contains that primitive, the WaitBarrier, and a
>>>> gtest for it.
>>>> No actual users, which is in coming patches.
>>>>
>>>> The WaitBarrier solves by doing a cooperative semaphore posting,
>>>> threads woken
>>>> will also post. On Linux we can instead directly use a futex and
>>>> with one
>>>> syscall wake all. Depending on how many threads and cpus the
>>>> performance vary,
>>>> but a good utilization of the machine, just on the edge of
>>>> saturated, the time to reach 100% utilization is around 3 times
>>>> faster with the WaitBarrier (where futex is faster than semaphore).
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>>>
>>>> CR:
>>>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>>>
>>>> Passes 100 iterations of gtest on our platforms, both fastdebug and
>>>> release.
>>>> And have been stable when used in safepoints (t1-8) (coming patches).
>>>>
>>>> Thanks, Robbin
More information about the hotspot-dev
mailing list