RFR(m): 8214271: Fast primitive to wake many threads
David Holmes
david.holmes at oracle.com
Mon Nov 26 06:12:45 UTC 2018
Hi Robbin,
On 24/11/2018 2:55 am, Robbin Ehn wrote:
> Forgot RFR in subject.
Yep and now you have two different review threads happening in parallel
unfortunately :(
- src/hotspot/share/utilities/waitBarrier.hpp
I'm studying just the WaitBarrierType API. Is this inherently tied to
safepoint usage or intended as a general synchronization tool? As a
general tool the API does not have clear semantics on how it should be used:
- How do you communicate the current tag between the arming thread and
the waiting threads? There would seem to be inherent races between
arm(tag) and wait(tag) unless access to the tag itself is synchronized
via another mechanism.
- What happens if wake() is called before disarm()? Should it be
disallowed? (For other readers: disarm() and wake() are distinct
operations so that they fit better into the existing safepoint protocol,
which disarms the safepoint polling page at one spot and wakes blocked
threads at another.)
- Should there be constraints that the same thread must arm/disarm/wake?
It doesn't really make sense to allow these operations to happen in
arbitrary order from multiple threads.
The semantics for re-arming with the same tag should be clearly set out,
not left "implementation-defined". This should probably be a usage error
IMHO - but it comes back to how the tag is expected to be used.
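
To make the tag questions concrete, here is a minimal single-threaded model of the tag check alone. It assumes the rule that wait(tag) blocks only when the barrier is armed with exactly that tag. TagModel and both helper functions are hypothetical names for illustration, not anything in the webrev, and nothing here actually blocks.

```cpp
#include <cassert>

// Hypothetical model of the tag check only; not the HotSpot code.
struct TagModel {
  int armed_tag = 0;  // 0 means disarmed

  void arm(int tag) { assert(tag != 0); armed_tag = tag; }
  void disarm() { armed_tag = 0; }

  // Assumed rule: wait(tag) blocks only if armed with exactly 'tag'.
  bool wait_would_block(int tag) const {
    return armed_tag != 0 && armed_tag == tag;
  }
};

// A waiter reads tag 1, is delayed, and the barrier moves on to tag 2.
bool stale_tag_slips_through() {
  TagModel barrier;
  barrier.arm(1);
  int seen_by_waiter = 1;  // unsynchronized read of the "current" tag
  barrier.disarm();        // round 1 ends
  barrier.arm(2);          // round 2 begins with a new tag
  // wait(1) returns immediately although the barrier is armed.
  return !barrier.wait_would_block(seen_by_waiter);
}

// The same delayed waiter, but round 2 reuses tag 1.
bool same_tag_rearm_catches_old_waiter() {
  TagModel barrier;
  barrier.arm(1);
  int seen_by_waiter = 1;
  barrier.disarm();
  barrier.arm(1);          // re-arm with the same tag
  // The round-1 waiter now blocks in round 2; the two rounds
  // cannot be told apart from the tag alone.
  return barrier.wait_would_block(seen_by_waiter);
}
```

Both outcomes follow directly from the assumed rule, which is why the way the tag is shared, and whether reuse is legal, needs to be spelled out in the API.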
As Andrew mentioned there needs to be documentation regarding spurious
wakeups or other "interruptions" at the API level. And I assume a
blocked wait() only ever returns in response to a wake().
Notwithstanding clarification of the above, may I suggest the following
rewrite of the API documentation for further clarity:
/* Platform independent WaitBarrier API.
   An armed WaitBarrier prevents threads from advancing until the
   barrier is disarmed and the waiting threads woken. The barrier is
   armed by setting a non-zero value - the tag.

   Expected Usage:
    - Arming thread:
        tag = ...;  // non-zero value
        barrier.arm(tag);
        <work>
        barrier.disarm();
        barrier.wake();

      - After arm(tag) returns any thread calling wait(tag) will block
      - After disarm() returns any subsequent calls to wait(tag) will
        not block
      - After wake() returns all blocked threads are unblocked and
        eligible to execute again
      - After calling disarm() and wake() the barrier is ready to be
        re-armed with a new tag

    - Waiting threads
        wait(tag);  // don't execute following code unless 'safe'
        <work>

      - A call to wait(tag) will block if the barrier is armed with the
        value 'tag'; else it will return immediately.
      - A blocked thread is eligible to execute again once the barrier
        is disarmed and wake() has been called.

   A primary goal of the WaitBarrier implementation is to wake all waiting
   threads as fast, and as concurrently, as possible.
*/
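
As a reference model of the semantics described in that comment, here is a minimal sketch built on a mutex and a condition variable. It is not the webrev implementation and makes no attempt at fast wakeup. SketchWaitBarrier and run_demo are hypothetical names, and treating wake() before disarm(), or arming an already armed barrier, as usage errors are assumptions of the sketch rather than settled API rules.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reference model; not the HotSpot WaitBarrier.
class SketchWaitBarrier {
  std::mutex _lock;
  std::condition_variable _cv;
  int _armed_tag = 0;        // 0 means disarmed
  unsigned _wake_epoch = 0;  // incremented by every wake()

 public:
  void arm(int tag) {
    std::lock_guard<std::mutex> guard(_lock);
    assert(tag != 0 && _armed_tag == 0);  // assumed usage rules
    _armed_tag = tag;
  }

  void disarm() {
    std::lock_guard<std::mutex> guard(_lock);
    _armed_tag = 0;
  }

  void wake() {
    {
      std::lock_guard<std::mutex> guard(_lock);
      assert(_armed_tag == 0);  // assumption: disarm() comes first
      ++_wake_epoch;
    }
    _cv.notify_all();
  }

  void wait(int tag) {
    std::unique_lock<std::mutex> guard(_lock);
    if (_armed_tag != tag) return;  // not armed with this tag
    const unsigned epoch = _wake_epoch;
    // The predicate filters spurious wakeups: only wake() releases us.
    _cv.wait(guard, [&] { return _wake_epoch != epoch; });
  }
};

// Starts n waiters, releases them, and returns how many resumed.
int run_demo(int n_waiters) {
  SketchWaitBarrier barrier;
  std::atomic<int> resumed{0};
  const int tag = 1;
  barrier.arm(tag);  // tag is fixed before any waiter thread starts
  std::vector<std::thread> waiters;
  for (int i = 0; i < n_waiters; i++) {
    waiters.emplace_back([&barrier, &resumed, tag] {
      barrier.wait(tag);
      resumed.fetch_add(1);
    });
  }
  barrier.disarm();
  barrier.wake();
  for (std::thread& t : waiters) t.join();
  return resumed.load();
}
```

Calling run_demo(4) should return 4 whether a given waiter blocks before disarm() or arrives after it. The demo avoids the tag-communication race only because the tag is chosen before the waiter threads are created.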
Looking at the "implementation" in this file I'm unclear on the way the
Linux specialization is being handled here. Why do we need the template?
Can't this just be done the same way we do Semaphore?
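
What I have in mind by "the same way we do Semaphore" is roughly the following pattern: a platform-specific implementation type is picked at compile time and a plain wrapper class forwards to it, with no template in the public API. This is only a sketch of the pattern. All names here are hypothetical, the implementations are stubs, and in a real tree each implementation would live in its own platform header.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stub standing in for a futex-based implementation.
struct SketchLinuxBarrierImpl {
  int armed = 0;
  void arm(int tag) { armed = tag; }
  void disarm() { armed = 0; }
  static const char* name() { return "linux"; }
};

// Hypothetical stub standing in for a semaphore-based implementation.
struct SketchGenericBarrierImpl {
  int armed = 0;
  void arm(int tag) { armed = tag; }
  void disarm() { armed = 0; }
  static const char* name() { return "generic"; }
};

// Compile-time selection, done once, outside the public class.
#if defined(__linux__)
typedef SketchLinuxBarrierImpl SketchBarrierImpl;
#else
typedef SketchGenericBarrierImpl SketchBarrierImpl;
#endif

// Public, non-template wrapper that simply forwards to the chosen impl.
class SketchBarrier {
  SketchBarrierImpl _impl;

 public:
  void arm(int tag) { _impl.arm(tag); }
  void disarm() { _impl.disarm(); }
  bool is_armed() const { return _impl.armed != 0; }
  static const char* impl_name() { return SketchBarrierImpl::name(); }
};
```

Callers only ever see SketchBarrier, so the platform choice stays an implementation detail.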
More to follow.
Thanks,
David
-----
> /Robbin
>
> On 2018-11-23 17:51, Robbin Ehn wrote:
>> Hi all, please review.
>>
>> When a safepoint is ended we need a way to get back to 100% utilization as
>> fast as possible. 100% utilization means no idle cpu in the system if there
>> is a JavaThread that could be executed. The traditional ways to wake many,
>> e.g. semaphore, pthread_cond, is not implemented with a single syscall
>> instead they typical do one syscall per thread to wake.
>>
>> This change-set contains that primitive, the WaitBarrier, and a gtest for it.
>> No actual users, which is in coming patches.
>>
>> The WaitBarrier solves by doing a cooperative semaphore posting, threads
>> woken will also post. On Linux we can instead directly use a futex and with
>> one syscall wake all. Depending on how many threads and cpus the performance
>> vary, but a good utilization of the machine, just on the edge of saturated,
>> the time to reach 100% utilization is around 3 times faster with the
>> WaitBarrier (where futex is faster than semaphore).
>>
>> Webrev:
>> http://cr.openjdk.java.net/~rehn/8214271/webrev/
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8214271
>>
>> Passes 100 iterations of gtest on our platforms, both fastdebug and release.
>> And have been stable when used in safepoints (t1-8) (coming patches).
>>
>> Thanks, Robbin