RFR: 8130728: Disable WorkAroundNPTLTimedWaitHang by default
David Holmes
david.holmes at oracle.com
Thu Jul 9 09:27:43 UTC 2015
https://bugs.openjdk.java.net/browse/JDK-8130728
The workaround triggered by WorkAroundNPTLTimedWaitHang (an experimental
flag set to 1 by default) was to alleviate a hang that could occur on a
pthread_cond_timedwait on Linux, if the timeout value was a time in the
past - see JDK-6314875. This glibc bug was fixed in 2005 in glibc 2.3.5-3
https://lists.debian.org/debian-glibc/2005/08/msg00201.html
but the workaround persists and was (unfortunately) copied into the BSD
and AIX ports.
It is time to remove that workaround but before we can do that we need
to be sure that we are not in fact hitting the workaround code. To that
end I propose to use this bug to switch the flag's value to 0 to verify
correct operation.
If that goes well then we can remove the code either later in JDK 9 or
early in JDK 10.
To be clear the flag impacts both the wait and the signal part of the
condvar usage (park and unpark). On the wait side we have:
status = pthread_cond_timedwait(_cond, _mutex, &abst);
if (status != 0 && WorkAroundNPTLTimedWaitHang) {
pthread_cond_destroy(_cond);
pthread_cond_init(_cond, os::Linux::condAttr());
}
so we will now skip this potential recovery code.
On the signal side we signal while holding the mutex if the workaround
is enabled, and we signal after dropping the mutex otherwise. So this
code will now be using a new path. Here is the code in PlatformEvent::unpark
assert(AnyWaiters == 0 || AnyWaiters == 1, "invariant");
if (AnyWaiters != 0 && WorkAroundNPTLTimedWaitHang) {
AnyWaiters = 0;
pthread_cond_signal(_cond);
}
status = pthread_mutex_unlock(_mutex);
assert_status(status == 0, status, "mutex_unlock");
if (AnyWaiters != 0) {
// Note that we signal() *after* dropping the lock for "immortal"
Events.
// This is safe and avoids a common class of futile wakeups. In rare
// circumstances this can cause a thread to return prematurely from
// cond_{timed}wait() but the spurious wakeup is benign and the victim
// will simply re-test the condition and re-park itself.
// This provides particular benefit if the underlying platform does not
// provide wait morphing.
status = pthread_cond_signal(_cond);
assert_status(status == 0, status, "cond_signal");
}
and similarly in Parker::unpark
if (WorkAroundNPTLTimedWaitHang) {
status = pthread_cond_signal(&_cond[_cur_index]);
assert(status == 0, "invariant");
status = pthread_mutex_unlock(_mutex);
assert(status == 0, "invariant");
} else {
status = pthread_mutex_unlock(_mutex);
assert(status == 0, "invariant");
status = pthread_cond_signal(&_cond[_cur_index]);
assert(status == 0, "invariant");
}
This may cause performance perturbations that will need to be
investigated - but in theory, as per the comment above, signalling
outside the lock should be beneficial in the linux case because there is
no wait-morphing (where a signal simply takes a waiting thread and
places it into the mutex queue thereby avoiding the need to awaken the
thread so it can enqueue itself).
The change itself is of course trivial:
http://cr.openjdk.java.net/~dholmes/8130728/webrev/
I'd appreciate any comments/concerns particularly from the BSD and AIX
folk who acquired this unnecessary workaround.
If deemed necessary we could add a flag that controls whether the signal
happens with or without the lock held.
Thanks,
David
More information about the porters-dev
mailing list