RFR: 8130728: Disable WorkAroundNPTLTimedWaitHang by default
Daniel D. Daugherty
daniel.daugherty at oracle.com
Thu Jul 9 13:32:33 UTC 2015
> http://cr.openjdk.java.net/~dholmes/8130728/webrev/
src/share/vm/runtime/globals.hpp
No comments.
Thumbs up.
Time to see what shakes out with this option turned off.
The BSD/MacOS X blame belongs to me. Our thinking at the time was
to keep the "bsd" files matching the "linux" version unless we
had a specific reason for changing the code. Of course, we thought
we would come back later and clean things up... which just didn't
happen... I don't know the AIX history.
Dan
On 7/9/15 3:27 AM, David Holmes wrote:
> https://bugs.openjdk.java.net/browse/JDK-8130728
>
> The workaround triggered by WorkAroundNPTLTimedWaitHang (an
> experimental flag set to 1 by default) was to alleviate a hang that
> could occur on a pthread_cond_timedwait on Linux, if the timeout value
> was a time in the past - see JDK-6314875. This glibc bug was fixed in
> 2005 in glibc 2.3.5-3
>
> https://lists.debian.org/debian-glibc/2005/08/msg00201.html
>
> but the workaround persists and was (unfortunately) copied into the
> BSD and AIX ports.
>
> It is time to remove that workaround but before we can do that we need
> to be sure that we are not in fact hitting the workaround code. To
> that end I propose to use this bug to switch the flag's value to 0 to
> verify correct operation.
>
> If that goes well then we can remove the code either later in JDK 9 or
> early in JDK 10.
>
> To be clear the flag impacts both the wait and the signal part of the
> condvar usage (park and unpark). On the wait side we have:
>
> status = pthread_cond_timedwait(_cond, _mutex, &abst);
> if (status != 0 && WorkAroundNPTLTimedWaitHang) {
> pthread_cond_destroy(_cond);
> pthread_cond_init(_cond, os::Linux::condAttr());
> }
>
> so we will now skip this potential recovery code.
>
> On the signal side we signal while holding the mutex if the workaround
> is enabled, and we signal after dropping the mutex otherwise. So this
> code will now be using a new path. Here is the code in
> PlatformEvent::unpark
>
> assert(AnyWaiters == 0 || AnyWaiters == 1, "invariant");
> if (AnyWaiters != 0 && WorkAroundNPTLTimedWaitHang) {
> AnyWaiters = 0;
> pthread_cond_signal(_cond);
> }
> status = pthread_mutex_unlock(_mutex);
> assert_status(status == 0, status, "mutex_unlock");
> if (AnyWaiters != 0) {
> // Note that we signal() *after* dropping the lock for "immortal"
> Events.
> // This is safe and avoids a common class of futile wakeups. In rare
> // circumstances this can cause a thread to return prematurely from
> // cond_{timed}wait() but the spurious wakeup is benign and the
> victim
> // will simply re-test the condition and re-park itself.
> // This provides particular benefit if the underlying platform
> does not
> // provide wait morphing.
> status = pthread_cond_signal(_cond);
> assert_status(status == 0, status, "cond_signal");
> }
>
> and similarly in Parker::unpark
>
> if (WorkAroundNPTLTimedWaitHang) {
> status = pthread_cond_signal(&_cond[_cur_index]);
> assert(status == 0, "invariant");
> status = pthread_mutex_unlock(_mutex);
> assert(status == 0, "invariant");
> } else {
> status = pthread_mutex_unlock(_mutex);
> assert(status == 0, "invariant");
> status = pthread_cond_signal(&_cond[_cur_index]);
> assert(status == 0, "invariant");
> }
>
> This may cause performance perturbations that will need to be
> investigated - but in theory, as per the comment above, signalling
> outside the lock should be beneficial in the linux case because there
> is no wait-morphing (where a signal simply takes a waiting thread and
> places it into the mutex queue thereby avoiding the need to awaken the
> thread so it can enqueue itself).
>
> The change itself is of course trivial:
>
> http://cr.openjdk.java.net/~dholmes/8130728/webrev/
>
> I'd appreciate any comments/concerns particularly from the BSD and AIX
> folk who acquired this unnecessary workaround.
>
> If deemed necessary we could add a flag that controls whether the
> signal happens with or without the lock held.
>
> Thanks,
> David
More information about the porters-dev
mailing list