RFR: 8130728: Disable WorkAroundNPTLTimedWaitHang by default

Daniel D. Daugherty daniel.daugherty at oracle.com
Thu Jul 9 13:32:33 UTC 2015


 > http://cr.openjdk.java.net/~dholmes/8130728/webrev/

src/share/vm/runtime/globals.hpp
     No comments.

Thumbs up.

Time to see what shakes out with this option turned off.

The BSD/MacOS X blame belongs to me. Our thinking at the time was
to keep the "bsd" files matching the "linux" version unless we
had a specific reason for changing the code. Of course, we thought
we would come back later and clean things up... which just didn't
happen... I don't know the AIX history.

Dan


On 7/9/15 3:27 AM, David Holmes wrote:
> https://bugs.openjdk.java.net/browse/JDK-8130728
>
> The workaround triggered by WorkAroundNPTLTimedWaitHang (an 
> experimental flag set to 1 by default) was to alleviate a hang that 
> could occur on a pthread_cond_timedwait on Linux, if the timeout value 
> was a time in the past - see JDK-6314875. This glibc bug was fixed in 
> 2005 in glibc 2.3.5-3
>
> https://lists.debian.org/debian-glibc/2005/08/msg00201.html
>
> but the workaround persists and was (unfortunately) copied into the 
> BSD and AIX ports.
>
> It is time to remove that workaround but before we can do that we need 
> to be sure that we are not in fact hitting the workaround code. To 
> that end I propose to use this bug to switch the flag's value to 0 to 
> verify correct operation.
>
> If that goes well then we can remove the code either later in JDK 9 or 
> early in JDK 10.
>
> To be clear the flag impacts both the wait and the signal part of the 
> condvar usage (park and unpark). On the wait side we have:
>
>     status = pthread_cond_timedwait(_cond, _mutex, &abst);
>     if (status != 0 && WorkAroundNPTLTimedWaitHang) {
>       pthread_cond_destroy(_cond);
>       pthread_cond_init(_cond, os::Linux::condAttr());
>     }
>
> so we will now skip this potential recovery code.
>
> On the signal side we signal while holding the mutex if the workaround 
> is enabled, and we signal after dropping the mutex otherwise. So this 
> code will now be using a new path. Here is the code in 
> PlatformEvent::unpark
>
>   assert(AnyWaiters == 0 || AnyWaiters == 1, "invariant");
>   if (AnyWaiters != 0 && WorkAroundNPTLTimedWaitHang) {
>     AnyWaiters = 0;
>     pthread_cond_signal(_cond);
>   }
>   status = pthread_mutex_unlock(_mutex);
>   assert_status(status == 0, status, "mutex_unlock");
>   if (AnyWaiters != 0) {
>     // Note that we signal() *after* dropping the lock for "immortal" 
> Events.
>     // This is safe and avoids a common class of  futile wakeups. In rare
>     // circumstances this can cause a thread to return prematurely from
>     // cond_{timed}wait() but the spurious wakeup is benign and the 
> victim
>     // will simply re-test the condition and re-park itself.
>     // This provides particular benefit if the underlying platform 
> does not
>     // provide wait morphing.
>     status = pthread_cond_signal(_cond);
>     assert_status(status == 0, status, "cond_signal");
>   }
>
> and similarly in Parker::unpark
>
>      if (WorkAroundNPTLTimedWaitHang) {
>         status = pthread_cond_signal(&_cond[_cur_index]);
>         assert(status == 0, "invariant");
>         status = pthread_mutex_unlock(_mutex);
>         assert(status == 0, "invariant");
>       } else {
>         status = pthread_mutex_unlock(_mutex);
>         assert(status == 0, "invariant");
>         status = pthread_cond_signal(&_cond[_cur_index]);
>         assert(status == 0, "invariant");
>       }
>
> This may cause performance perturbations that will need to be 
> investigated - but in theory, as per the comment above, signalling 
> outside the lock should be beneficial in the linux case because there 
> is no wait-morphing (where a signal simply takes a waiting thread and 
> places it into the mutex queue thereby avoiding the need to awaken the 
> thread so it can enqueue itself).
>
> The change itself is of course trivial:
>
> http://cr.openjdk.java.net/~dholmes/8130728/webrev/
>
> I'd appreciate any comments/concerns particularly from the BSD and AIX 
> folk who acquired this unnecessary workaround.
>
> If deemed necessary we could add a flag that controls whether the 
> signal happens with or without the lock held.
>
> Thanks,
> David



More information about the porters-dev mailing list