RFR: 8130728: Disable WorkAroundNPTLTimedWaitHang by default
David Holmes
david.holmes at oracle.com
Mon Jul 13 02:56:14 UTC 2015
Thanks Dan.
As there are no other opinions/concerns with this I will push it.
David
On 9/07/2015 11:32 PM, Daniel D. Daugherty wrote:
> > http://cr.openjdk.java.net/~dholmes/8130728/webrev/
>
> src/share/vm/runtime/globals.hpp
> No comments.
>
> Thumbs up.
>
> Time to see what shakes out with this option turned off.
>
> The BSD/MacOS X blame belongs to me. Our thinking at the time was
> to keep the "bsd" files matching the "linux" version unless we
> had a specific reason for changing the code. Of course, we thought
> we would come back later and clean things up... which just didn't
> happen... I don't know the AIX history.
>
> Dan
>
>
> On 7/9/15 3:27 AM, David Holmes wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8130728
>>
>> The workaround triggered by WorkAroundNPTLTimedWaitHang (an
>> experimental flag set to 1 by default) was to alleviate a hang that
>> could occur on a pthread_cond_timedwait on Linux, if the timeout value
>> was a time in the past - see JDK-6314875. This glibc bug was fixed in
>> 2005 in glibc 2.3.5-3
>>
>> https://lists.debian.org/debian-glibc/2005/08/msg00201.html
>>
>> but the workaround persists and was (unfortunately) copied into the
>> BSD and AIX ports.
>>
>> It is time to remove that workaround but before we can do that we need
>> to be sure that we are not in fact hitting the workaround code. To
>> that end I propose to use this bug to switch the flag's value to 0 to
>> verify correct operation.
>>
>> If that goes well then we can remove the code either later in JDK 9 or
>> early in JDK 10.
>>
>> To be clear the flag impacts both the wait and the signal part of the
>> condvar usage (park and unpark). On the wait side we have:
>>
>> status = pthread_cond_timedwait(_cond, _mutex, &abst);
>> if (status != 0 && WorkAroundNPTLTimedWaitHang) {
>> pthread_cond_destroy(_cond);
>> pthread_cond_init(_cond, os::Linux::condAttr());
>> }
>>
>> so we will now skip this potential recovery code.
>>
>> On the signal side we signal while holding the mutex if the workaround
>> is enabled, and we signal after dropping the mutex otherwise. So this
>> code will now be using a new path. Here is the code in
>> PlatformEvent::unpark
>>
>> assert(AnyWaiters == 0 || AnyWaiters == 1, "invariant");
>> if (AnyWaiters != 0 && WorkAroundNPTLTimedWaitHang) {
>> AnyWaiters = 0;
>> pthread_cond_signal(_cond);
>> }
>> status = pthread_mutex_unlock(_mutex);
>> assert_status(status == 0, status, "mutex_unlock");
>> if (AnyWaiters != 0) {
>> // Note that we signal() *after* dropping the lock for "immortal"
>> Events.
>> // This is safe and avoids a common class of futile wakeups. In rare
>> // circumstances this can cause a thread to return prematurely from
>> // cond_{timed}wait() but the spurious wakeup is benign and the
>> victim
>> // will simply re-test the condition and re-park itself.
>> // This provides particular benefit if the underlying platform
>> does not
>> // provide wait morphing.
>> status = pthread_cond_signal(_cond);
>> assert_status(status == 0, status, "cond_signal");
>> }
>>
>> and similarly in Parker::unpark
>>
>> if (WorkAroundNPTLTimedWaitHang) {
>> status = pthread_cond_signal(&_cond[_cur_index]);
>> assert(status == 0, "invariant");
>> status = pthread_mutex_unlock(_mutex);
>> assert(status == 0, "invariant");
>> } else {
>> status = pthread_mutex_unlock(_mutex);
>> assert(status == 0, "invariant");
>> status = pthread_cond_signal(&_cond[_cur_index]);
>> assert(status == 0, "invariant");
>> }
>>
>> This may cause performance perturbations that will need to be
>> investigated - but in theory, as per the comment above, signalling
>> outside the lock should be beneficial in the linux case because there
>> is no wait-morphing (where a signal simply takes a waiting thread and
>> places it into the mutex queue thereby avoiding the need to awaken the
>> thread so it can enqueue itself).
>>
>> The change itself is of course trivial:
>>
>> http://cr.openjdk.java.net/~dholmes/8130728/webrev/
>>
>> I'd appreciate any comments/concerns particularly from the BSD and AIX
>> folk who acquired this unnecessary workaround.
>>
>> If deemed necessary we could add a flag that controls whether the
>> signal happens with or without the lock held.
>>
>> Thanks,
>> David
>
More information about the porters-dev
mailing list