RFR: JDK-6546236 Thread interrupt() of Thread.sleep() can be lost on Solaris due to race with signal handler

David Holmes david.holmes at oracle.com
Thu Feb 13 18:25:45 PST 2014


Hi Fred,

Sorry for the delay. This looks longer than it actually is - mostly 
discussion. :)

Dan has already picked up most of the issues I saw. But to expand a little:

You took (mostly) the Linux versions of os::sleep, os::interrupt and 
os::is_interrupted and moved them to os_posix.cpp, deleting them from 
linux/bsd/solaris. Ok.

---

But you carried into os::interrupted a comment block from Solaris which
is somewhat dated/unclear and has some Solaris specific wording. I think 
it can just be edited slightly to clarify things:

// NOTE that since there is no "lock" around the interrupt and
// is_interrupted operations, there is the possibility that the
// interrupted flag (in osThread) will be "false" but that the
// low-level events will be in the signalled state. This is
// intentional. The effect of this is that Object.wait() and
// LockSupport.park() will appear to have a spurious wakeup, which
// is allowed and not harmful, and the possibility is so rare that
// it is not worth the added complexity to add yet another lock.
// For the sleep event an explicit reset is performed on entry
// to os::sleep, so there is no early return. It has also been
// recommended not to put the interrupted flag into the "event"
// structure because it hides the issue.

I think it worth keeping this comment given the Windows version does 
actually have a bad data race due to this non-atomic mode of operation.

---

In os::sleep a couple of minor nits:
- can you expand the warning to say "time moving backwards detected in 
os::sleep" so that if someone puts this in a bug report we know where it 
comes from
- you inherited a couple of style-nits from the linux version: 
"if(millis)" should be "if (millis)" in two places. Thanks.

---

You also added os::naked_sleep to os_posix.cpp but naked_sleep was 
changed to be naked_short_sleep by 8028280. So you need to resync and 
then ignore naked_short_sleep (it is different on different *NIX platforms).

---

As Dan pointed out Solaris::interrupt called 
RuntimeService::record_thread_interrupt_signaled_count, which seems to a 
last vestige of InterruptibleIO code. But that can be cleaned up as part 
of the impending InterruptibleIO removal in 9.

---

Thanks,
David

On 14/02/2014 4:14 AM, Daniel D. Daugherty wrote:
>  > http://cr.openjdk.java.net/~fparain/6546236/webrev.00/
>
> Sorry for the delay. This was a difficult review to perform due to
> the code motion between files. My technique was to use the 'udiff'
> view for a side-by-side visual compare. Of course, this means I
> could easily have missed critical differences... :-)
>
> src/os/bsd/vm/os_bsd.cpp
>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>      os::is_interrupted().
>
>      bsd << os::sleep >> posix
>        << assert(!Bsd::supports_monotonic_clock(), "time moving
> backwards");
>        >> warning("time moving backwards");
>            The above diff happens twice. Without at least an assert here
>            we will be less likely to see this issue pop-up on specific
>            test machines.
>
>      bsd << os::is_interrupted >> posix
>        picked up a long comment from the Solaris version
>
> src/os/linux/vm/os_linux.cpp
>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>      os::is_interrupted().
>
>      linux << os::sleep >> posix
>        << assert(!Linux::supports_monotonic_clock(), "time moving
> backwards");
>        >> warning("time moving backwards");
>            The above diff happens twice. Without at least an assert here
>            we will be less likely to see this issue pop-up on specific
>            test machines.
>
>      linux << os::is_interrupted >> posix
>        picked up a long comment from the Solaris version
>
> src/os/solaris/vm/os_solaris.cpp
>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>      os::is_interrupted(). Also deletes static os_sleep() and
>      os::Solaris::setup_interruptible_already_blocked()
>
>      solaris << os::interrupt >> posix
>        << // When events are used everywhere for os::sleep, then this
> thr_kill
>        << // will only be needed if UseVMInterruptibleIO is true.
>        << if (!isInterrupted) {
>        <<   int status = thr_kill(osthread->thread_id(),
> os::Solaris::SIGinterrupt());
>        <<   assert_status(status == 0, status, "thr_kill");
>            Deleting the above code presumes that you're also going to
>            remove the remaining uses of UseVMInterruptibleIO. Should
>            that be done with this changeset so things are not out of
>            sync.
>
>        << // Bump thread interruption counter
>        << RuntimeService::record_thread_interrupt_signaled_count();
>            This call was only ever made on Solaris. Is this just a
>            counter for when a SIGinterrupt has to be thrown because
>            of the UseVMInterruptibleIO stuff?
>
>      Doesn't this deletion make INTERRUPTIBLE_NORESTART_VM_ALWAYS
>      unused so it can be deleted from
>      src/os/solaris/vm/os_solaris.inline.hpp
>
>      Were you also planning on removing the last vestiges of
>      UseVMInterruptibleIO? Or will that be done with another bug?
>
> src/os/posix/vm/os_posix.cpp
>      Adds os::sleep(), os::naked_sleep(), os::interrupt() and
>      os::is_interrupted().
>
>      > // the interrupted flag into the os::Solaris::Event structure,
>          Should this be more generic instead of "Solaris" specific?
>
> Dan
>
>
> On 2/10/14 8:58 AM, Frederic Parain wrote:
>> Greetings,
>>
>> Please review this fix for JDK-6546236.
>>
>> A race condition exists in the implementation of
>> Thread.sleep() and Thread.interrupt() on the
>> Solaris platform. For historical reasons, the
>> Solaris implementation of sleep() and interrupt()
>> differs from the other Unices (Linux, BSD).
>> It uses poll() calls while Linux and BSD are
>> using ParkEvents.
>>
>> This changeset reconciles the Solaris code with
>> Linux and BSD code. And instead of having three
>> copies of this code, sleep() and interrupt()
>> code are moved to the os_posix.cpp from
>> os_[solaris|linux|bsd].cpp files.
>>
>> Note that with this changeset, the Solaris
>> implementation of sleep() and interrupt()
>> is loosing the support for UseVMInterruptibleIO
>> but this is not an issue because this VM
>> flag has been deprecated in JDK8 and will be
>> removed in JDK9.
>>
>> The code has been tested with JPRT, vm.quick.testlist
>> as well as jdk_lang, jdk_util, jdk_nio and jdk_awt
>> (not a random list, all those jdk_* test suites are
>> using Thread.sleep() and thread.interrupt()).
>>
>> The bug description:
>> https://bugs.openjdk.java.net/browse/JDK-6546236
>>
>> The webrev:
>> http://cr.openjdk.java.net/~fparain/6546236/webrev.00/
>>
>> Thank you,
>>
>> Fred
>>
>


More information about the hotspot-runtime-dev mailing list