RFR: JDK-6546236 Thread interrupt() of Thread.sleep() can be lost on Solaris due to race with signal handler

Frederic Parain frederic.parain at oracle.com
Mon Feb 17 08:23:51 PST 2014


Hi David,

Thank you for your review, my answers are in-lined below
and the new webrev is here:
http://cr.openjdk.java.net/~fparain/6546236/webrev.01/

On 02/14/2014 03:25 AM, David Holmes wrote:
> Hi Fred,
>
> Sorry for the delay. This looks longer than it actually is - mostly
> discussion. :)
>
> Dan has already picked up most of the issues I saw. But to expand a little:
>
> You took (mostly) the Linux versions of os::sleep, os::interrupt and
> os::is_interrupted and moved them to os_posix.cpp, deleting them from
> linux/bsd/solaris. Ok.
>
> ---
>
> But you carried into os::interrupted a comment block from Solaris which
> is somewhat dated/unclear and has some Solaris specific wording. I think
> it can just be edited slightly to clarify things:
>
> // NOTE that since there is no "lock" around the interrupt and
> // is_interrupted operations, there is the possibility that the
> // interrupted flag (in osThread) will be "false" but that the
> // low-level events will be in the signalled state. This is
> // intentional. The effect of this is that Object.wait() and
> // LockSupport.park() will appear to have a spurious wakeup, which
> // is allowed and not harmful, and the possibility is so rare that
> // it is not worth the added complexity to add yet another lock.
> // For the sleep event an explicit reset is performed on entry
> // to os::sleep, so there is no early return. It has also been
> // recommended not to put the interrupted flag into the "event"
> // structure because it hides the issue.
>
> I think it worth keeping this comment given the Windows version does
> actually have a bad data race due to this non-atomic mode of operation.

Thank you for your rework of this comment, I put in the code with
only one letter change (a typo).

> ---
>
> In os::sleep a couple of minor nits:
> - can you expand the warning to say "time moving backwards detected in
> os::sleep" so that if someone puts this in a bug report we know where it
> comes from

Fixed. I also conditioned the warning with the fact the JVM was
expecting a monotonic clock. To do that I had to refactor code
to have a supports_monotonic_clock() method declared in the os
class, and implemented in the OS specific files. Those changes
grow the size of the changeset, but there's no complexity in
this code.

> - you inherited a couple of style-nits from the linux version:
> "if(millis)" should be "if (millis)" in two places. Thanks.

Fixed

> ---
>
> You also added os::naked_sleep to os_posix.cpp but naked_sleep was
> changed to be naked_short_sleep by 8028280. So you need to resync and
> then ignore naked_short_sleep (it is different on different *NIX
> platforms).

os::naked_sleep() is gone.

> ---
>
> As Dan pointed out Solaris::interrupt called
> RuntimeService::record_thread_interrupt_signaled_count, which seems to a
> last vestige of InterruptibleIO code. But that can be cleaned up as part
> of the impending InterruptibleIO removal in 9.

Yes, it's part of a large clean up that comes with the removal of
InterruptibleIO in a next changeset.

Thank you,

Fred

> On 14/02/2014 4:14 AM, Daniel D. Daugherty wrote:
>>  > http://cr.openjdk.java.net/~fparain/6546236/webrev.00/
>>
>> Sorry for the delay. This was a difficult review to perform due to
>> the code motion between files. My technique was to use the 'udiff'
>> view for a side-by-side visual compare. Of course, this means I
>> could easily have missed critical differences... :-)
>>
>> src/os/bsd/vm/os_bsd.cpp
>>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>>      os::is_interrupted().
>>
>>      bsd << os::sleep >> posix
>>        << assert(!Bsd::supports_monotonic_clock(), "time moving
>> backwards");
>>        >> warning("time moving backwards");
>>            The above diff happens twice. Without at least an assert here
>>            we will be less likely to see this issue pop-up on specific
>>            test machines.
>>
>>      bsd << os::is_interrupted >> posix
>>        picked up a long comment from the Solaris version
>>
>> src/os/linux/vm/os_linux.cpp
>>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>>      os::is_interrupted().
>>
>>      linux << os::sleep >> posix
>>        << assert(!Linux::supports_monotonic_clock(), "time moving
>> backwards");
>>        >> warning("time moving backwards");
>>            The above diff happens twice. Without at least an assert here
>>            we will be less likely to see this issue pop-up on specific
>>            test machines.
>>
>>      linux << os::is_interrupted >> posix
>>        picked up a long comment from the Solaris version
>>
>> src/os/solaris/vm/os_solaris.cpp
>>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>>      os::is_interrupted(). Also deletes static os_sleep() and
>>      os::Solaris::setup_interruptible_already_blocked()
>>
>>      solaris << os::interrupt >> posix
>>        << // When events are used everywhere for os::sleep, then this
>> thr_kill
>>        << // will only be needed if UseVMInterruptibleIO is true.
>>        << if (!isInterrupted) {
>>        <<   int status = thr_kill(osthread->thread_id(),
>> os::Solaris::SIGinterrupt());
>>        <<   assert_status(status == 0, status, "thr_kill");
>>            Deleting the above code presumes that you're also going to
>>            remove the remaining uses of UseVMInterruptibleIO. Should
>>            that be done with this changeset so things are not out of
>>            sync.
>>
>>        << // Bump thread interruption counter
>>        << RuntimeService::record_thread_interrupt_signaled_count();
>>            This call was only ever made on Solaris. Is this just a
>>            counter for when a SIGinterrupt has to be thrown because
>>            of the UseVMInterruptibleIO stuff?
>>
>>      Doesn't this deletion make INTERRUPTIBLE_NORESTART_VM_ALWAYS
>>      unused so it can be deleted from
>>      src/os/solaris/vm/os_solaris.inline.hpp
>>
>>      Were you also planning on removing the last vestiges of
>>      UseVMInterruptibleIO? Or will that be done with another bug?
>>
>> src/os/posix/vm/os_posix.cpp
>>      Adds os::sleep(), os::naked_sleep(), os::interrupt() and
>>      os::is_interrupted().
>>
>>      > // the interrupted flag into the os::Solaris::Event structure,
>>          Should this be more generic instead of "Solaris" specific?
>>
>> Dan
>>
>>
>> On 2/10/14 8:58 AM, Frederic Parain wrote:
>>> Greetings,
>>>
>>> Please review this fix for JDK-6546236.
>>>
>>> A race condition exists in the implementation of
>>> Thread.sleep() and Thread.interrupt() on the
>>> Solaris platform. For historical reasons, the
>>> Solaris implementation of sleep() and interrupt()
>>> differs from the other Unices (Linux, BSD).
>>> It uses poll() calls while Linux and BSD are
>>> using ParkEvents.
>>>
>>> This changeset reconciles the Solaris code with
>>> Linux and BSD code. And instead of having three
>>> copies of this code, sleep() and interrupt()
>>> code are moved to the os_posix.cpp from
>>> os_[solaris|linux|bsd].cpp files.
>>>
>>> Note that with this changeset, the Solaris
>>> implementation of sleep() and interrupt()
>>> is loosing the support for UseVMInterruptibleIO
>>> but this is not an issue because this VM
>>> flag has been deprecated in JDK8 and will be
>>> removed in JDK9.
>>>
>>> The code has been tested with JPRT, vm.quick.testlist
>>> as well as jdk_lang, jdk_util, jdk_nio and jdk_awt
>>> (not a random list, all those jdk_* test suites are
>>> using Thread.sleep() and thread.interrupt()).
>>>
>>> The bug description:
>>> https://bugs.openjdk.java.net/browse/JDK-6546236
>>>
>>> The webrev:
>>> http://cr.openjdk.java.net/~fparain/6546236/webrev.00/
>>>
>>> Thank you,
>>>
>>> Fred
>>>
>>

-- 
Frederic Parain - Oracle
Grenoble Engineering Center - France
Phone: +33 4 76 18 81 17
Email: Frederic.Parain at oracle.com


More information about the hotspot-runtime-dev mailing list