RFR: JDK-6546236 Thread interrupt() of Thread.sleep() can be lost on Solaris due to race with signal handler

Frederic Parain frederic.parain at oracle.com
Wed Feb 19 08:06:29 PST 2014


David,

Thank you for your review,

Fred

On 02/18/2014 09:07 PM, David Holmes wrote:
> Hi Fred,
>
> No further comments. But as soon as this hits the AIX port it will break
> the AIX build so might be an idea to work with compiler team to minimize
> the damage. I'm cc'ing Goetz as a heads up.
>
> Thanks,
> David
>
> On 19/02/2014 1:45 AM, frederic parain wrote:
>> Hi David,
>>
>> Thank you for this second review, my answers are in-lined below.
>>
>> On 18/02/2014 06:56, David Holmes wrote:
>>> Hi Fred,
>>>
>>> On 18/02/2014 4:52 AM, Daniel D. Daugherty wrote:
>>>>  > http://cr.openjdk.java.net/~fparain/6546236/webrev.01/
>>>>
>>>> src/share/vm/runtime/os.hpp
>>>>      No comments.
>>>>
>>>> src/os/bsd/vm/os_bsd.inline.hpp
>>>>      If you move the Bsd::_clock_gettime check here, you should be
>>>>      able to delete Bsd::supports_monotonic_clock from
>>>>      src/os/bsd/vm/os_bsd.hpp.
>>>
>>> Did this change already occur? I don't see os_bsd.hpp in the webrev.
>>
>> This change is in the latest webrev:
>>
>> http://cr.openjdk.java.net/~fparain/6546236/webrev.02/
>>
>>> I think the supports_monotonic_clock change is unnecessary and adding
>>> complexity. This will impact the merge of the AIX port. I can understand
>>> the desire to maintain an assert to detect problematic test machines,
>>> but really all test machines should be using monotonic clock sources, so
>>> the assert could be unconditional.
>>
>> I don't see the complexity of a single method returning a boolean.
>> The implementation can be very simple, returning false will just
>> disable the assert for AIX. If some platforms use this method on
>> other places (Linux, BSB), some platforms don't (Solaris, Windows).
>>
>> This method preserves the behavior the JVM had before the
>> refactoring.
>>
>>> Further if the assert is maintained
>>> then the warning is not needed - it was only suggested as an alternative
>>> to the assert.
>>
>> I removed the warning and kept the assert as it was in the initial code.
>>
>> Regards,
>>
>> Fred
>>
>>>> src/os/bsd/vm/os_bsd.cpp
>>>>      No comments.
>>>>
>>>> src/os/linux/vm/os_linux.hpp
>>>>      No comments.
>>>>
>>>> src/os/linux/vm/os_linux.inline.hpp
>>>>      No comments.
>>>>
>>>> src/os/linux/vm/os_linux.cpp
>>>>      No comments.
>>>>
>>>> src/os/solaris/vm/os_solaris.inline.hpp
>>>>      No comments.
>>>>
>>>> src/os/solaris/vm/os_solaris.cpp
>>>>      No comments.
>>>>
>>>> src/os/windows/vm/os_windows.hpp
>>>>      No comments.
>>>>
>>>> src/os/windows/vm/os_windows.inline.hpp
>>>>      No comments.
>>>>
>>>> src/os/windows/vm/os_windows.cpp
>>>>      No comments.
>>>>
>>>> src/os/posix/vm/os_posix.cpp
>>>>      line 333: assert(true, "unexpected time moving backwards detected
>>>> in os:sleep()");
>>>>      line 375: assert(true, "unexpected time moving backwards detected
>>>> on os::sleep()");
>>>>          These asserts will not fire. The 'true' needs to be 'false'.
>>>>
>>>>      You might want to distinguish between the two different pairs of
>>>>      messages:
>>>>
>>>>   332           warning("time moving backwards detected in
>>>> os:sleep(interruptible)");
>>>>   333           assert(false, "unexpected time moving backwards
>>>> detected
>>>> in os:sleep(interruptible)");
>>>>
>>>>      and:
>>>>
>>>>   374           warning("time moving backwards detected on
>>>> os::sleep(!interruptible)");
>>>>   375           assert(false, "unexpected time moving backwards
>>>> detected
>>>> on os::sleep(!interruptible)");
>>>
>>> Also note the os:sleep vs os::sleep typo.
>>>
>>> Cheers,
>>> David
>>>
>>>> Dan
>>>>
>>>>
>>>> On 2/17/14 9:17 AM, Frederic Parain wrote:
>>>>> Hi Dan,
>>>>>
>>>>> Thank you very much for your review, I knew it was a painful review
>>>>> with all this code moving from one directory to another. I've
>>>>> in-lined my answers below.
>>>>>
>>>>> The new webrev is here:
>>>>> http://cr.openjdk.java.net/~fparain/6546236/webrev.01/
>>>>>
>>>>> On 02/13/2014 07:14 PM, Daniel D. Daugherty wrote:
>>>>>>  > http://cr.openjdk.java.net/~fparain/6546236/webrev.00/
>>>>>>
>>>>>> Sorry for the delay. This was a difficult review to perform due to
>>>>>> the code motion between files. My technique was to use the 'udiff'
>>>>>> view for a side-by-side visual compare. Of course, this means I
>>>>>> could easily have missed critical differences... :-)
>>>>>>
>>>>>> src/os/bsd/vm/os_bsd.cpp
>>>>>>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>>>>>>      os::is_interrupted().
>>>>>>
>>>>>>      bsd << os::sleep >> posix
>>>>>>        << assert(!Bsd::supports_monotonic_clock(), "time moving
>>>>>> backwards");
>>>>>>        >> warning("time moving backwards");
>>>>>>            The above diff happens twice. Without at least an assert
>>>>>> here
>>>>>>            we will be less likely to see this issue pop-up on
>>>>>> specific
>>>>>>            test machines.
>>>>>
>>>>> I've re-written this warning. Now the warning will be printed only
>>>>> if the JVM was expecting a monotonic clock and its time warps
>>>>> backward.
>>>>> Note that warnings can be disabled on product builds with
>>>>> -XX:-PrintWarnings if the warning is causing trouble on VM output
>>>>> stream.
>>>>> The warning is also associated with an assert for our debug builds,
>>>>> as we discussed privately.
>>>>>
>>>>>>      bsd << os::is_interrupted >> posix
>>>>>>        picked up a long comment from the Solaris version
>>>>>>
>>>>>> src/os/linux/vm/os_linux.cpp
>>>>>>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>>>>>>      os::is_interrupted().
>>>>>>
>>>>>>      linux << os::sleep >> posix
>>>>>>        << assert(!Linux::supports_monotonic_clock(), "time moving
>>>>>> backwards");
>>>>>>        >> warning("time moving backwards");
>>>>>>            The above diff happens twice. Without at least an assert
>>>>>> here
>>>>>>            we will be less likely to see this issue pop-up on
>>>>>> specific
>>>>>>            test machines.
>>>>>
>>>>> Same comment as above.
>>>>>
>>>>>>      linux << os::is_interrupted >> posix
>>>>>>        picked up a long comment from the Solaris version
>>>>>>
>>>>>> src/os/solaris/vm/os_solaris.cpp
>>>>>>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>>>>>>      os::is_interrupted(). Also deletes static os_sleep() and
>>>>>>      os::Solaris::setup_interruptible_already_blocked()
>>>>>>
>>>>>>      solaris << os::interrupt >> posix
>>>>>>        << // When events are used everywhere for os::sleep, then this
>>>>>> thr_kill
>>>>>>        << // will only be needed if UseVMInterruptibleIO is true.
>>>>>>        << if (!isInterrupted) {
>>>>>>        <<   int status = thr_kill(osthread->thread_id(),
>>>>>> os::Solaris::SIGinterrupt());
>>>>>>        <<   assert_status(status == 0, status, "thr_kill");
>>>>>>            Deleting the above code presumes that you're also going to
>>>>>>            remove the remaining uses of UseVMInterruptibleIO. Should
>>>>>>            that be done with this changeset so things are not out of
>>>>>>            sync.
>>>>>
>>>>> Initially I had the fix for 6546236 and the removal of
>>>>> UseVMInterruptibleIO in the same workspace, but I split it into
>>>>> two different changesets to ease tracking in our bug database.
>>>>> The changeset to remove UseVMInterruptibleIO should be ready very
>>>>> soon after fix for 6546236 has been pushed.
>>>>>
>>>>>>        << // Bump thread interruption counter
>>>>>>        << RuntimeService::record_thread_interrupt_signaled_count();
>>>>>>            This call was only ever made on Solaris. Is this just a
>>>>>>            counter for when a SIGinterrupt has to be thrown because
>>>>>>            of the UseVMInterruptibleIO stuff?
>>>>>>
>>>>>>      Doesn't this deletion make INTERRUPTIBLE_NORESTART_VM_ALWAYS
>>>>>>      unused so it can be deleted from
>>>>>>      src/os/solaris/vm/os_solaris.inline.hpp
>>>>>>
>>>>>>      Were you also planning on removing the last vestiges of
>>>>>>      UseVMInterruptibleIO? Or will that be done with another bug?
>>>>>
>>>>> The changeset to remove UseVMInterruptibleIO includes all those
>>>>> cleanup:
>>>>> clean deprecation of the flag, re-work of all the macros around
>>>>> interruptible calls, cleanup of RuntimeServices methods and the
>>>>> associated performance counters.
>>>>>
>>>>>> src/os/posix/vm/os_posix.cpp
>>>>>>      Adds os::sleep(), os::naked_sleep(), os::interrupt() and
>>>>>>      os::is_interrupted().
>>>>>>
>>>>>>      > // the interrupted flag into the os::Solaris::Event structure,
>>>>>>          Should this be more generic instead of "Solaris" specific?
>>>>>
>>>>> I replaced this comment by the new version provided by David Holmes,
>>>>> so no more Solaric specific content.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Fred
>>>>>
>>>>>> On 2/10/14 8:58 AM, Frederic Parain wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> Please review this fix for JDK-6546236.
>>>>>>>
>>>>>>> A race condition exists in the implementation of
>>>>>>> Thread.sleep() and Thread.interrupt() on the
>>>>>>> Solaris platform. For historical reasons, the
>>>>>>> Solaris implementation of sleep() and interrupt()
>>>>>>> differs from the other Unices (Linux, BSD).
>>>>>>> It uses poll() calls while Linux and BSD are
>>>>>>> using ParkEvents.
>>>>>>>
>>>>>>> This changeset reconciles the Solaris code with
>>>>>>> Linux and BSD code. And instead of having three
>>>>>>> copies of this code, sleep() and interrupt()
>>>>>>> code are moved to the os_posix.cpp from
>>>>>>> os_[solaris|linux|bsd].cpp files.
>>>>>>>
>>>>>>> Note that with this changeset, the Solaris
>>>>>>> implementation of sleep() and interrupt()
>>>>>>> is loosing the support for UseVMInterruptibleIO
>>>>>>> but this is not an issue because this VM
>>>>>>> flag has been deprecated in JDK8 and will be
>>>>>>> removed in JDK9.
>>>>>>>
>>>>>>> The code has been tested with JPRT, vm.quick.testlist
>>>>>>> as well as jdk_lang, jdk_util, jdk_nio and jdk_awt
>>>>>>> (not a random list, all those jdk_* test suites are
>>>>>>> using Thread.sleep() and thread.interrupt()).
>>>>>>>
>>>>>>> The bug description:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6546236
>>>>>>>
>>>>>>> The webrev:
>>>>>>> http://cr.openjdk.java.net/~fparain/6546236/webrev.00/
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Fred
>>>>>>>
>>>>>>
>>>>>
>>>>
>>

-- 
Frederic Parain - Oracle
Grenoble Engineering Center - France
Phone: +33 4 76 18 81 17
Email: Frederic.Parain at oracle.com


More information about the hotspot-runtime-dev mailing list