RFR: JDK-6546236 Thread interrupt() of Thread.sleep() can be lost on Solaris due to race with signal handler

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Tue Feb 18 12:43:04 PST 2014


Hi,

thanks for the hint, David!  Yes, we will need to adapt 
that.  I'll keep it in mind.

Thanks,
  Goetz.

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Tuesday, February 18, 2014 9:08 PM
To: frederic parain
Cc: daniel.daugherty at oracle.com; hotspot-runtime-dev; Lindenmaier, Goetz
Subject: Re: RFR: JDK-6546236 Thread interrupt() of Thread.sleep() can be lost on Solaris due to race with signal handler

Hi Fred,

No further comments. But as soon as this hits the AIX port it will break 
the AIX build so might be an idea to work with compiler team to minimize 
the damage. I'm cc'ing Goetz as a heads up.

Thanks,
David

On 19/02/2014 1:45 AM, frederic parain wrote:
> Hi David,
>
> Thank you for this second review, my answers are in-lined below.
>
> On 18/02/2014 06:56, David Holmes wrote:
>> Hi Fred,
>>
>> On 18/02/2014 4:52 AM, Daniel D. Daugherty wrote:
>>>  > http://cr.openjdk.java.net/~fparain/6546236/webrev.01/
>>>
>>> src/share/vm/runtime/os.hpp
>>>      No comments.
>>>
>>> src/os/bsd/vm/os_bsd.inline.hpp
>>>      If you move the Bsd::_clock_gettime check here, you should be
>>>      able to delete Bsd::supports_monotonic_clock from
>>>      src/os/bsd/vm/os_bsd.hpp.
>>
>> Did this change already occur? I don't see os_bsd.hpp in the webrev.
>
> This change is in the latest webrev:
>
> http://cr.openjdk.java.net/~fparain/6546236/webrev.02/
>
>> I think the supports_monotonic_clock change is unnecessary and adding
>> complexity. This will impact the merge of the AIX port. I can understand
>> the desire to maintain an assert to detect problematic test machines,
>> but really all test machines should be using monotonic clock sources, so
>> the assert could be unconditional.
>
> I don't see the complexity of a single method returning a boolean.
> The implementation can be very simple, returning false will just
> disable the assert for AIX. If some platforms use this method on
> other places (Linux, BSB), some platforms don't (Solaris, Windows).
>
> This method preserves the behavior the JVM had before the
> refactoring.
>
>> Further if the assert is maintained
>> then the warning is not needed - it was only suggested as an alternative
>> to the assert.
>
> I removed the warning and kept the assert as it was in the initial code.
>
> Regards,
>
> Fred
>
>>> src/os/bsd/vm/os_bsd.cpp
>>>      No comments.
>>>
>>> src/os/linux/vm/os_linux.hpp
>>>      No comments.
>>>
>>> src/os/linux/vm/os_linux.inline.hpp
>>>      No comments.
>>>
>>> src/os/linux/vm/os_linux.cpp
>>>      No comments.
>>>
>>> src/os/solaris/vm/os_solaris.inline.hpp
>>>      No comments.
>>>
>>> src/os/solaris/vm/os_solaris.cpp
>>>      No comments.
>>>
>>> src/os/windows/vm/os_windows.hpp
>>>      No comments.
>>>
>>> src/os/windows/vm/os_windows.inline.hpp
>>>      No comments.
>>>
>>> src/os/windows/vm/os_windows.cpp
>>>      No comments.
>>>
>>> src/os/posix/vm/os_posix.cpp
>>>      line 333: assert(true, "unexpected time moving backwards detected
>>> in os:sleep()");
>>>      line 375: assert(true, "unexpected time moving backwards detected
>>> on os::sleep()");
>>>          These asserts will not fire. The 'true' needs to be 'false'.
>>>
>>>      You might want to distinguish between the two different pairs of
>>>      messages:
>>>
>>>   332           warning("time moving backwards detected in
>>> os:sleep(interruptible)");
>>>   333           assert(false, "unexpected time moving backwards detected
>>> in os:sleep(interruptible)");
>>>
>>>      and:
>>>
>>>   374           warning("time moving backwards detected on
>>> os::sleep(!interruptible)");
>>>   375           assert(false, "unexpected time moving backwards detected
>>> on os::sleep(!interruptible)");
>>
>> Also note the os:sleep vs os::sleep typo.
>>
>> Cheers,
>> David
>>
>>> Dan
>>>
>>>
>>> On 2/17/14 9:17 AM, Frederic Parain wrote:
>>>> Hi Dan,
>>>>
>>>> Thank you very much for your review, I knew it was a painful review
>>>> with all this code moving from one directory to another. I've
>>>> in-lined my answers below.
>>>>
>>>> The new webrev is here:
>>>> http://cr.openjdk.java.net/~fparain/6546236/webrev.01/
>>>>
>>>> On 02/13/2014 07:14 PM, Daniel D. Daugherty wrote:
>>>>>  > http://cr.openjdk.java.net/~fparain/6546236/webrev.00/
>>>>>
>>>>> Sorry for the delay. This was a difficult review to perform due to
>>>>> the code motion between files. My technique was to use the 'udiff'
>>>>> view for a side-by-side visual compare. Of course, this means I
>>>>> could easily have missed critical differences... :-)
>>>>>
>>>>> src/os/bsd/vm/os_bsd.cpp
>>>>>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>>>>>      os::is_interrupted().
>>>>>
>>>>>      bsd << os::sleep >> posix
>>>>>        << assert(!Bsd::supports_monotonic_clock(), "time moving
>>>>> backwards");
>>>>>        >> warning("time moving backwards");
>>>>>            The above diff happens twice. Without at least an assert
>>>>> here
>>>>>            we will be less likely to see this issue pop-up on specific
>>>>>            test machines.
>>>>
>>>> I've re-written this warning. Now the warning will be printed only
>>>> if the JVM was expecting a monotonic clock and its time warps backward.
>>>> Note that warnings can be disabled on product builds with
>>>> -XX:-PrintWarnings if the warning is causing trouble on VM output
>>>> stream.
>>>> The warning is also associated with an assert for our debug builds,
>>>> as we discussed privately.
>>>>
>>>>>      bsd << os::is_interrupted >> posix
>>>>>        picked up a long comment from the Solaris version
>>>>>
>>>>> src/os/linux/vm/os_linux.cpp
>>>>>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>>>>>      os::is_interrupted().
>>>>>
>>>>>      linux << os::sleep >> posix
>>>>>        << assert(!Linux::supports_monotonic_clock(), "time moving
>>>>> backwards");
>>>>>        >> warning("time moving backwards");
>>>>>            The above diff happens twice. Without at least an assert
>>>>> here
>>>>>            we will be less likely to see this issue pop-up on specific
>>>>>            test machines.
>>>>
>>>> Same comment as above.
>>>>
>>>>>      linux << os::is_interrupted >> posix
>>>>>        picked up a long comment from the Solaris version
>>>>>
>>>>> src/os/solaris/vm/os_solaris.cpp
>>>>>      Deletes os::sleep(), os::naked_sleep(), os::interrupt() and
>>>>>      os::is_interrupted(). Also deletes static os_sleep() and
>>>>>      os::Solaris::setup_interruptible_already_blocked()
>>>>>
>>>>>      solaris << os::interrupt >> posix
>>>>>        << // When events are used everywhere for os::sleep, then this
>>>>> thr_kill
>>>>>        << // will only be needed if UseVMInterruptibleIO is true.
>>>>>        << if (!isInterrupted) {
>>>>>        <<   int status = thr_kill(osthread->thread_id(),
>>>>> os::Solaris::SIGinterrupt());
>>>>>        <<   assert_status(status == 0, status, "thr_kill");
>>>>>            Deleting the above code presumes that you're also going to
>>>>>            remove the remaining uses of UseVMInterruptibleIO. Should
>>>>>            that be done with this changeset so things are not out of
>>>>>            sync.
>>>>
>>>> Initially I had the fix for 6546236 and the removal of
>>>> UseVMInterruptibleIO in the same workspace, but I split it into
>>>> two different changesets to ease tracking in our bug database.
>>>> The changeset to remove UseVMInterruptibleIO should be ready very
>>>> soon after fix for 6546236 has been pushed.
>>>>
>>>>>        << // Bump thread interruption counter
>>>>>        << RuntimeService::record_thread_interrupt_signaled_count();
>>>>>            This call was only ever made on Solaris. Is this just a
>>>>>            counter for when a SIGinterrupt has to be thrown because
>>>>>            of the UseVMInterruptibleIO stuff?
>>>>>
>>>>>      Doesn't this deletion make INTERRUPTIBLE_NORESTART_VM_ALWAYS
>>>>>      unused so it can be deleted from
>>>>>      src/os/solaris/vm/os_solaris.inline.hpp
>>>>>
>>>>>      Were you also planning on removing the last vestiges of
>>>>>      UseVMInterruptibleIO? Or will that be done with another bug?
>>>>
>>>> The changeset to remove UseVMInterruptibleIO includes all those
>>>> cleanup:
>>>> clean deprecation of the flag, re-work of all the macros around
>>>> interruptible calls, cleanup of RuntimeServices methods and the
>>>> associated performance counters.
>>>>
>>>>> src/os/posix/vm/os_posix.cpp
>>>>>      Adds os::sleep(), os::naked_sleep(), os::interrupt() and
>>>>>      os::is_interrupted().
>>>>>
>>>>>      > // the interrupted flag into the os::Solaris::Event structure,
>>>>>          Should this be more generic instead of "Solaris" specific?
>>>>
>>>> I replaced this comment by the new version provided by David Holmes,
>>>> so no more Solaric specific content.
>>>>
>>>> Thank you,
>>>>
>>>> Fred
>>>>
>>>>> On 2/10/14 8:58 AM, Frederic Parain wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> Please review this fix for JDK-6546236.
>>>>>>
>>>>>> A race condition exists in the implementation of
>>>>>> Thread.sleep() and Thread.interrupt() on the
>>>>>> Solaris platform. For historical reasons, the
>>>>>> Solaris implementation of sleep() and interrupt()
>>>>>> differs from the other Unices (Linux, BSD).
>>>>>> It uses poll() calls while Linux and BSD are
>>>>>> using ParkEvents.
>>>>>>
>>>>>> This changeset reconciles the Solaris code with
>>>>>> Linux and BSD code. And instead of having three
>>>>>> copies of this code, sleep() and interrupt()
>>>>>> code are moved to the os_posix.cpp from
>>>>>> os_[solaris|linux|bsd].cpp files.
>>>>>>
>>>>>> Note that with this changeset, the Solaris
>>>>>> implementation of sleep() and interrupt()
>>>>>> is loosing the support for UseVMInterruptibleIO
>>>>>> but this is not an issue because this VM
>>>>>> flag has been deprecated in JDK8 and will be
>>>>>> removed in JDK9.
>>>>>>
>>>>>> The code has been tested with JPRT, vm.quick.testlist
>>>>>> as well as jdk_lang, jdk_util, jdk_nio and jdk_awt
>>>>>> (not a random list, all those jdk_* test suites are
>>>>>> using Thread.sleep() and thread.interrupt()).
>>>>>>
>>>>>> The bug description:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-6546236
>>>>>>
>>>>>> The webrev:
>>>>>> http://cr.openjdk.java.net/~fparain/6546236/webrev.00/
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Fred
>>>>>>
>>>>>
>>>>
>>>
>


More information about the hotspot-runtime-dev mailing list