RFR: 6313903: Thread.sleep(3) might wake up immediately on windows
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed Sep 4 14:15:41 UTC 2019
I vote for (mostly) relocation with this bug and improvements
in a new bug...
Dan
On 9/3/19 11:38 PM, David Holmes wrote:
> Hi Roger,
>
> Thanks for clarifying - yes that change would be okay. I may look at
> making it further down the track with these changes so that we have
> minimal disruption for now.
>
> Thanks again.
>
> David
>
> On 4/09/2019 1:29 pm, Roger Riggs wrote:
>> Hi David,
>>
>> No argument this code is sensitive; I generally agree to minimize
>> changes.
>>
>> Here's what I had in mind: (Not compiled)
>> It assumes that before the first park(millis) there will be no
>> significant elapsed time that would reduce millis.
>> If that's not valid then using the original code will take that into
>> account.
>>
>> jlong prevtime = javaTimeNanos();
>>
>> assert(thread->is_Java_thread(), "sanity check");
>> JavaThread *jt = (JavaThread *) thread;
>>
>> for (;;) {
>>
>> // interruption has precedence over timing out
>> if (os::is_interrupted(thread, true)) {
>> return OS_INTRPT;
>> }
>>
>> if (millis <= 0) {
>> return OS_OK;
>> }
>>
>> {
>> ThreadBlockInVM tbivm(jt);
>> OSThreadWaitState osts(jt->osthread(), false /* not
>> Object.wait() */);
>>
>> jt->set_suspend_equivalent();
>> // cleared by handle_special_suspend_equivalent_condition() or
>> // java_suspend_self() via check_and_wait_while_suspended()
>>
>> slp->park(millis);
>>
>> // were we externally suspended while we were waiting?
>> jt->check_and_wait_while_suspended();
>> // wait complete, compute remaining wait time
>> jlong newtime = javaTimeNanos();
>>
>> if (newtime - prevtime < 0) {
>> // time moving backwards, should only happen if no
>> monotonic clock
>> // not a guarantee() because JVM should not abort on
>> kernel/glibc bugs
>> assert(!os::supports_monotonic_clock(),
>> "unexpected time moving backwards detected in
>> os::sleep(interruptible)");
>> } else {
>> millis -= (newtime - prevtime) / NANOSECS_PER_MILLISEC;
>> }
>> prevtime = newtime;
>> }
>> }
>> return OS_OK;
>>
>> I don't feel strongly about this, if you prefer your original; that's
>> fine.
>>
>> $.02, Roger
>>
>>
>> On 9/3/19 7:20 PM, David Holmes wrote:
>>> Hi Roger,
>>>
>>> Thanks for taking a look.
>>>
>>> On 4/09/2019 12:30 am, Roger Riggs wrote:
>>>> Hi David,
>>>>
>>>> In os.cpp:1893, can the double read of javaTimeNanos be avoided on
>>>> the first iteration?
>>>
>>> Note this is the existing code and I'm deliberately trying not to
>>> change anything in that regard.
>>>
>>>> Set newtime = prevtime before the loop and re-read it after the
>>>> call to park(:1918)
>>>
>>> I can't quite see the full change you propose here. The code has to
>>> read the new time, check for timeout, update prevtime - in that
>>> order, before doing the actual wait. If all that is moved after the
>>> wait then we aren't giving interrupt precedence. If we move the
>>> interrupt check after the wait as well then we need to add an
>>> additional interrupt check before the loop for the first time
>>> through. (I actually had a version that did this and decided it was
>>> too disruptive as part of this code relocation.)
>>>
>>>> Negible nanotime should have elapsed between 1881 and 1893.
>>>> Ditto the non-interruptable code in 1936-1947.
>>>
>>> Note the non-interuptible code will disappear in my next RFE.
>>>
>>>> (Unless javaTimeNanos is free?).
>>>
>>> Afraid not.
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> $.02, Roger
>>>>
>>>>
>>>>
>>>>
>>>> On 9/2/19 2:06 AM, David Holmes wrote:
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-6313903
>>>>> webrev: http://cr.openjdk.java.net/~dholmes/6313903/webrev/
>>>>>
>>>>> This is a fix to a very long term problem on Windows that really
>>>>> should have been fixed many years ago: the Windows timed-wait
>>>>> routines can return early by up to a clock tick** (which itself
>>>>> could be as long as 16ms). The fix for Thread.sleep is to track
>>>>> the elapsed time (via nanoTime) and simply adjust and repeat the
>>>>> sleep if we were going to return earlier than requested. This is
>>>>> the same as what we have been doing on non-Windows for many years.
>>>>>
>>>>> ** whether or not you observe this seems highly dependent on
>>>>> hardware - see bug report for observed behaviours.
>>>>>
>>>>> The additional motivation for fixing this now is to reduce the
>>>>> amount of platform specific sleep/interrupt code and to allow
>>>>> changes to the shared interrupt code. There are some additional
>>>>> RFE's in the pipeline for that.
>>>>>
>>>>> This fix does the following:
>>>>> - moves the os_posix.cpp os::sleep implementation to os.cpp to be
>>>>> shared by all platforms and deletes the os_windows.cpp os::sleep
>>>>> implementation
>>>>> - there are some additional comments added and the JavaThread
>>>>> assertion is moved to the head of the "interruptible" code path
>>>>> - updates os_windows.cpp in these additional ways:
>>>>> - updates the commentary about creating the interrupt_event
>>>>> (which is now only used by JDK library code for use by
>>>>> Process.waitFor)
>>>>> - updates os::interrupt to also unpark the _SleepEvent
>>>>> - moves the use of the HighResolutionInterval utility class from
>>>>> the os::sleep code to the PlatformEvent::park code, so that sleep
>>>>> on Windows will still make adjustments to the clock tick if needed
>>>>> (it also means that Object.wait() calls will now make the same
>>>>> adjustments**).
>>>>>
>>>>> ** This is a change behaviour but I'm not concerned about there
>>>>> being any adverse effects from this. On many systems the code will
>>>>> have no affect as the system will be using the 1ms clock tick all
>>>>> the time anyway (e.g. my laptops do this). On systems with e.g.
>>>>> 15ms tick, what you will now see is Object.wait returning on time
>>>>> rather than much later than requested e.g. before this fix a
>>>>> timed-wait of 1ms returned in 15.14ms, but after the fix it was
>>>>> 2.3ms. (Note that this observation seems contrary to the general
>>>>> statement that windows timed-wait event operations can return
>>>>> early by up to a clock tick - as I said it seems highly hardware
>>>>> dependent.)
>>>>>
>>>>> Testing: tiers 1-3
>>>>> custom testing to look at sleep and wait times, as per
>>>>> bug report.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>
>>
More information about the hotspot-runtime-dev
mailing list