RFR: 6313903: Thread.sleep(3) might wake up immediately on windows

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed Sep 4 14:15:41 UTC 2019


I vote for (mostly) relocation with this bug and improvements
in a new bug...

Dan


On 9/3/19 11:38 PM, David Holmes wrote:
> Hi Roger,
>
> Thanks for clarifying - yes that change would be okay. I may look at 
> making it further down the track with these changes so that we have 
> minimal disruption for now.
>
> Thanks again.
>
> David
>
> On 4/09/2019 1:29 pm, Roger Riggs wrote:
>> Hi David,
>>
>> No argument this code is sensitive;  I generally agree to minimize 
>> changes.
>>
>> Here's what I had in mind:  (Not compiled)
>> It assumes that before the first park(millis) there will be no 
>> significant elapsed time that would reduce millis.
>> If that's not valid then using the original code will take that into 
>> account.
>>
>>      jlong prevtime = javaTimeNanos();
>>
>>      assert(thread->is_Java_thread(), "sanity check");
>>      JavaThread *jt = (JavaThread *) thread;
>>
>>      for (;;) {
>>
>>        // interruption has precedence over timing out
>>        if (os::is_interrupted(thread, true)) {
>>          return OS_INTRPT;
>>        }
>>
>>        if (millis <= 0) {
>>          return OS_OK;
>>        }
>>
>>        {
>>          ThreadBlockInVM tbivm(jt);
>>          OSThreadWaitState osts(jt->osthread(), false /* not 
>> Object.wait() */);
>>
>>          jt->set_suspend_equivalent();
>>          // cleared by handle_special_suspend_equivalent_condition() or
>>          // java_suspend_self() via check_and_wait_while_suspended()
>>
>>          slp->park(millis);
>>
>>          // were we externally suspended while we were waiting?
>>          jt->check_and_wait_while_suspended();
>>                   // wait complete, compute remaining wait time
>>          jlong newtime = javaTimeNanos();
>>
>>          if (newtime - prevtime < 0) {
>>            // time moving backwards, should only happen if no 
>> monotonic clock
>>            // not a guarantee() because JVM should not abort on 
>> kernel/glibc bugs
>>            assert(!os::supports_monotonic_clock(),
>>                   "unexpected time moving backwards detected in 
>> os::sleep(interruptible)");
>>          } else {
>>            millis -= (newtime - prevtime) / NANOSECS_PER_MILLISEC;
>>          }
>>          prevtime = newtime;
>>        }
>>      }
>>      return OS_OK;
>>
>> I don't feel strongly about this, if you prefer your original; that's 
>> fine.
>>
>> $.02, Roger
>>
>>
>> On 9/3/19 7:20 PM, David Holmes wrote:
>>> Hi Roger,
>>>
>>> Thanks for taking a look.
>>>
>>> On 4/09/2019 12:30 am, Roger Riggs wrote:
>>>> Hi David,
>>>>
>>>> In os.cpp:1893, can the double read of javaTimeNanos be avoided on 
>>>> the first iteration?
>>>
>>> Note this is the existing code and I'm deliberately trying not to 
>>> change anything in that regard.
>>>
>>>> Set newtime = prevtime before the loop and re-read it after the 
>>>> call to park(:1918)
>>>
>>> I can't quite see the full change you propose here. The code has to 
>>> read the new time, check for timeout, update prevtime - in that 
>>> order, before doing the actual wait. If all that is moved after the 
>>> wait then we aren't giving interrupt precedence. If we move the 
>>> interrupt check after the wait as well then we need to add an 
>>> additional interrupt check before the loop for the first time 
>>> through. (I actually had a version that did this and decided it was 
>>> too disruptive as part of this code relocation.)
>>>
>>>> Negible nanotime should have elapsed between 1881 and 1893.
>>>> Ditto the non-interruptable code in 1936-1947.
>>>
>>> Note the non-interuptible code will disappear in my next RFE.
>>>
>>>> (Unless javaTimeNanos is free?).
>>>
>>> Afraid not.
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> $.02, Roger
>>>>
>>>>
>>>>
>>>>
>>>> On 9/2/19 2:06 AM, David Holmes wrote:
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-6313903
>>>>> webrev: http://cr.openjdk.java.net/~dholmes/6313903/webrev/
>>>>>
>>>>> This is a fix to a very long term problem on Windows that really 
>>>>> should have been fixed many years ago: the Windows timed-wait 
>>>>> routines can return early by up to a clock tick** (which itself 
>>>>> could be as long as 16ms). The fix for Thread.sleep is to track 
>>>>> the elapsed time (via nanoTime) and simply adjust and repeat the 
>>>>> sleep if we were going to return earlier than requested. This is 
>>>>> the same as what we have been doing on non-Windows for many years.
>>>>>
>>>>> ** whether or not you observe this seems highly dependent on 
>>>>> hardware - see bug report for observed behaviours.
>>>>>
>>>>> The additional motivation for fixing this now is to reduce the 
>>>>> amount of platform specific sleep/interrupt code and to allow 
>>>>> changes to the shared interrupt code. There are some additional 
>>>>> RFE's in the pipeline for that.
>>>>>
>>>>> This fix does the following:
>>>>> - moves the os_posix.cpp os::sleep implementation to os.cpp to be 
>>>>> shared by all platforms and deletes the os_windows.cpp os::sleep 
>>>>> implementation
>>>>>   - there are some additional comments added and the JavaThread 
>>>>> assertion is moved to the head of the "interruptible" code path
>>>>> - updates os_windows.cpp in these additional ways:
>>>>>   - updates the commentary about creating the interrupt_event 
>>>>> (which is now only used by JDK library code for use by 
>>>>> Process.waitFor)
>>>>>   - updates os::interrupt to also unpark the _SleepEvent
>>>>>   - moves the use of the HighResolutionInterval utility class from 
>>>>> the os::sleep code to the PlatformEvent::park code, so that sleep 
>>>>> on Windows will still make adjustments to the clock tick if needed 
>>>>> (it also means that Object.wait() calls will now make the same 
>>>>> adjustments**).
>>>>>
>>>>> ** This is a change behaviour but I'm not concerned about there 
>>>>> being any adverse effects from this. On many systems the code will 
>>>>> have no affect as the system will be using the 1ms clock tick all 
>>>>> the time anyway (e.g. my laptops do this). On systems with e.g. 
>>>>> 15ms tick, what you will now see is Object.wait returning on time 
>>>>> rather than much later than requested e.g. before this fix a 
>>>>> timed-wait of 1ms returned in 15.14ms, but after the fix it was 
>>>>> 2.3ms. (Note that this observation seems contrary to the general 
>>>>> statement that windows timed-wait event operations can return 
>>>>> early by up to a clock tick - as I said it seems highly hardware 
>>>>> dependent.)
>>>>>
>>>>> Testing: tiers 1-3
>>>>>          custom testing to look at sleep and wait times, as per 
>>>>> bug report.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>
>>



More information about the hotspot-runtime-dev mailing list