RFR: 6313903: Thread.sleep(3) might wake up immediately on windows

Wed Sep 4 03:29:12 UTC 2019

Hi David,

No argument this code is sensitive;  I generally agree to minimize changes.

Here's what I had in mind:  (Not compiled)
It assumes that before the first park(millis) there will be no 
significant elapsed time that would reduce millis.
If that's not valid then using the original code will take that into 
account.

     jlong prevtime = javaTimeNanos();

     assert(thread->is_Java_thread(), "sanity check");
     JavaThread *jt = (JavaThread *) thread;

     for (;;) {

       // interruption has precedence over timing out
       if (os::is_interrupted(thread, true)) {
         return OS_INTRPT;
       }

       if (millis <= 0) {
         return OS_OK;
       }

       {
         ThreadBlockInVM tbivm(jt);
         OSThreadWaitState osts(jt->osthread(), false /* not Object.wait() */);

         jt->set_suspend_equivalent();
         // cleared by handle_special_suspend_equivalent_condition() or
         // java_suspend_self() via check_and_wait_while_suspended()

         slp->park(millis);

         // were we externally suspended while we were waiting?
         jt->check_and_wait_while_suspended();

         // wait complete, compute remaining wait time
         jlong newtime = javaTimeNanos();

         if (newtime - prevtime < 0) {
           // time moving backwards, should only happen if no monotonic clock
           // not a guarantee() because JVM should not abort on kernel/glibc bugs
           assert(!os::supports_monotonic_clock(),
                  "unexpected time moving backwards detected in os::sleep(interruptible)");
         } else {
           millis -= (newtime - prevtime) / NANOSECS_PER_MILLISEC;
         }
         prevtime = newtime;
       }
     }
     return OS_OK;

I don't feel strongly about this, if you prefer your original; that's fine.

$.02, Roger

On 9/3/19 7:20 PM, David Holmes wrote:
> Hi Roger,
>
> Thanks for taking a look.
>
> On 4/09/2019 12:30 am, Roger Riggs wrote:
>> Hi David,
>>
>> In os.cpp:1893, can the double read of javaTimeNanos be avoided on 
>> the first iteration?
>
> Note this is the existing code and I'm deliberately trying not to 
> change anything in that regard.
>
>> Set newtime = prevtime before the loop and re-read it after the call 
>> to park(:1918)
>
> I can't quite see the full change you propose here. The code has to 
> read the new time, check for timeout, update prevtime - in that order, 
> before doing the actual wait. If all that is moved after the wait then 
> we aren't giving interrupt precedence. If we move the interrupt check 
> after the wait as well then we need to add an additional interrupt 
> check before the loop for the first time through. (I actually had a 
> version that did this and decided it was too disruptive as part of 
> this code relocation.)
>
>> Negible nanotime should have elapsed between 1881 and 1893.
>> Ditto the non-interruptable code in 1936-1947.
>
> Note the non-interuptible code will disappear in my next RFE.
>
>> (Unless javaTimeNanos is free?).
>
> Afraid not.
>
> Thanks,
> David
> -----
>
>> $.02, Roger
>>
>>
>>
>>
>> On 9/2/19 2:06 AM, David Holmes wrote:
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-6313903
>>> webrev: http://cr.openjdk.java.net/~dholmes/6313903/webrev/
>>>
>>> This is a fix to a very long term problem on Windows that really 
>>> should have been fixed many years ago: the Windows timed-wait 
>>> routines can return early by up to a clock tick** (which itself 
>>> could be as long as 16ms). The fix for Thread.sleep is to track the 
>>> elapsed time (via nanoTime) and simply adjust and repeat the sleep 
>>> if we were going to return earlier than requested. This is the same 
>>> as what we have been doing on non-Windows for many years.
>>>
>>> ** whether or not you observe this seems highly dependent on 
>>> hardware - see bug report for observed behaviours.
>>>
>>> The additional motivation for fixing this now is to reduce the 
>>> amount of platform specific sleep/interrupt code and to allow 
>>> changes to the shared interrupt code. There are some additional 
>>> RFE's in the pipeline for that.
>>>
>>> This fix does the following:
>>> - moves the os_posix.cpp os::sleep implementation to os.cpp to be 
>>> shared by all platforms and deletes the os_windows.cpp os::sleep 
>>> implementation
>>>   - there are some additional comments added and the JavaThread 
>>> assertion is moved to the head of the "interruptible" code path
>>> - updates os_windows.cpp in these additional ways:
>>>   - updates the commentary about creating the interrupt_event (which 
>>> is now only used by JDK library code for use by Process.waitFor)
>>>   - updates os::interrupt to also unpark the _SleepEvent
>>>   - moves the use of the HighResolutionInterval utility class from 
>>> the os::sleep code to the PlatformEvent::park code, so that sleep on 
>>> Windows will still make adjustments to the clock tick if needed (it 
>>> also means that Object.wait() calls will now make the same 
>>> adjustments**).
>>>
>>> ** This is a change behaviour but I'm not concerned about there 
>>> being any adverse effects from this. On many systems the code will 
>>> have no affect as the system will be using the 1ms clock tick all 
>>> the time anyway (e.g. my laptops do this). On systems with e.g. 15ms 
>>> tick, what you will now see is Object.wait returning on time rather 
>>> than much later than requested e.g. before this fix a timed-wait of 
>>> 1ms returned in 15.14ms, but after the fix it was 2.3ms. (Note that 
>>> this observation seems contrary to the general statement that 
>>> windows timed-wait event operations can return early by up to a 
>>> clock tick - as I said it seems highly hardware dependent.)
>>>
>>> Testing: tiers 1-3
>>>          custom testing to look at sleep and wait times, as per bug 
>>> report.
>>>
>>> Thanks,
>>> David
>>> -----
>>