RFR: 8220795: Introduce TimedYield utility to improve time-to-safepoint on Windows

Tue Apr 9 02:45:02 UTC 2019

Hi Robin,

On 8/04/2019 10:47 pm, Robin Westberg wrote:
> Hi again,
> 
> Here’s an updated version where I’ve moved the naked_short_nanosleep function into the Posix class, to avoid future cross-platform use. (It’s still used in the SpinYield and TimedYield implementations though).

But you also changed the existing Windows os::naked_short_sleep to use 
the WaitableTimer which is a significant change to make. Is this just 
because it will likely have better resolution than the native Sleep 
function?

Code using os::naked_short_sleep(1) might be unexpectedly impacted by 
this if what was a 10ms (or worse) sleep becomes closer to 1ms. 
PerfDataManager::destroy in particular could be impacted as its racy to 
begin with. That's not to say this isn't a good thing to fix, just be 
aware it may have unexpected consequences.

> Full webrev: http://cr.openjdk.java.net/~rwestberg/8220795/webrev.01/
> Incremental: http://cr.openjdk.java.net/~rwestberg/8220795/webrev.00-01/

src/hotspot/share/utilities/spinYield.cpp

I'm somewhat dubious about getting rid of non-Posix 
naked_short_nanosleep and instead adding a win32 ifdef to this code. It 
kind of defeats the purpose of the os abstraction layer - albeit that 
Windows can't really do a nanosleep.

Why did you get rid of the sleep_ns parameter and hardwire it to 1000? A 
configurable sleep time would be a feature of this utility.

---

src/hotspot/share/utilities/timedYield.hpp

36 // scheduled on the same cpu as the waiter, we will first try local cpu
37 // yielding until we reach OS sleep primitive granularity in the waiting

Whether or not the yielding is "cpu local" will depend on the OS and the 
scheduler in use. I would just refer to OS native yield.

   40   jlong _yield_time_ns;
   41   jlong _sleep_time_ns;
   42   jlong _max_yield_time_ns;
   43   jlong _yield_granularity_ns;

We are avoiding using j-types in the VM unless they actually pertain to 
Java variables. These should be int64_t (to be compatible with 
javaTimeNanos call).

   52   // Perform next round of delay.
   53   void wait();

I think delay() would be a better name than wait().

---

src/hotspot/share/utilities/timedYield.cpp

   42   // Phase 1 - local cpu yielding
   43   if (_yield_time_ns < _max_yield_time_ns) {
   44 #ifdef WIN32
   45     if (SwitchToThread() == 0) {
   46       // Nothing else is ready to run on this cpu, spin a little
   47       while (os::javaTimeNanos() - start < _yield_granularity_ns) {
   48         SpinPause();
   49       }
   50     }
   51 #else
   52     os::Posix::naked_short_nanosleep(_yield_granularity_ns);
   53 #endif
   54     _yield_time_ns += os::javaTimeNanos() - start;
   55     return;
   56   }

I have a few issues with this code. It's breaking the os abstraction 
layer again by using an OS ifdef and not using os APIs that exist for 
this very purpose - i.e os::naked_yield(). And for non-Windows it seems 
quite bizarre the "yielding" part of TimedYield is actually implemented 
with a sleep and not os::naked_yield!

If the existing os api's need adjustment or expansion to provide the 
functionality desired by this code then I would much prefer to see the 
os API's updated to address that.

That said, given the original problem is that os::naked_short_nanosleep 
on Windows is too coarse with the use of WaitableTimer why not just 
replace that with a simple loop of the form:

while (elapsed < sleep_ns) {
   if (SwitchToThread() == 0) {
     SpinPause();
   elapsed = ...
}

?

Thanks,
David
-----

> Best regards,
> Robin
> 
>> On 5 Apr 2019, at 13:54, Robin Westberg <robin.westberg at oracle.com> wrote:
>>
>> Hi David,
>>
>>> On 5 Apr 2019, at 12:10, David Holmes <david.holmes at oracle.com> wrote:
>>>
>>> On 5/04/2019 7:53 pm, Robin Westberg wrote:
>>>> Hi David,
>>>> Thanks for taking a look!
>>>>> On 5 Apr 2019, at 10:49, David Holmes <david.holmes at oracle.com> wrote:
>>>>>
>>>>> Hi Robin,
>>>>>
>>>>> On 5/04/2019 6:05 pm, Robin Westberg wrote:
>>>>>> Hi all,
>>>>>> Please review the following change that modifies the way safepointing waits for other threads to stop. As part of JDK-8203469, os::naked_short_nanosleep was used for all platforms. However, on Windows, this function has millisecond granularity which is too coarse and caused performance regressions. Instead, we can take advantage of the fact that Windows can tell us if anyone else is waiting to run on the current cpu. For other platforms the original waiting is used.
>>>>>
>>>>> Can't you just make the new code the implementation of os::naked_short_nanosleep on Windows and avoid adding yet another sleep/yield abstraction? If Windows nanosleep only has millisecond granularity then it's broken.
>>>> Right, I considered that approach, but it's not obvious to me what semantics would be appropriate for that. Depending on whether you want an accurate sleep or want other threads to make progress, it could conceivably be either implemented as pure spinning, or spinning combined with yielding (either cpu-local or Sleep(0)-style).
>>>> That said, perhaps os_naked_short_nanosleep should be removed, the only remaining usage is in spinYield, which turns to sleeping after giving up on yielding. For windows, it may be more appropriate to switch to Sleep(0) in that case.
>>>
>>> I definitely don't think we need to introduce another abstraction (though I'm okay with replacing one with a new one).
>>
>> Okay, then I’d prefer to drop the os::naked_short_nanosleep. As I see it, you would use SpinYield when waiting for something like a CAS race, and TimedYield when waiting for a thread rendezvous.
>>
>> I did try to make TimedYield into a different “policy” for the existing SpinYield class at first, but didn’t quite feel like I found a nice API for it. So perhaps it's fine to just have them as separate classes.
>>
>>> It was probably discussed at the time but the naked_short_nanosleep on Windows definitely seems to have far too much overhead - even having to create a new waitable-timer each time. Do we know if the overhead is actually the resolution of the timer or whether it's all the extra work that method needs to do? I should try to dig up the email on that. :)
>>
>> Yes, as far as I know the best possible sleep resolution you can obtain on Windows is one millisecond (perhaps 0.5ms in theory if you go directly for NtSetTimerResolution). The SetWaitableTimer approach is the best though to actually obtain 1ms resolution, plain Sleep(1) usually sleeps around 2ms.
>>
>> Best regards,
>> Robin
>>
>>>
>>> Thanks,
>>> David
>>>
>>>> Best regards,
>>>> Robin
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Various benchmarks (specjbb2015, specjm2008) show better numbers for Windows and no regressions for Linux.
>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8220795
>>>>>> Webrev: http://cr.openjdk.java.net/~rwestberg/8220795/webrev.00/
>>>>>> Testing: tier1
>>>>>> Best regards,
>>>>>> Robin
>