RFR: 8220795: Introduce TimedYield utility to improve time-to-safepoint on Windows
David Holmes
david.holmes at oracle.com
Tue Apr 9 02:45:02 UTC 2019
Hi Robin,
On 8/04/2019 10:47 pm, Robin Westberg wrote:
> Hi again,
>
> Here’s an updated version where I’ve moved the naked_short_nanosleep function into the Posix class, to avoid future cross-platform use. (It’s still used in the SpinYield and TimedYield implementations though).
But you also changed the existing Windows os::naked_short_sleep to use
the WaitableTimer which is a significant change to make. Is this just
because it will likely have better resolution than the native Sleep
function?
Code using os::naked_short_sleep(1) might be unexpectedly impacted by
this if what was a 10ms (or worse) sleep becomes closer to 1ms.
PerfDataManager::destroy in particular could be impacted as its racy to
begin with. That's not to say this isn't a good thing to fix, just be
aware it may have unexpected consequences.
> Full webrev: http://cr.openjdk.java.net/~rwestberg/8220795/webrev.01/
> Incremental: http://cr.openjdk.java.net/~rwestberg/8220795/webrev.00-01/
src/hotspot/share/utilities/spinYield.cpp
I'm somewhat dubious about getting rid of non-Posix
naked_short_nanosleep and instead adding a win32 ifdef to this code. It
kind of defeats the purpose of the os abstraction layer - albeit that
Windows can't really do a nanosleep.
Why did you get rid of the sleep_ns parameter and hardwire it to 1000? A
configurable sleep time would be a feature of this utility.
---
src/hotspot/share/utilities/timedYield.hpp
36 // scheduled on the same cpu as the waiter, we will first try local cpu
37 // yielding until we reach OS sleep primitive granularity in the waiting
Whether or not the yielding is "cpu local" will depend on the OS and the
scheduler in use. I would just refer to OS native yield.
40 jlong _yield_time_ns;
41 jlong _sleep_time_ns;
42 jlong _max_yield_time_ns;
43 jlong _yield_granularity_ns;
We are avoiding using j-types in the VM unless they actually pertain to
Java variables. These should be int64_t (to be compatible with
javaTimeNanos call).
52 // Perform next round of delay.
53 void wait();
I think delay() would be a better name than wait().
---
src/hotspot/share/utilities/timedYield.cpp
42 // Phase 1 - local cpu yielding
43 if (_yield_time_ns < _max_yield_time_ns) {
44 #ifdef WIN32
45 if (SwitchToThread() == 0) {
46 // Nothing else is ready to run on this cpu, spin a little
47 while (os::javaTimeNanos() - start < _yield_granularity_ns) {
48 SpinPause();
49 }
50 }
51 #else
52 os::Posix::naked_short_nanosleep(_yield_granularity_ns);
53 #endif
54 _yield_time_ns += os::javaTimeNanos() - start;
55 return;
56 }
I have a few issues with this code. It's breaking the os abstraction
layer again by using an OS ifdef and not using os APIs that exist for
this very purpose - i.e os::naked_yield(). And for non-Windows it seems
quite bizarre the "yielding" part of TimedYield is actually implemented
with a sleep and not os::naked_yield!
If the existing os api's need adjustment or expansion to provide the
functionality desired by this code then I would much prefer to see the
os API's updated to address that.
That said, given the original problem is that os::naked_short_nanosleep
on Windows is too coarse with the use of WaitableTimer why not just
replace that with a simple loop of the form:
while (elapsed < sleep_ns) {
if (SwitchToThread() == 0) {
SpinPause();
elapsed = ...
}
?
Thanks,
David
-----
> Best regards,
> Robin
>
>> On 5 Apr 2019, at 13:54, Robin Westberg <robin.westberg at oracle.com> wrote:
>>
>> Hi David,
>>
>>> On 5 Apr 2019, at 12:10, David Holmes <david.holmes at oracle.com> wrote:
>>>
>>> On 5/04/2019 7:53 pm, Robin Westberg wrote:
>>>> Hi David,
>>>> Thanks for taking a look!
>>>>> On 5 Apr 2019, at 10:49, David Holmes <david.holmes at oracle.com> wrote:
>>>>>
>>>>> Hi Robin,
>>>>>
>>>>> On 5/04/2019 6:05 pm, Robin Westberg wrote:
>>>>>> Hi all,
>>>>>> Please review the following change that modifies the way safepointing waits for other threads to stop. As part of JDK-8203469, os::naked_short_nanosleep was used for all platforms. However, on Windows, this function has millisecond granularity which is too coarse and caused performance regressions. Instead, we can take advantage of the fact that Windows can tell us if anyone else is waiting to run on the current cpu. For other platforms the original waiting is used.
>>>>>
>>>>> Can't you just make the new code the implementation of os::naked_short_nanosleep on Windows and avoid adding yet another sleep/yield abstraction? If Windows nanosleep only has millisecond granularity then it's broken.
>>>> Right, I considered that approach, but it's not obvious to me what semantics would be appropriate for that. Depending on whether you want an accurate sleep or want other threads to make progress, it could conceivably be either implemented as pure spinning, or spinning combined with yielding (either cpu-local or Sleep(0)-style).
>>>> That said, perhaps os_naked_short_nanosleep should be removed, the only remaining usage is in spinYield, which turns to sleeping after giving up on yielding. For windows, it may be more appropriate to switch to Sleep(0) in that case.
>>>
>>> I definitely don't think we need to introduce another abstraction (though I'm okay with replacing one with a new one).
>>
>> Okay, then I’d prefer to drop the os::naked_short_nanosleep. As I see it, you would use SpinYield when waiting for something like a CAS race, and TimedYield when waiting for a thread rendezvous.
>>
>> I did try to make TimedYield into a different “policy” for the existing SpinYield class at first, but didn’t quite feel like I found a nice API for it. So perhaps it's fine to just have them as separate classes.
>>
>>> It was probably discussed at the time but the naked_short_nanosleep on Windows definitely seems to have far too much overhead - even having to create a new waitable-timer each time. Do we know if the overhead is actually the resolution of the timer or whether it's all the extra work that method needs to do? I should try to dig up the email on that. :)
>>
>> Yes, as far as I know the best possible sleep resolution you can obtain on Windows is one millisecond (perhaps 0.5ms in theory if you go directly for NtSetTimerResolution). The SetWaitableTimer approach is the best though to actually obtain 1ms resolution, plain Sleep(1) usually sleeps around 2ms.
>>
>> Best regards,
>> Robin
>>
>>>
>>> Thanks,
>>> David
>>>
>>>> Best regards,
>>>> Robin
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Various benchmarks (specjbb2015, specjm2008) show better numbers for Windows and no regressions for Linux.
>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8220795
>>>>>> Webrev: http://cr.openjdk.java.net/~rwestberg/8220795/webrev.00/
>>>>>> Testing: tier1
>>>>>> Best regards,
>>>>>> Robin
>
More information about the hotspot-runtime-dev
mailing list