RFR: 8262094: Handshake timeout scaled wrong

Mon Feb 22 12:57:37 UTC 2021

On Mon, 22 Feb 2021 12:42:40 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>>> `G1ServiceThread` is using `os::elapsed_counter()` but if the `frequency` used in `TimeHelper::millis_to_counter(...)` is wrong I guess we could be in trouble. @robehn, is your theory that the calculated delay in some cases could be extremely long and because of that cause the timeouts we see?
>> 
>> The logging during handshake timeout doesn't work on Windows for some reason. When I fixed the logging locally, I noticed that the timeout was actually 2 seconds instead of 20 seconds. But that only seems to apply to some Windows versions (or virtual hardware).
>> 
>> During some tests I have seen results such as:
>> "Error: time between events 0 ns"
>> Looking at the code, the events are serialized with a mutex, so some time must have passed, but the granularity of the system is too coarse. I don't know what else this can cause.
>
>> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-runtime-dev](mailto:hotspot-runtime-dev at openjdk.java.net):_
>> 
>> On 22/02/2021 7:03 pm, Stefan Karlsson wrote:
>> 
>> > On Sun, 21 Feb 2021 18:46:46 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
>> > > Parameter HandshakeTimeout is in milliseconds.
>> > > Internally we use nanoseconds.
>> > > HandshakeTimeout must be scaled to nanoseconds.
>> > > Passes T1
>> > 
>> > 
>> > Why are we using os::javaTimeNanos instead of os::elapsed_counter()? I think we've tried to limit the usage of os::javaTimeNanos to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM use os::elapsed_counter().
>> 
>> Runtime code (threading and sync) uses actual time units (ie
>> javaTimeNanos) rather than os::elapsed_counter(). And elapsed_counter()
>> uses javaTimeNanos() on all platforms but Windows.
> 
> I see. In the GC code we've tried to unify towards using one time source (elapsed_counter/elapsedTime/Ticks all use counters), and then convert to nanos/micros/millis when needed. It's a bit unfortunate that we don't use the same time source throughout the JVM.
> 
>  It's not really about
>> interaction with Java, just about semantics.
> 
> From the GC's point of view it is. We use os::javaTimeNanos() only where we are forced to because of interactions with Java. Other use cases use the counter-based source.
> 
>  A timeout/delay in "ticks"
>> would scale with CPU speed if "ticks" represented that, but
>> os::elapsed_counter() doesn't - so a timeout/delay of N "counts" is
>> somewhat arbitrary in duration.
> 
> For those cases you wouldn't pass down a counter, but use TimeHelper::counter_to_millis (or equivalent).
> 
> But OK, given that Runtime already uses os::javaTimeNanos() I'll back off.

I agree that we should use the same time source.
But here I just want to fix this bug so the timeout works as intended on Windows.
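
To illustrate the unit handling discussed in this thread, here is a minimal sketch, not the actual HotSpot change. It scales a millisecond timeout (as `HandshakeTimeout` is specified) to nanoseconds before comparing it with a nanosecond clock, and converts a raw counter to milliseconds using a frequency. All helper names and the 10 MHz counter frequency used in the usage note are assumptions made for illustration only.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; these are not HotSpot APIs.
constexpr int64_t kNanosPerMilli = 1000000;

// Scale a timeout given in milliseconds to nanoseconds.
constexpr int64_t millis_to_nanos(int64_t millis) {
  return millis * kNanosPerMilli;
}

// Returns true once more than timeout_ms has elapsed between two
// nanosecond timestamps. A timeout of 0 is treated as "disabled".
constexpr bool has_timed_out(int64_t start_ns, int64_t now_ns,
                             int64_t timeout_ms) {
  return timeout_ms > 0 && (now_ns - start_ns) > millis_to_nanos(timeout_ms);
}

// Convert raw counter ticks to milliseconds. The result is only meaningful
// if ticks_per_second matches the frequency of the counter that produced
// the ticks.
constexpr int64_t counter_to_millis(int64_t ticks, int64_t ticks_per_second) {
  return ticks * 1000 / ticks_per_second;
}
```

With these definitions, `millis_to_nanos(20000)` is 20000000000, so a 20 second timeout has not expired after 2 seconds of nanosecond time. Assuming a counter that ticks at 10 MHz, `counter_to_millis(200000000, 10000000)` is 20000; the same tick count read against a different frequency gives a different duration, which is why mixing counter values and nanoseconds without scaling is error-prone.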

-------------

PR: https://git.openjdk.java.net/jdk/pull/2666