RFR: 8262094: Handshake timeout scaled wrong

Mon Feb 22 12:45:45 UTC 2021

On Mon, 22 Feb 2021 11:07:42 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>>> Why are we using os::javaTimeNanos instead of os::elapsed_counter()? I think we've tried to limit the usage of os::javaTimeNanos to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM use os::elapsed_counter().
>> 
>> Both use a 'java type', jlong.
>> For logging and for the yielding logic we want a human-understandable unit that is absolute.
>> Since things go pretty fast nowadays, nanoseconds seem reasonable.
>> Instead of converting the elapsed_counter to nanos, it's much simpler to just get the nanos.
>> But yes, it would preferably be done with an os::time_nanos().
>
>> `G1ServiceThread` is using `os::elapsed_counter()` but if the `frequency` used in `TimeHelper::millis_to_counter(...)` is wrong I guess we could be in trouble. @robehn, is your theory that the calculated delay in some cases could be extremely long and because of that cause the timeouts we see?
> 
> The logging during handshake timeout doesn't work on Windows for some reason. When I fixed the logging locally, I noticed that the timeout was actually 2 seconds instead of 20 seconds. But that only seems to apply to some Windows versions/(virtual hardware).
> 
> During some tests I have seen results such as:
> "Error: time between events 0 ns"
> When looking at the code, the events are serialized with a mutex, so some time must have passed, but the granularity of the system is too coarse. I don't know what else this can cause.

> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-runtime-dev](mailto:hotspot-runtime-dev at openjdk.java.net):_
> 
> On 22/02/2021 7:03 pm, Stefan Karlsson wrote:
> 
> > On Sun, 21 Feb 2021 18:46:46 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
> > > Parameter HandshakeTimeout is in milliseconds.
> > > Internally we use nanoseconds.
> > > HandshakeTimeout must be scaled to nanoseconds.
> > > Passes T1
> > 
> > 
> > Why are we using os::javaTimeNanos instead of os::elapsed_counter()? I think we've tried to limit the usage of os::javaTimeNanos to places that explicitly interact with the Java side. For example, the reference processor uses os::javaTimeNanos because it interacts with java.lang.ref.SoftReference.clock. Most other places in the JVM use os::elapsed_counter().
> 
> Runtime code (threading and sync) uses actual time units (ie
> javaTimeNanos) rather than os::elapsed_counter(). And elapsed_counter()
> uses javaTimeNanos() on all platforms but Windows.

I see. In the GC code we've tried to unify towards using one time source (elapsed_counter/elapsedTime/Ticks all use counters), and then convert to nanos/micros/millis when needed. It's a bit unfortunate that we don't use the same time source throughout the JVM.

> It's not really about
> interaction with Java, just about semantics.

From the GC's point of view it is. We use os::javaTimeNanos() only where we are forced to because of interactions with Java. Other use cases use the counter-based source.

> A timeout/delay in "ticks"
> would scale with CPU speed if "ticks" presented that, but
> os::elapsed_counter() doesn't - so a timeout/delay of N "counts" is
> somewhat arbitrary in duration.

For those cases you wouldn't pass down a raw counter value, but convert it with TimeHelper::counter_to_millis (or an equivalent).
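
As a rough sketch of that kind of conversion (this is not HotSpot's actual TimeHelper code; the function names and the 10 MHz frequency used in the usage note are assumptions for illustration only), a counter value becomes a real duration only once it is divided by the counter frequency:

```cpp
#include <cstdint>

// Hypothetical stand-ins for a counter/millis conversion helper.
// 'frequency' is the number of counter ticks per second, as reported
// by whatever platform call backs the counter (assumed, not shown).

// Convert a raw tick count to milliseconds.
inline double sketch_counter_to_millis(int64_t counter, int64_t frequency) {
  return (static_cast<double>(counter) / static_cast<double>(frequency)) * 1000.0;
}

// Convert a duration in milliseconds to a tick count.
// If 'frequency' is wrong, every delay derived from this is wrong
// by the same factor.
inline int64_t sketch_millis_to_counter(int64_t millis, int64_t frequency) {
  return (millis * frequency) / 1000;
}
```

With an assumed frequency of 10,000,000 ticks per second, a 20,000 ms delay corresponds to 200,000,000 ticks; the same tick count under a different frequency would be a different wall-clock duration.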

But OK, given that Runtime already uses os::javaTimeNanos() I'll back off.
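
For reference, the scaling issue this RFR is about (HandshakeTimeout is given in milliseconds, while the internal bookkeeping is in nanoseconds) can be sketched as below. This is an illustration only, not the code in the PR: the helper names are hypothetical, and treating a timeout of 0 as "disabled" is an assumption of the sketch.

```cpp
#include <cstdint>

constexpr int64_t SKETCH_NANOS_PER_MILLI = 1000000;

// Scale a millisecond value to nanoseconds before comparing it
// against nanosecond timestamps.
inline int64_t sketch_millis_to_nanos(int64_t millis) {
  return millis * SKETCH_NANOS_PER_MILLI;
}

// Returns true once more than 'timeout_ms' has elapsed between
// 'start_ns' and 'now_ns' (both nanosecond timestamps).
// Assumption: a timeout of 0 means the timeout is disabled.
inline bool sketch_has_timed_out(int64_t start_ns, int64_t now_ns, int64_t timeout_ms) {
  if (timeout_ms <= 0) {
    return false;
  }
  return (now_ns - start_ns) > sketch_millis_to_nanos(timeout_ms);
}
```

Comparing the elapsed nanoseconds against an unscaled or mis-scaled millisecond value makes the effective timeout wrong by a constant factor, which is the class of bug the RFR title describes.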

-------------

PR: https://git.openjdk.java.net/jdk/pull/2666