RFR: [XS] 8228658: test GetTotalSafepointTime.java fails on fast Linux machines with Total safepoint time 0 ms

Tue Jul 30 12:12:13 UTC 2019

Hi Matthias,

On 30/07/2019 9:25 pm, Baesken, Matthias wrote:
> Hello  JC / David,   here is a second webrev  :
> 
> http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.1/
> 
> It moves   the  thread dump execution into a  method  
> executeThreadDumps(long)     , and also adds  while loops   (but with a 
> limitation  for the number of thread dumps, really don’t
> want to cause timeouts etc.).    I removed a check for  
> MAX_VALUE_FOR_PASS   because we cannot go over Long.MAX_VALUE .

I don't think executeThreadDumps is worth factoring out like out.

The handling of NUM_THREAD_DUMPS is a bit confusing. I'd rather it 
remains a constant 100, and then you set a simple loop iteration count 
limit. Further with the proposed code when you get here:

  85         NUM_THREAD_DUMPS = NUM_THREAD_DUMPS * 2;

you don't even know what value you may be starting with.

But I was thinking of simply:

long value = 0;
do {
     Thread.getAllStackTraces();
     value = mbean.getTotalSafepointTime();
} while (value == 0);

We'd only hit a timeout if something is completely broken - which is fine.

Overall tests like this are not very useful, yet very fragile.

Thanks,
David

> Hope you like this version  better.
> 
> Best regards, Matthias
> 
> *From:*Jean Christophe Beyler <jcbeyler at google.com>
> *Sent:* Dienstag, 30. Juli 2019 05:39
> *To:* David Holmes <david.holmes at oracle.com>
> *Cc:* Baesken, Matthias <matthias.baesken at sap.com>; 
> hotspot-dev at openjdk.java.net; serviceability-dev 
> <serviceability-dev at openjdk.java.net>
> *Subject:* Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java fails 
> on fast Linux machines with Total safepoint time 0 ms
> 
> Hi Matthias,
> 
> I wonder if you should not do what David is suggesting and then put that 
> whole code (the while loop) in a helper method. Below you have a 
> calculation again using value2 (which I wonder what the added value of 
> it is though) but anyway, that value2 could also be 0 at some point, no?
> 
> So would it not be best to just refactor the getAllStackTraces and 
> calculate safepoint time in a helper method for both value / value2 
> variables?
> 
> Thanks,
> 
> Jc
> 
> On Mon, Jul 29, 2019 at 7:50 PM David Holmes <david.holmes at oracle.com 
> <mailto:david.holmes at oracle.com>> wrote:
> 
>     Hi Matthias,
> 
>     On 29/07/2019 8:20 pm, Baesken, Matthias wrote:
>      > Hello , please review this small test fix .
>      >
>      > The test
>     test/jdk/sun/management/HotspotRuntimeMBean/GetTotalSafepointTime.java
>     fails sometimes on fast Linux machines with this error message :
>      >
>      > java.lang.RuntimeException: Total safepoint time illegal value: 0
>     ms (MIN = 1; MAX = 9223372036854775807)
>      >
>      > looks like the total safepoint time is too low currently on these
>     machines, it is < 1 ms.
>      >
>      > There might be several ways to handle this :
>      >
>      >    *   Change the test  in a way that it might generate nigher
>     safepoint times
>      >    *   Allow  safepoint time  == 0 ms
>      >    *   Offer an additional interface that gives  safepoint times 
>     with finer granularity ( currently the HS has safepoint time values
>     in ns , see  jdk/src/hotspot/share/runtime/safepoint.cpp 
>       SafepointTracing::end
>      >
>      > But it is converted on ms in this code
>      >
>      > 114jlong RuntimeService::safepoint_time_ms() {
>      > 115  return UsePerfData ?
>      > 116   
>     Management::ticks_to_ms(_safepoint_time_ticks->get_value()) : -1;
>      > 117}
>      >
>      > 064jlong Management::ticks_to_ms(jlong ticks) {
>      > 2065  assert(os::elapsed_frequency() > 0, "Must be non-zero");
>      > 2066  return (jlong)(((double)ticks /
>     (double)os::elapsed_frequency())
>      > 2067                 * (double)1000.0);
>      > 2068}
>      >
>      >
>      >
>      > Currently I go for  the first attempt (and try to generate 
>     higher safepoint times in my patch) .
> 
>     Yes that's probably best. Coarse-grained timing on very fast machines
>     was bound to eventually lead to problems.
> 
>     But perhaps a more future-proof approach is to just add a do-while loop
>     around the stack dumps and only exit when we have a non-zero safepoint
>     time?
> 
>     Thanks,
>     David
>     -----
> 
>      > Bug/webrev :
>      >
>      > https://bugs.openjdk.java.net/browse/JDK-8228658
>      >
>      > http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.0/
>      >
>      >
>      > Thanks, Matthias
>      >
> 
> 
> -- 
> 
> Thanks,
> 
> Jc
>