RFR: [XS] 8228658: test GetTotalSafepointTime.java fails on fast Linux machines with Total safepoint time 0 ms

Tue Jul 30 12:39:56 UTC 2019

Hi David,   "put that whole code (the while loop) in a helper method."   was JC's idea,  and I like the idea .

Let's see what others think .

> 
> Overall tests like this are not very useful, yet very fragile.
>

I am also  fine with putting the test on the exclude list.

Best regards, Matthias

> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Dienstag, 30. Juli 2019 14:12
> To: Baesken, Matthias <matthias.baesken at sap.com>; Jean Christophe
> Beyler <jcbeyler at google.com>
> Cc: hotspot-dev at openjdk.java.net; serviceability-dev <serviceability-
> dev at openjdk.java.net>
> Subject: Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java fails on fast
> Linux machines with Total safepoint time 0 ms
> 
> Hi Matthias,
> 
> On 30/07/2019 9:25 pm, Baesken, Matthias wrote:
> > Hello  JC / David,   here is a second webrev  :
> >
> > http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.1/
> >
> > It moves   the  thread dump execution into a  method
> > executeThreadDumps(long)     , and also adds  while loops   (but with a
> > limitation  for the number of thread dumps, really don’t
> > want to cause timeouts etc.).    I removed a check for
> > MAX_VALUE_FOR_PASS   because we cannot go over Long.MAX_VALUE .
> 
> I don't think executeThreadDumps is worth factoring out like out.
> 
> The handling of NUM_THREAD_DUMPS is a bit confusing. I'd rather it
> remains a constant 100, and then you set a simple loop iteration count
> limit. Further with the proposed code when you get here:
> 
>   85         NUM_THREAD_DUMPS = NUM_THREAD_DUMPS * 2;
> 
> you don't even know what value you may be starting with.
> 
> But I was thinking of simply:
> 
> long value = 0;
> do {
>      Thread.getAllStackTraces();
>      value = mbean.getTotalSafepointTime();
> } while (value == 0);
> 
> We'd only hit a timeout if something is completely broken - which is fine.
> 
> Overall tests like this are not very useful, yet very fragile.
> 
> Thanks,
> David
> 
> > Hope you like this version  better.
> >
> > Best regards, Matthias
> >
> > *From:*Jean Christophe Beyler <jcbeyler at google.com>
> > *Sent:* Dienstag, 30. Juli 2019 05:39
> > *To:* David Holmes <david.holmes at oracle.com>
> > *Cc:* Baesken, Matthias <matthias.baesken at sap.com>;
> > hotspot-dev at openjdk.java.net; serviceability-dev
> > <serviceability-dev at openjdk.java.net>
> > *Subject:* Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java fails
> > on fast Linux machines with Total safepoint time 0 ms
> >
> > Hi Matthias,
> >
> > I wonder if you should not do what David is suggesting and then put that
> > whole code (the while loop) in a helper method. Below you have a
> > calculation again using value2 (which I wonder what the added value of
> > it is though) but anyway, that value2 could also be 0 at some point, no?
> >
> > So would it not be best to just refactor the getAllStackTraces and
> > calculate safepoint time in a helper method for both value / value2
> > variables?
> >
> > Thanks,
> >
> > Jc
> >
> > On Mon, Jul 29, 2019 at 7:50 PM David Holmes <david.holmes at oracle.com
> > <mailto:david.holmes at oracle.com>> wrote:
> >
> >     Hi Matthias,
> >
> >     On 29/07/2019 8:20 pm, Baesken, Matthias wrote:
> >      > Hello , please review this small test fix .
> >      >
> >      > The test
> >
> test/jdk/sun/management/HotspotRuntimeMBean/GetTotalSafepointTime.
> java
> >     fails sometimes on fast Linux machines with this error message :
> >      >
> >      > java.lang.RuntimeException: Total safepoint time illegal value: 0
> >     ms (MIN = 1; MAX = 9223372036854775807)
> >      >
> >      > looks like the total safepoint time is too low currently on these
> >     machines, it is < 1 ms.
> >      >
> >      > There might be several ways to handle this :
> >      >
> >      >    *   Change the test  in a way that it might generate nigher
> >     safepoint times
> >      >    *   Allow  safepoint time  == 0 ms
> >      >    *   Offer an additional interface that gives  safepoint times
> >     with finer granularity ( currently the HS has safepoint time values
> >     in ns , see  jdk/src/hotspot/share/runtime/safepoint.cpp
> >       SafepointTracing::end
> >      >
> >      > But it is converted on ms in this code
> >      >
> >      > 114jlong RuntimeService::safepoint_time_ms() {
> >      > 115  return UsePerfData ?
> >      > 116
> >     Management::ticks_to_ms(_safepoint_time_ticks->get_value()) : -1;
> >      > 117}
> >      >
> >      > 064jlong Management::ticks_to_ms(jlong ticks) {
> >      > 2065  assert(os::elapsed_frequency() > 0, "Must be non-zero");
> >      > 2066  return (jlong)(((double)ticks /
> >     (double)os::elapsed_frequency())
> >      > 2067                 * (double)1000.0);
> >      > 2068}
> >      >
> >      >
> >      >
> >      > Currently I go for  the first attempt (and try to generate
> >     higher safepoint times in my patch) .
> >
> >     Yes that's probably best. Coarse-grained timing on very fast machines
> >     was bound to eventually lead to problems.
> >
> >     But perhaps a more future-proof approach is to just add a do-while loop
> >     around the stack dumps and only exit when we have a non-zero
> safepoint
> >     time?
> >
> >     Thanks,
> >     David
> >     -----
> >
> >      > Bug/webrev :
> >      >
> >      > https://bugs.openjdk.java.net/browse/JDK-8228658
> >      >
> >      > http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.0/
> >      >
> >      >
> >      > Thanks, Matthias
> >      >
> >
> >
> > --
> >
> > Thanks,
> >
> > Jc
> >