RFR: JDK-8210106: sun/tools/jps/TestJps.java timed out
David Holmes
david.holmes at oracle.com
Mon Dec 10 07:55:16 UTC 2018
Hi Dan,
> Gary saw a longest time value of 280 seconds in his testing with
> fastdebug bits. The default timeout value is 120 seconds with a
> default timeout factor of 4 in Mach5 gives a total timeout of
> 480 seconds (8 minutes). With Gary's new default timeout of
> 360 seconds, we'll have a total timeout value of 1440 seconds
> (24 minutes).
So based on that it seems Gary's observations are not relevant to what
has been seen in mach5 because at 280 seconds we should never have been
timing out!
I guess these changes won't hurt.
Thanks,
David
On 8/12/2018 1:12 am, Daniel D. Daugherty wrote:
> On 12/7/18 7:58 AM, Gary Adams wrote:
>> On 12/6/18, 7:52 PM, David Holmes wrote:
>>> Hi Gary,
>>>
>>> On 6/12/2018 11:27 pm, Gary Adams wrote:
>>>> On a local linux-x64-debug build this test consistently runs in less
>>>> than 40 seconds.
>>>> On the mach5 testing machines there was a large fluctuation in the
>>>> time to complete this test.
>>>>
>>>> Since the test runs jps with different combinations of arguments,
>>>> the total
>>>> test time is dependent on the number of processes running java
>>>> programs.
>>>> Since the mach5 test infrastructure runs java programs I have seen a 3X
>>>> in the amount of output the test produces compared to local test
>>>> runs.
>>>>
>>>> I've run the test several hundred times through mach5 on the slower
>>>> platforms
>>>> and then examined the test logs using a 3X setting from the default
>>>> 120 second
>>>> jtreg timeout. The slowest reported elapse time from the test logs
>>>> showed
>>>> 280 seconds to complete.
>>>>
>>>> To improve the reliability of the test to complete, I'd like to
>>>> increase the
>>>> timeout to 360 seconds.
>>>>
>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8210106
>>>>
>>>> Proposed fix:
>>>>
>>>> diff --git a/test/jdk/sun/tools/jps/TestJps.java
>>>> b/test/jdk/sun/tools/jps/TestJps.java
>>>> --- a/test/jdk/sun/tools/jps/TestJps.java
>>>> +++ b/test/jdk/sun/tools/jps/TestJps.java
>>>> @@ -27,7 +27,7 @@
>>>> * @modules jdk.jartool/sun.tools.jar
>>>> * @build LingeredAppForJps
>>>> * @build LingeredApp
>>>> - * @run main/othervm TestJps
>>>> + * @run main/othervm/timeout=360 TestJps
>>>> */
>>>
>>> Doesn't that then get scaled by the timeout factor resulting in a
>>> much longer timeout than the 360 seconds you intended?
>>>
>>> For other timeout adjustments the needed time has been divided by the
>>> timeout factor to get the actual intended timeout.
>>
>> This bug was filed fairly recently in Aug 2018.
>> At that time the timeout and timeout factor were not sufficient
>> to avoid the test failing.
>>
>> The mach5 timeout factors were adjusted recently, so this test may
>> no longer be an issue.
>>
>> If that is true, then we could simply close this bug as "cannot
>> reproduce".
>> An argument could be made that the change in timeout factor may be
>> responsible for fixing a lot more of the intermittent bugs and that they
>> should be closed in a similar manner.
>>
>> Historically, we could say this particular bug should have had timeouts
>> reassessed when the infrastructure switched from jprt to mach5 testing
>> where there were more visible Java processes running.
>>
>> Using a higher explicit timeout will not make the test run any longer
>> than it needs. It will simply allow the test to not be terminated sooner
>> in a hung test scenario.
>>
>> What is your preference for this particular issue:
>> - increase the explicit timeout
>> - close as cannot reproduce attributed to the timeout factor
>> adjustments
>>
>> What would you recommend going forward for other similar issues:
>> - determine a new explicit timeout
>> - close if no timeout failures have been observed since the timeout
>> factor
>> was raised
>
> General guidance that I've been giving folks is that you bump the
> default timeout when you see timeouts with 'release' bits with
> a timeout factor of '1'. The default timeout factor in Mach5 is 4
> so we run both 'release' and 'fastdebug' bits with that timeout
> factor. That means our total timeout value for 'release' bits is
> probably a little long and our total timeout value for 'fastdebug'
> bits is probably just right for a concurrency factor of 12.
>
> On my personal servers I use the following timeout factors:
>
> release)
> TIMEOUT_FACTOR="4"
> ;;
> fastdebug)
> TIMEOUT_FACTOR="6"
> ;;
> slowdebug)
> TIMEOUT_FACTOR="12"
> ;;
>
> However, I run with a concurrency factor of 16. I rarely see
> timeout failures.
>
> All that is well and good, but a 'jps' test is a different
> beast since it is affected by factors outside of most tests.
> Since the number of java processes running on the system can
> affect the test, I'm okay with using a default timeout of '360'
> for this test as a guard against increases in load or...
>
> Gary saw a longest time value of 280 seconds in his testing with
> fastdebug bits. The default timeout value is 120 seconds with a
> default timeout factor of 4 in Mach5 gives a total timeout of
> 480 seconds (8 minutes). With Gary's new default timeout of
> 360 seconds, we'll have a total timeout value of 1440 seconds
> (24 minutes).
>
> Before the recent changes to Mach5, the default timeout factor
> was 10 so we had a total timeout value of 1200 seconds (20
> minutes). Gary's change only bumps the total timeout value
> by 4 minutes from what we had back in August... when this bug
> was filed... unfortunately, the log for that sighting is gone
> so all we know is that the test timed out after 20 minutes
>
> > Timeout refired 1200 times
>
> We know that JTREG timeout handling kicks in at timeout expiration,
> but without the log we don't know how much longer than 20 minutes
> the test ran before JTREG killed it. 24 minutes might do it, but...
>
> Dan
>
>
>>
>>
>>>
>>> Cheers,
>>> David
>>>
>>
>>
>
More information about the serviceability-dev
mailing list