Codereview request: 8050115 javax/management/monitor/GaugeMonitorDeadlockTest.java fails intermittently
Daniel Fuchs
daniel.fuchs at oracle.com
Wed Sep 17 13:29:44 UTC 2014
Hi Shanliang,
On 9/17/14 2:19 PM, shanliang wrote:
> Daniel,
>
> The test does 2 steps of verifications, the new check is useful for the
> first step, and the trace in the bug showed that the test failed on the
> first step.
>
> Yes the check might not work for the second step, I added the new code
> for the second step to check the tested thread state and hope to have
> useful info if the test failed on the second step.
>
> Here is the new version:
> http://cr.openjdk.java.net/~sjiang/JDK-8050115/02/
>
> Thanks,
> Shanliang
If I understand the issue correctly - the test fails in timeout
mostly on very slow machines/configurations (fastdebug with some
combinations of options).
I worry that printing a thread dump every seconds (1000ms) is going
to make things worse: the test will spend its time printing thread
dumps instead of doing what it is supposed to do - and will have
even less CPU cycles to execute its 'real' code.
I would have advised printing the thread dumps only at the end,
when it is detected that there might be a deadlock - except that
now we can't do that since the timeout is managed completely
by the harness (so we don't get the upper hand at the end in
case of timeout).
I think depending on the harness to set the appropriate timeout
rather than depending on an arbitrary timeout set in the test itself
is the right thing to do. It's been a pattern in many tests that
failed in timeout intermittently on some slow machines/configuration.
In any case - 1s seems really too frequent.
I suppose you could inspect the system properties set by the harness
(timeout + timeout factor) to devise an acceptable frequency for
your checks - if you really want to print this info.
From the log I see that the timeout factor passed to the harness
for the slow configuration that failed is
-Dtest.timeout.factor=8.0
There's no explicit timeout given - and jtreg -onlineHelp reveals
that in this case the default timeout is two minutes.
This means that the harness has allocated 2*8=16mins for the test to
execute.
I don't think you want to take the risk of printing a thread dump
every seconds during 16 minutes ;-)
Of course I'm over simplifying here. Before your changes - the test
was deciding after 46.893 seconds that there must be a deadlock.
47s is obviously way too short for a possibly slow machine running
the test in fastdebug mode.
Something like the following might be more reasonable:
// default timeout factor is 1.0
double factor =
Double.parseDouble(
System.getProperty("test.timeout.factor", "1.0"));
// default timeout is 2mins = 120s.
double timeout = Double.parseDouble(
System.getProperty("test.timeout", "120"));
// total time is timeout * timeout factor * 1000 ms
long total = (long) factor * timeout * 1000;
// Don't print thread dumps too often.
// every 5s for a total timeout of 120s seems reasonable.
// 120s/5s = 24; we will lengthen the delay if the total
// timeout is greater than 120s, so we're taking the max between
// 5s and total/24
long delayBetweenThreadDumps = Math.max(5000, total/24);
Of course 5s and total/24 are just arbitrary...
But 24 full thread dumps in a log for a single test is enough data
to analyze I think ;-)
best regards,
-- daniel
>
>
> Daniel Fuchs wrote:
>> On 9/17/14 10:55 AM, shanliang wrote:
>>> David Holmes wrote:
>>>> On 17/09/2014 7:01 AM, shanliang wrote:
>>>>> David Holmes wrote:
>>>>>> Hi Shanliang,
>>>>>>
>>>>>> On 16/09/2014 7:12 PM, shanliang wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please review the following fix:
>>>>>>
>>>>>> I don't see any functional change. You seem to have replaced a
>>>>>> built-in timeout with the externally applied test harness timeout.
>>>>> Yes no functional change here, we thought that the test needed more
>>>>> time
>>>>> to wait a change if a testing VM or machine was really slow, the test
>>>>> harness timeout was the maximum time we could give the test.
>>>>
>>>> Do we have confidence that the harness timeout is sufficient to handle
>>>> the intermittent failures?
>>> Really a good question :)
>>>
>>> Here is new version:
>>> http://cr.openjdk.java.net/~sjiang/JDK-8050115/01/
>>>
>>> I added a deadlocked check in every 1 second, hope to get more info in
>>> case of failure.
>>
>> The following comment seems to imply that this check is not
>> very useful:
>>
>> 112 // This won't show up as a deadlock in CTRL-\ or in
>> 113 // ThreadMXBean.findDeadlockedThreads(), because they
>> don't
>> 114 // see that thread A is waiting for thread B
>> (B.join()), and
>> 115 // thread B is waiting for a lock held by thread A
>>
>> best regards,
>>
>> -- daniel
>>
>>>
>>> I changed also the sleep time to 100ms, 10ms seems too short as Daniel
>>> pointed out.
>>>
>>> Thanks,
>>> Shanliang
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>
>>>>>>
>>>>>> Style nit: add a space after 'while' -> while (cond) {
>>>>> OK, I will do it before pushing.
>>>>>
>>>>> Thanks,
>>>>> Shanliang
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8050115
>>>>>>> webrev: http://cr.openjdk.java.net/~sjiang/JDK-8050115/00/
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shanliang
>>>>>
>>>
>>
>
More information about the serviceability-dev
mailing list