Codereview request: 8050115 javax/management/monitor/GaugeMonitorDeadlockTest.java fails intermittently

shanliang shanliang.jiang at oracle.com
Wed Sep 17 14:43:06 UTC 2014


Daniel,

We could not be sure that the test failed of timeout, that's why I tried 
to add more checks.

The check for Step 1: all thread traces were printed out only if 
deadlock was found, and the test failed immediately.
The check for Step 2:
    1) all thread traces were printed out only if the tested thread was 
blocked, but the test did not fail because we were not sure if deadlock 
happened, but the info might be helpful;
    2) otherwise only the trace of the tested thread was printed out.

In case that the test gets interrupted again by the test harness, hope 
we can have some useful info from these 2 checks.

It must not be so heavy but still could impact the test, your suggestion 
to use test.timeout.factor is a good idea, I added the code to calculate 
the checking time based on it:
    http://cr.openjdk.java.net/~sjiang/JDK-8050115/03/

Thanks,
Shanliang

Daniel Fuchs wrote:
> Hi Shanliang,
>
> On 9/17/14 2:19 PM, shanliang wrote:
>> Daniel,
>>
>> The test does 2 steps of verifications, the new check is useful for the
>> first step, and the trace in the bug showed that the test failed on the
>> first step.
>>
>> Yes the check might not work for the second step, I added the new code
>> for the second step to check the tested thread state and hope to have
>> useful info if the test failed on the second step.
>>
>> Here is the new version:
>>     http://cr.openjdk.java.net/~sjiang/JDK-8050115/02/
>>
>> Thanks,
>> Shanliang
>
> If I understand the issue correctly - the test fails in timeout
> mostly on very slow machines/configurations (fastdebug with some
> combinations of options).
>
> I worry that printing a thread dump every seconds (1000ms) is going
> to make things worse: the test will spend its time printing thread
> dumps instead of doing what it is supposed to do - and will have
> even less CPU cycles to execute its 'real' code.
>
> I would have advised printing the thread dumps only at the end,
> when it is detected that there might be a deadlock - except that
> now we can't do that since the timeout is managed completely
> by the harness (so we don't get the upper hand at the end in
> case of timeout).
>
> I think depending on the harness to set the appropriate timeout
> rather than depending on an arbitrary timeout set in the test itself
> is the right thing to do. It's been a pattern in many tests that
> failed in timeout intermittently on some slow machines/configuration.
>
> In any case - 1s seems really too frequent.
> I suppose you could inspect the system properties set by the harness
> (timeout + timeout factor) to devise an acceptable frequency for
> your checks - if you really want to print this info.
>
> From the log I see that the timeout factor passed to the harness
> for the slow configuration that failed is
> -Dtest.timeout.factor=8.0
> There's no explicit timeout given - and jtreg -onlineHelp reveals
> that in this case the default timeout is two minutes.
>
> This means that the harness has allocated 2*8=16mins for the test to
> execute.
> I don't think you want to take the risk of printing a thread dump
> every seconds during 16 minutes ;-)
>
> Of course I'm over simplifying here. Before your changes - the test
> was deciding after 46.893 seconds that there must be a deadlock.
> 47s is obviously way too short for a possibly slow machine running
> the test in fastdebug mode.
>
> Something like the following might be more reasonable:
>
> // default timeout factor is 1.0
> double factor =
>     Double.parseDouble(
>        System.getProperty("test.timeout.factor", "1.0"));
> // default timeout is 2mins = 120s.
> double timeout = Double.parseDouble(
>        System.getProperty("test.timeout", "120"));
>
> // total time is timeout * timeout factor * 1000 ms
> long total = (long) factor * timeout * 1000;
>
> // Don't print thread dumps too often.
> // every 5s for a total timeout of 120s seems reasonable.
> // 120s/5s = 24; we will lengthen the delay if the total
> // timeout is greater than 120s, so we're taking the max between
> // 5s and total/24
> long delayBetweenThreadDumps = Math.max(5000, total/24);
>
> Of course 5s and total/24 are just arbitrary...
> But 24 full thread dumps in a log for a single test is enough data
> to analyze I think ;-)
>
> best regards,
>
> -- daniel
>
>>
>>
>> Daniel Fuchs wrote:
>>> On 9/17/14 10:55 AM, shanliang wrote:
>>>> David Holmes wrote:
>>>>> On 17/09/2014 7:01 AM, shanliang wrote:
>>>>>> David Holmes wrote:
>>>>>>> Hi Shanliang,
>>>>>>>
>>>>>>> On 16/09/2014 7:12 PM, shanliang wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please review the following fix:
>>>>>>>
>>>>>>> I don't see any functional change. You seem to have replaced a
>>>>>>> built-in timeout with the externally applied test harness timeout.
>>>>>> Yes no functional change here, we thought that the test needed more
>>>>>> time
>>>>>> to wait a change if a testing VM or machine was really slow, the 
>>>>>> test
>>>>>> harness timeout was the maximum time we could give the test.
>>>>>
>>>>> Do we have confidence that the harness timeout is sufficient to 
>>>>> handle
>>>>> the intermittent failures?
>>>> Really a good question :)
>>>>
>>>> Here is new version:
>>>>     http://cr.openjdk.java.net/~sjiang/JDK-8050115/01/
>>>>
>>>> I added a deadlocked check in every 1 second, hope to get more info in
>>>> case of failure.
>>>
>>> The following comment seems to imply that this check is not
>>> very useful:
>>>
>>>  112             // This won't show up as a deadlock in CTRL-\ or in
>>>  113             // ThreadMXBean.findDeadlockedThreads(), because they
>>> don't
>>>  114             // see that thread A is waiting for thread B
>>> (B.join()), and
>>>  115             // thread B is waiting for a lock held by thread A
>>>
>>> best regards,
>>>
>>> -- daniel
>>>
>>>>
>>>> I changed also the sleep time to 100ms, 10ms seems too short as Daniel
>>>> pointed out.
>>>>
>>>> Thanks,
>>>> Shanliang
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>>>>
>>>>>>> Style nit: add a space after 'while' -> while (cond) {
>>>>>> OK, I will do it before pushing.
>>>>>>
>>>>>> Thanks,
>>>>>> Shanliang
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8050115
>>>>>>>> webrev: http://cr.openjdk.java.net/~sjiang/JDK-8050115/00/
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shanliang
>>>>>>
>>>>
>>>
>>
>



More information about the serviceability-dev mailing list