Codereview request: 8050115 javax/management/monitor/GaugeMonitorDeadlockTest.java fails intermittently

Daniel Fuchs daniel.fuchs at oracle.com
Wed Sep 17 14:55:38 UTC 2014


On 9/17/14 4:43 PM, shanliang wrote:
> Daniel,
>
> We could not be sure that the test failed of timeout, that's why I tried
> to add more checks.
>
> The check for Step 1: all thread traces were printed out only if
> deadlock was found, and the test failed immediately.
> The check for Step 2:
>     1) all thread traces were printed out only if the tested thread was
> blocked, but the test did not fail because we were not sure if deadlock
> happened, but the info might be helpful;
>     2) otherwise only the trace of the tested thread was printed out.
>
> In case that the test gets interrupted again by the test harness, hope
> we can have some useful info from these 2 checks.
>
> It must not be so heavy but still could impact the test, your suggestion
> to use test.timeout.factor is a good idea, I added the code to calculate
> the checking time based on it:
>     http://cr.openjdk.java.net/~sjiang/JDK-8050115/03/

I Shanliang,

This looks much better, thanks.
May I suggested taking the current time again at lines 125
and 179:

    checkedTime = System.currentTimeMillis();

It would allow to discount the time spent in checking.

best regards,

-- daniel

>
> Thanks,
> Shanliang
>
> Daniel Fuchs wrote:
>> Hi Shanliang,
>>
>> On 9/17/14 2:19 PM, shanliang wrote:
>>> Daniel,
>>>
>>> The test does 2 steps of verifications, the new check is useful for the
>>> first step, and the trace in the bug showed that the test failed on the
>>> first step.
>>>
>>> Yes the check might not work for the second step, I added the new code
>>> for the second step to check the tested thread state and hope to have
>>> useful info if the test failed on the second step.
>>>
>>> Here is the new version:
>>>     http://cr.openjdk.java.net/~sjiang/JDK-8050115/02/
>>>
>>> Thanks,
>>> Shanliang
>>
>> If I understand the issue correctly - the test fails in timeout
>> mostly on very slow machines/configurations (fastdebug with some
>> combinations of options).
>>
>> I worry that printing a thread dump every seconds (1000ms) is going
>> to make things worse: the test will spend its time printing thread
>> dumps instead of doing what it is supposed to do - and will have
>> even less CPU cycles to execute its 'real' code.
>>
>> I would have advised printing the thread dumps only at the end,
>> when it is detected that there might be a deadlock - except that
>> now we can't do that since the timeout is managed completely
>> by the harness (so we don't get the upper hand at the end in
>> case of timeout).
>>
>> I think depending on the harness to set the appropriate timeout
>> rather than depending on an arbitrary timeout set in the test itself
>> is the right thing to do. It's been a pattern in many tests that
>> failed in timeout intermittently on some slow machines/configuration.
>>
>> In any case - 1s seems really too frequent.
>> I suppose you could inspect the system properties set by the harness
>> (timeout + timeout factor) to devise an acceptable frequency for
>> your checks - if you really want to print this info.
>>
>> From the log I see that the timeout factor passed to the harness
>> for the slow configuration that failed is
>> -Dtest.timeout.factor=8.0
>> There's no explicit timeout given - and jtreg -onlineHelp reveals
>> that in this case the default timeout is two minutes.
>>
>> This means that the harness has allocated 2*8=16mins for the test to
>> execute.
>> I don't think you want to take the risk of printing a thread dump
>> every seconds during 16 minutes ;-)
>>
>> Of course I'm over simplifying here. Before your changes - the test
>> was deciding after 46.893 seconds that there must be a deadlock.
>> 47s is obviously way too short for a possibly slow machine running
>> the test in fastdebug mode.
>>
>> Something like the following might be more reasonable:
>>
>> // default timeout factor is 1.0
>> double factor =
>>     Double.parseDouble(
>>        System.getProperty("test.timeout.factor", "1.0"));
>> // default timeout is 2mins = 120s.
>> double timeout = Double.parseDouble(
>>        System.getProperty("test.timeout", "120"));
>>
>> // total time is timeout * timeout factor * 1000 ms
>> long total = (long) factor * timeout * 1000;
>>
>> // Don't print thread dumps too often.
>> // every 5s for a total timeout of 120s seems reasonable.
>> // 120s/5s = 24; we will lengthen the delay if the total
>> // timeout is greater than 120s, so we're taking the max between
>> // 5s and total/24
>> long delayBetweenThreadDumps = Math.max(5000, total/24);
>>
>> Of course 5s and total/24 are just arbitrary...
>> But 24 full thread dumps in a log for a single test is enough data
>> to analyze I think ;-)
>>
>> best regards,
>>
>> -- daniel
>>
>>>
>>>
>>> Daniel Fuchs wrote:
>>>> On 9/17/14 10:55 AM, shanliang wrote:
>>>>> David Holmes wrote:
>>>>>> On 17/09/2014 7:01 AM, shanliang wrote:
>>>>>>> David Holmes wrote:
>>>>>>>> Hi Shanliang,
>>>>>>>>
>>>>>>>> On 16/09/2014 7:12 PM, shanliang wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Please review the following fix:
>>>>>>>>
>>>>>>>> I don't see any functional change. You seem to have replaced a
>>>>>>>> built-in timeout with the externally applied test harness timeout.
>>>>>>> Yes no functional change here, we thought that the test needed more
>>>>>>> time
>>>>>>> to wait a change if a testing VM or machine was really slow, the
>>>>>>> test
>>>>>>> harness timeout was the maximum time we could give the test.
>>>>>>
>>>>>> Do we have confidence that the harness timeout is sufficient to
>>>>>> handle
>>>>>> the intermittent failures?
>>>>> Really a good question :)
>>>>>
>>>>> Here is new version:
>>>>>     http://cr.openjdk.java.net/~sjiang/JDK-8050115/01/
>>>>>
>>>>> I added a deadlocked check in every 1 second, hope to get more info in
>>>>> case of failure.
>>>>
>>>> The following comment seems to imply that this check is not
>>>> very useful:
>>>>
>>>>  112             // This won't show up as a deadlock in CTRL-\ or in
>>>>  113             // ThreadMXBean.findDeadlockedThreads(), because they
>>>> don't
>>>>  114             // see that thread A is waiting for thread B
>>>> (B.join()), and
>>>>  115             // thread B is waiting for a lock held by thread A
>>>>
>>>> best regards,
>>>>
>>>> -- daniel
>>>>
>>>>>
>>>>> I changed also the sleep time to 100ms, 10ms seems too short as Daniel
>>>>> pointed out.
>>>>>
>>>>> Thanks,
>>>>> Shanliang
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Style nit: add a space after 'while' -> while (cond) {
>>>>>>> OK, I will do it before pushing.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shanliang
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8050115
>>>>>>>>> webrev: http://cr.openjdk.java.net/~sjiang/JDK-8050115/00/
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Shanliang
>>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the serviceability-dev mailing list