Codereview request: 8050115 javax/management/monitor/GaugeMonitorDeadlockTest.java fails intermittently
shanliang
shanliang.jiang at oracle.com
Wed Sep 17 15:09:18 UTC 2014
Daniel Fuchs wrote:
> On 9/17/14 4:43 PM, shanliang wrote:
>> Daniel,
>>
>> We could not be sure that the test failed of timeout, that's why I tried
>> to add more checks.
>>
>> The check for Step 1: all thread traces were printed out only if
>> deadlock was found, and the test failed immediately.
>> The check for Step 2:
>> 1) all thread traces were printed out only if the tested thread was
>> blocked, but the test did not fail because we were not sure if deadlock
>> happened, but the info might be helpful;
>> 2) otherwise only the trace of the tested thread was printed out.
>>
>> In case that the test gets interrupted again by the test harness, hope
>> we can have some useful info from these 2 checks.
>>
>> It must not be so heavy but still could impact the test, your suggestion
>> to use test.timeout.factor is a good idea, I added the code to calculate
>> the checking time based on it:
>> http://cr.openjdk.java.net/~sjiang/JDK-8050115/03/
>
> I Shanliang,
>
> This looks much better, thanks.
> May I suggested taking the current time again at lines 125
> and 179:
>
> checkedTime = System.currentTimeMillis();
>
> It would allow to discount the time spent in checking.
Here is the new version with your suggestion to re-calculate checkedTime.
http://cr.openjdk.java.net/~sjiang/JDK-8050115/04/
Thanks a lot for your time!
Shanliang
>
> best regards,
>
> -- daniel
>
>>
>> Thanks,
>> Shanliang
>>
>> Daniel Fuchs wrote:
>>> Hi Shanliang,
>>>
>>> On 9/17/14 2:19 PM, shanliang wrote:
>>>> Daniel,
>>>>
>>>> The test does 2 steps of verifications, the new check is useful for
>>>> the
>>>> first step, and the trace in the bug showed that the test failed on
>>>> the
>>>> first step.
>>>>
>>>> Yes the check might not work for the second step, I added the new code
>>>> for the second step to check the tested thread state and hope to have
>>>> useful info if the test failed on the second step.
>>>>
>>>> Here is the new version:
>>>> http://cr.openjdk.java.net/~sjiang/JDK-8050115/02/
>>>>
>>>> Thanks,
>>>> Shanliang
>>>
>>> If I understand the issue correctly - the test fails in timeout
>>> mostly on very slow machines/configurations (fastdebug with some
>>> combinations of options).
>>>
>>> I worry that printing a thread dump every seconds (1000ms) is going
>>> to make things worse: the test will spend its time printing thread
>>> dumps instead of doing what it is supposed to do - and will have
>>> even less CPU cycles to execute its 'real' code.
>>>
>>> I would have advised printing the thread dumps only at the end,
>>> when it is detected that there might be a deadlock - except that
>>> now we can't do that since the timeout is managed completely
>>> by the harness (so we don't get the upper hand at the end in
>>> case of timeout).
>>>
>>> I think depending on the harness to set the appropriate timeout
>>> rather than depending on an arbitrary timeout set in the test itself
>>> is the right thing to do. It's been a pattern in many tests that
>>> failed in timeout intermittently on some slow machines/configuration.
>>>
>>> In any case - 1s seems really too frequent.
>>> I suppose you could inspect the system properties set by the harness
>>> (timeout + timeout factor) to devise an acceptable frequency for
>>> your checks - if you really want to print this info.
>>>
>>> From the log I see that the timeout factor passed to the harness
>>> for the slow configuration that failed is
>>> -Dtest.timeout.factor=8.0
>>> There's no explicit timeout given - and jtreg -onlineHelp reveals
>>> that in this case the default timeout is two minutes.
>>>
>>> This means that the harness has allocated 2*8=16mins for the test to
>>> execute.
>>> I don't think you want to take the risk of printing a thread dump
>>> every seconds during 16 minutes ;-)
>>>
>>> Of course I'm over simplifying here. Before your changes - the test
>>> was deciding after 46.893 seconds that there must be a deadlock.
>>> 47s is obviously way too short for a possibly slow machine running
>>> the test in fastdebug mode.
>>>
>>> Something like the following might be more reasonable:
>>>
>>> // default timeout factor is 1.0
>>> double factor =
>>> Double.parseDouble(
>>> System.getProperty("test.timeout.factor", "1.0"));
>>> // default timeout is 2mins = 120s.
>>> double timeout = Double.parseDouble(
>>> System.getProperty("test.timeout", "120"));
>>>
>>> // total time is timeout * timeout factor * 1000 ms
>>> long total = (long) factor * timeout * 1000;
>>>
>>> // Don't print thread dumps too often.
>>> // every 5s for a total timeout of 120s seems reasonable.
>>> // 120s/5s = 24; we will lengthen the delay if the total
>>> // timeout is greater than 120s, so we're taking the max between
>>> // 5s and total/24
>>> long delayBetweenThreadDumps = Math.max(5000, total/24);
>>>
>>> Of course 5s and total/24 are just arbitrary...
>>> But 24 full thread dumps in a log for a single test is enough data
>>> to analyze I think ;-)
>>>
>>> best regards,
>>>
>>> -- daniel
>>>
>>>>
>>>>
>>>> Daniel Fuchs wrote:
>>>>> On 9/17/14 10:55 AM, shanliang wrote:
>>>>>> David Holmes wrote:
>>>>>>> On 17/09/2014 7:01 AM, shanliang wrote:
>>>>>>>> David Holmes wrote:
>>>>>>>>> Hi Shanliang,
>>>>>>>>>
>>>>>>>>> On 16/09/2014 7:12 PM, shanliang wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Please review the following fix:
>>>>>>>>>
>>>>>>>>> I don't see any functional change. You seem to have replaced a
>>>>>>>>> built-in timeout with the externally applied test harness
>>>>>>>>> timeout.
>>>>>>>> Yes no functional change here, we thought that the test needed
>>>>>>>> more
>>>>>>>> time
>>>>>>>> to wait a change if a testing VM or machine was really slow, the
>>>>>>>> test
>>>>>>>> harness timeout was the maximum time we could give the test.
>>>>>>>
>>>>>>> Do we have confidence that the harness timeout is sufficient to
>>>>>>> handle
>>>>>>> the intermittent failures?
>>>>>> Really a good question :)
>>>>>>
>>>>>> Here is new version:
>>>>>> http://cr.openjdk.java.net/~sjiang/JDK-8050115/01/
>>>>>>
>>>>>> I added a deadlocked check in every 1 second, hope to get more
>>>>>> info in
>>>>>> case of failure.
>>>>>
>>>>> The following comment seems to imply that this check is not
>>>>> very useful:
>>>>>
>>>>> 112 // This won't show up as a deadlock in CTRL-\ or in
>>>>> 113 // ThreadMXBean.findDeadlockedThreads(), because
>>>>> they
>>>>> don't
>>>>> 114 // see that thread A is waiting for thread B
>>>>> (B.join()), and
>>>>> 115 // thread B is waiting for a lock held by thread A
>>>>>
>>>>> best regards,
>>>>>
>>>>> -- daniel
>>>>>
>>>>>>
>>>>>> I changed also the sleep time to 100ms, 10ms seems too short as
>>>>>> Daniel
>>>>>> pointed out.
>>>>>>
>>>>>> Thanks,
>>>>>> Shanliang
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Style nit: add a space after 'while' -> while (cond) {
>>>>>>>> OK, I will do it before pushing.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shanliang
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8050115
>>>>>>>>>> webrev: http://cr.openjdk.java.net/~sjiang/JDK-8050115/00/
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Shanliang
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the serviceability-dev
mailing list