Codereview request: 8050115 javax/management/monitor/GaugeMonitorDeadlockTest.java fails intermittently
David Holmes
david.holmes at oracle.com
Thu Sep 18 02:32:37 UTC 2014
Still not 100% sure about the deadlock detection logic here, but as long
as it does no harm - ok.
Minor style nit:
59 System.out.println("=== checkingTime = "+checkingTime+"ms");
spaces needed around the + operator.
Thanks,
David
On 18/09/2014 1:09 AM, shanliang wrote:
> Daniel Fuchs wrote:
>> On 9/17/14 4:43 PM, shanliang wrote:
>>> Daniel,
>>>
>>> We could not be sure that the test failed of timeout, that's why I tried
>>> to add more checks.
>>>
>>> The check for Step 1: all thread traces were printed out only if
>>> deadlock was found, and the test failed immediately.
>>> The check for Step 2:
>>> 1) all thread traces were printed out only if the tested thread was
>>> blocked, but the test did not fail because we were not sure if deadlock
>>> happened, but the info might be helpful;
>>> 2) otherwise only the trace of the tested thread was printed out.
>>>
>>> In case that the test gets interrupted again by the test harness, hope
>>> we can have some useful info from these 2 checks.
>>>
>>> It must not be so heavy but still could impact the test, your suggestion
>>> to use test.timeout.factor is a good idea, I added the code to calculate
>>> the checking time based on it:
>>> http://cr.openjdk.java.net/~sjiang/JDK-8050115/03/
>>
>> I Shanliang,
>>
>> This looks much better, thanks.
>> May I suggested taking the current time again at lines 125
>> and 179:
>>
>> checkedTime = System.currentTimeMillis();
>>
>> It would allow to discount the time spent in checking.
> Here is the new version with your suggestion to re-calculate checkedTime.
>
> http://cr.openjdk.java.net/~sjiang/JDK-8050115/04/
>
> Thanks a lot for your time!
> Shanliang
>>
>> best regards,
>>
>> -- daniel
>>
>>>
>>> Thanks,
>>> Shanliang
>>>
>>> Daniel Fuchs wrote:
>>>> Hi Shanliang,
>>>>
>>>> On 9/17/14 2:19 PM, shanliang wrote:
>>>>> Daniel,
>>>>>
>>>>> The test does 2 steps of verifications, the new check is useful for
>>>>> the
>>>>> first step, and the trace in the bug showed that the test failed on
>>>>> the
>>>>> first step.
>>>>>
>>>>> Yes the check might not work for the second step, I added the new code
>>>>> for the second step to check the tested thread state and hope to have
>>>>> useful info if the test failed on the second step.
>>>>>
>>>>> Here is the new version:
>>>>> http://cr.openjdk.java.net/~sjiang/JDK-8050115/02/
>>>>>
>>>>> Thanks,
>>>>> Shanliang
>>>>
>>>> If I understand the issue correctly - the test fails in timeout
>>>> mostly on very slow machines/configurations (fastdebug with some
>>>> combinations of options).
>>>>
>>>> I worry that printing a thread dump every seconds (1000ms) is going
>>>> to make things worse: the test will spend its time printing thread
>>>> dumps instead of doing what it is supposed to do - and will have
>>>> even less CPU cycles to execute its 'real' code.
>>>>
>>>> I would have advised printing the thread dumps only at the end,
>>>> when it is detected that there might be a deadlock - except that
>>>> now we can't do that since the timeout is managed completely
>>>> by the harness (so we don't get the upper hand at the end in
>>>> case of timeout).
>>>>
>>>> I think depending on the harness to set the appropriate timeout
>>>> rather than depending on an arbitrary timeout set in the test itself
>>>> is the right thing to do. It's been a pattern in many tests that
>>>> failed in timeout intermittently on some slow machines/configuration.
>>>>
>>>> In any case - 1s seems really too frequent.
>>>> I suppose you could inspect the system properties set by the harness
>>>> (timeout + timeout factor) to devise an acceptable frequency for
>>>> your checks - if you really want to print this info.
>>>>
>>>> From the log I see that the timeout factor passed to the harness
>>>> for the slow configuration that failed is
>>>> -Dtest.timeout.factor=8.0
>>>> There's no explicit timeout given - and jtreg -onlineHelp reveals
>>>> that in this case the default timeout is two minutes.
>>>>
>>>> This means that the harness has allocated 2*8=16mins for the test to
>>>> execute.
>>>> I don't think you want to take the risk of printing a thread dump
>>>> every seconds during 16 minutes ;-)
>>>>
>>>> Of course I'm over simplifying here. Before your changes - the test
>>>> was deciding after 46.893 seconds that there must be a deadlock.
>>>> 47s is obviously way too short for a possibly slow machine running
>>>> the test in fastdebug mode.
>>>>
>>>> Something like the following might be more reasonable:
>>>>
>>>> // default timeout factor is 1.0
>>>> double factor =
>>>> Double.parseDouble(
>>>> System.getProperty("test.timeout.factor", "1.0"));
>>>> // default timeout is 2mins = 120s.
>>>> double timeout = Double.parseDouble(
>>>> System.getProperty("test.timeout", "120"));
>>>>
>>>> // total time is timeout * timeout factor * 1000 ms
>>>> long total = (long) factor * timeout * 1000;
>>>>
>>>> // Don't print thread dumps too often.
>>>> // every 5s for a total timeout of 120s seems reasonable.
>>>> // 120s/5s = 24; we will lengthen the delay if the total
>>>> // timeout is greater than 120s, so we're taking the max between
>>>> // 5s and total/24
>>>> long delayBetweenThreadDumps = Math.max(5000, total/24);
>>>>
>>>> Of course 5s and total/24 are just arbitrary...
>>>> But 24 full thread dumps in a log for a single test is enough data
>>>> to analyze I think ;-)
>>>>
>>>> best regards,
>>>>
>>>> -- daniel
>>>>
>>>>>
>>>>>
>>>>> Daniel Fuchs wrote:
>>>>>> On 9/17/14 10:55 AM, shanliang wrote:
>>>>>>> David Holmes wrote:
>>>>>>>> On 17/09/2014 7:01 AM, shanliang wrote:
>>>>>>>>> David Holmes wrote:
>>>>>>>>>> Hi Shanliang,
>>>>>>>>>>
>>>>>>>>>> On 16/09/2014 7:12 PM, shanliang wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Please review the following fix:
>>>>>>>>>>
>>>>>>>>>> I don't see any functional change. You seem to have replaced a
>>>>>>>>>> built-in timeout with the externally applied test harness
>>>>>>>>>> timeout.
>>>>>>>>> Yes no functional change here, we thought that the test needed
>>>>>>>>> more
>>>>>>>>> time
>>>>>>>>> to wait a change if a testing VM or machine was really slow, the
>>>>>>>>> test
>>>>>>>>> harness timeout was the maximum time we could give the test.
>>>>>>>>
>>>>>>>> Do we have confidence that the harness timeout is sufficient to
>>>>>>>> handle
>>>>>>>> the intermittent failures?
>>>>>>> Really a good question :)
>>>>>>>
>>>>>>> Here is new version:
>>>>>>> http://cr.openjdk.java.net/~sjiang/JDK-8050115/01/
>>>>>>>
>>>>>>> I added a deadlocked check in every 1 second, hope to get more
>>>>>>> info in
>>>>>>> case of failure.
>>>>>>
>>>>>> The following comment seems to imply that this check is not
>>>>>> very useful:
>>>>>>
>>>>>> 112 // This won't show up as a deadlock in CTRL-\ or in
>>>>>> 113 // ThreadMXBean.findDeadlockedThreads(), because
>>>>>> they
>>>>>> don't
>>>>>> 114 // see that thread A is waiting for thread B
>>>>>> (B.join()), and
>>>>>> 115 // thread B is waiting for a lock held by thread A
>>>>>>
>>>>>> best regards,
>>>>>>
>>>>>> -- daniel
>>>>>>
>>>>>>>
>>>>>>> I changed also the sleep time to 100ms, 10ms seems too short as
>>>>>>> Daniel
>>>>>>> pointed out.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shanliang
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Style nit: add a space after 'while' -> while (cond) {
>>>>>>>>> OK, I will do it before pushing.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Shanliang
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8050115
>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~sjiang/JDK-8050115/00/
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Shanliang
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the serviceability-dev
mailing list