RFR (XS) 8220688: [TESTBUG] runtime/NMT/MallocStressTest.java timed out

coleen.phillimore at oracle.com
Tue May 28 11:04:17 UTC 2019


Thanks David!
Coleen

On 5/28/19 6:56 AM, David Holmes wrote:
> Thanks for looking deeper into this Coleen. Please go ahead with what 
> you've proposed.
>
> David
>
> On 23/05/2019 9:13 pm, coleen.phillimore at oracle.com wrote:
>>
>> I've downloaded all the recent hangs from this test and couldn't find 
>> it doing anything that deadlocks, except lots of GC.  I've updated 
>> the bug with comments.  I still think taking out the sleep, which 
>> lets the allocator threads run until they OOM, will make this test 
>> more reliable.  I don't see any evidence of another bug from the logs 
>> we have.
>> thanks,
>> Coleen
>>
>> On 5/21/19 7:51 AM, coleen.phillimore at oracle.com wrote:
>>>
>>>
>>> On 5/20/19 10:25 PM, David Holmes wrote:
>>>> Hi Coleen,
>>>>
>>>> This is fine to push, but some discussion below ...
>>>>
>>>> On 21/05/2019 6:50 am, coleen.phillimore at oracle.com wrote:
>>>>> Summary: reduce number of threads and iterate rather than sleep.
>>>>
>>>> Reducing the load of this test may be reasonable ... though perhaps 
>>>> it should be configured based on available CPUs rather than fixed 
>>>> numbers (just a thought).
>>>
>>> I'm thinking even the minimal configuration can deal with 50 
>>> threads, which is why I chose it.  It's not worth doing anything 
>>> more complicated here.
>>>>
>>>> But this test rarely times out (and some failures of the test have 
>>>> been incorrectly attributed to this bug - the 10 second attach 
>>>> timeout is not at all the same issue as timing out after 1h25m!), 
>>>> so your testing is unlikely to have encountered the conditions in 
>>>> which the timeout actually manifests. The test normally runs in a 
>>>> few minutes on linux so the 1h25m timeout is extreme and seems 
>>>> unlikely to be due simply to the load the test produces. So I would 
>>>> not be surprised if we continue to see occasional timeouts.
>>>
>>> I don't know about the attach timeouts which I've seen in lots of 
>>> tests.  This test uses jcmd to read the NMT info.  There are several 
>>> tests that do that in the test suite, but this particular test times 
>>> out *a lot*.   Looking at the test, the load is very high and 
>>> there's a long sleep, so a variable attach response will trigger a 
>>> timeout for this particular test.  That is why I want to reduce the 
>>> load and eliminate the long sleep.  You can always open a new bug 
>>> for the attach mechanism variation, if one doesn't already exist.
>>>
>>> Thanks,
>>> Coleen
>>>>
>>>> Further, the huge variation in execution time for this test across 
>>>> different platforms (e.g. macOS takes 45 minutes!) suggests there 
>>>> may be something else at play here.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Ran test 20x without timeout and faster now.  Left /timeout in 
>>>>> test because it doesn't hurt.
>>>>>
>>>>> open webrev at 
>>>>> http://cr.openjdk.java.net/~coleenp/2019/8220688.01/webrev
>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8220688
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>
>>
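
For readers following the thread, here is a rough illustration of the two ideas discussed above: bounding each worker by a fixed iteration count instead of a long sleep, and sizing the thread count from the available CPUs as David suggested. This is only a sketch under stated assumptions, not the code in the webrev. The class name, the 2x-CPUs scaling factor, the 1 KB allocation, and the one-minute wait are all hypothetical; only the cap of 50 threads comes from the discussion.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the actual MallocStressTest code.
public class BoundedStressSketch {
    // 50 is the fixed thread count mentioned in the thread; used here as a cap.
    static final int MAX_THREADS = 50;

    // Assumption: two workers per CPU, clamped to the range [1, MAX_THREADS].
    static int threadCount(int availableCpus) {
        return Math.max(1, Math.min(MAX_THREADS, availableCpus * 2));
    }

    // Each worker runs a fixed number of iterations and then exits, so the
    // total amount of work is bounded without relying on a sleep.
    static long runWorkers(int threads, int iterationsPerThread) {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < iterationsPerThread; i++) {
                    byte[] chunk = new byte[1024]; // stand-in for the real allocation
                    chunk[0] = 1;
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted workers keep running
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int threads = threadCount(Runtime.getRuntime().availableProcessors());
        long done = runWorkers(threads, 1000);
        System.out.println(threads + " threads completed " + done + " iterations");
    }
}
```

With a fixed iteration count, the run time depends on how fast the workers progress rather than on a wall-clock sleep, which is the property the proposed change relies on.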



More information about the hotspot-runtime-dev mailing list