RFR (XS) 8220688: [TESTBUG] runtime/NMT/MallocStressTest.java timed out

David Holmes david.holmes at oracle.com
Tue May 28 10:56:47 UTC 2019


Thanks for looking deeper into this Coleen. Please go ahead with what 
you've proposed.

David

On 23/05/2019 9:13 pm, coleen.phillimore at oracle.com wrote:
> 
> I've downloaded all the recent hangs from this test and couldn't find it 
> doing anything that deadlocks, except lots of GC.  I've updated the bug 
> with comments.  I still think taking out the sleep, which lets the 
> allocator threads run until they OOM, will make this test more 
> reliable.  I don't see any evidence of another bug from the logs we have.
> thanks,
> Coleen
> 
> On 5/21/19 7:51 AM, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 5/20/19 10:25 PM, David Holmes wrote:
>>> Hi Coleen,
>>>
>>> This is fine to push, but some discussion below ...
>>>
>>> On 21/05/2019 6:50 am, coleen.phillimore at oracle.com wrote:
>>>> Summary: reduce number of threads and iterate rather than sleep.
>>>
>>> Reducing the load of this test may be reasonable ... though perhaps 
>>> it should be configured based on available CPUs rather than fixed 
>>> numbers (just a thought).
>>
>> I'm thinking even the minimal configuration can deal with 50 threads, 
>> which is why I chose it.  It's not worth doing anything more 
>> complicated here.
>>>
>>> But this test rarely times out (and some failures of the test have 
>>> been incorrectly attributed to this bug - the 10 second attach 
>>> timeout is not at all the same issue as timing out after 1h25m!), so 
>>> your testing is unlikely to have encountered the conditions in which 
>>> the timeout actually manifests. The test normally runs in a few 
>>> minutes on linux so the 1h25m timeout is extreme and seems unlikely 
>>> to be due simply to the load the test produces. So I would not be 
>>> surprised if we continue to see occasional timeouts.
>>
>> I don't know about the attach timeouts which I've seen in lots of 
>> tests.  This test uses jcmd to read the NMT info.  There are several 
>> tests that do that in the test suite, but this particular test times 
>> out *a lot*.  Looking at the test, the load is very high and there's 
>> a long sleep, so a variable attach response will trigger a timeout for 
>> this particular test.  That is why I want to reduce the load and 
>> eliminate the long sleep.  You can always open a new bug for the 
>> attach mechanism variation, if one doesn't already exist.
>>
>> Thanks,
>> Coleen
>>>
>>> Further, the huge variation in execution time for this test across 
>>> different platforms (e.g. macOS takes 45 minutes!) suggests there may 
>>> be something else at play here.
>>>
>>> Thanks,
>>> David
>>>
>>>> Ran test 20x without timeout and faster now.  Left /timeout in test 
>>>> because it doesn't hurt.
>>>>
>>>> open webrev at 
>>>> http://cr.openjdk.java.net/~coleenp/2019/8220688.01/webrev
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8220688
>>>>
>>>> Thanks,
>>>> Coleen
>>
> 
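
For reference, the idea raised in the quoted discussion of configuring the
thread count from the available CPUs rather than a fixed number could look
roughly like the sketch below. This is only an illustration: the class name,
the helper, the per-CPU multiplier of 4 and the way the cap of 50 is applied
are all hypothetical and are not taken from the webrev or from
MallocStressTest.java.

```java
public class ThreadCountSketch {
    // Hypothetical helper (not from the actual test): scale the number of
    // worker threads with the CPU count, but never exceed a fixed cap.
    static int threadCount(int availableCpus, int threadsPerCpu, int maxThreads) {
        // Guard against a nonsensical CPU count so at least one CPU is assumed.
        int scaled = Math.max(1, availableCpus) * threadsPerCpu;
        return Math.min(scaled, maxThreads);
    }

    public static void main(String[] args) {
        // Runtime.availableProcessors() reports the CPUs visible to the JVM.
        int cpus = Runtime.getRuntime().availableProcessors();
        // The multiplier 4 and the cap 50 are assumed values for illustration.
        System.out.println("cpus = " + cpus
                + ", threads = " + threadCount(cpus, 4, 50));
    }
}
```

With these assumed values a small machine gets proportionally fewer threads,
while larger machines stay at the cap, so the load never exceeds what the
fixed number would produce.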


More information about the hotspot-runtime-dev mailing list