RFR(s) 8153711: [REDO] JDWP: Memory Leak: GlobalRefs never deleted when processing invokeMethod command

serguei.spitsyn at oracle.com serguei.spitsyn at oracle.com
Sun May 8 10:58:44 UTC 2016


Severin,

The JPRT job failed: 2016-05-08-100525.sspitsyn.8153711 with a 
NullPointerException.
The log link is (you can also find the most relevant fragment of the log 
in the bug report):
http://scaaa637.us.oracle.com//archive/2016/05/2016-05-08-100525.sspitsyn.8153711//logs/solaris_x64_5.11-fastdebug-c2-jdk_svc_sanity.log

Please, find the email notification in the attachment.

Thanks,
Serguei


On 5/8/16 00:59, serguei.spitsyn at oracle.com wrote:
> Hi Severin,
>
> I filed two new bugs to cover the discovered issues:
>  8156498: more places in the invoke.c that need protection with the 
> invokerLock
> 8156500: deadlock provoked by new stress test 
> com/sun/jdi/OomDebugTest.java
>
>
> Will try to push your two fixes today or tomorrow.
> I know, you've just got a committer status.
> But I'm not sure you are comfortable to push the fixes yourself at 
> this time.
>
> Thanks,
> Serguei
>
>
>
> On 5/3/16 03:21, serguei.spitsyn at oracle.com wrote:
>> Hi Severin,
>>
>> Please, find my comments below.
>>
>>
>> On 5/2/16 01:44, Severin Gehwolf wrote:
>>> On Fri, 2016-04-29 at 12:33 -0700, serguei.spitsyn at oracle.com wrote:
>>>> On 4/29/16 01:56, Severin Gehwolf wrote:
>>>>> Hi Serguei,
>>>>>
>>>>> On Fri, 2016-04-29 at 01:34 -0700, serguei.spitsyn at oracle.com wrote:
>>>>>> Hi Severin,
>>>>>>
>>>>>> The fix looks good in general.
>>>>>> I'm testing both fixes together at the moment.
>>>>> That is JDK-8154529 and JDK-8153711? Yes, that's what I've done too.
>>>>>
>>>>>> A couple of questions...
>>>>>>
>>>>>> It seems, there are more places where an invokerLock critical 
>>>>>> section is missed.
>>>>> Right.
>>>>>
>>>>>> The following functions:
>>>>>>    - invoker_enableInvokeRequests
>>>>> This should be fixed with the patch for JDK-8154529
>>>>>
>>>>>>    - invokeConstructor
>>>>>>    - invokeStatic
>>>>>>    - invokeNonvirtual
>>>>>>    - invokeVirtual
>>>>>>    - saveGlobalRef
>>>>> Correct. The intent would be to fix the callsites of 
>>>>> saveGlobalRef. If
>>>>> we fix invoke* then saveGlobalRef should not be an issue. I didn't 
>>>>> want
>>>>> to include this in this fix since it's pretty hairy and would like to
>>>>> fix this in incremental steps.
>>>>>
>>>>>> The first function is easy to fix.
>>>>>> The last 5 functions are called from the invoker_doInvoke() that 
>>>>>> we already had a problem with.
>>>>>> I'm puzzled with the question: How to synchronize and avoid 
>>>>>> deadlocks at the same time?
>>>>> I'm not sure yet. Perhaps locking only while saveGlobalRef is being
>>>>> called in invoke* functions will help.
>>>>>
>>>>>> I'm inclined to let your fix go (if the testing is Ok) and file 
>>>>>> one more bug on the remaining sync issues.
>>>>> Please keep me in the loop about your test results.
>>>>   Both the JTreg com/sun/jdi and the co-located nsk.jdi tests are 
>>>> all passed.
>>>>
>>>> I also ran the 4 previously failed tests in big loops of 1000 runs:
>>>>      com/sun/jdi/InvokeTest.java
>>>>      com/sun/jdi/InvokeHangTest.java
>>>>      com/sun/jdi/InterfaceMethodsTest.java
>>>>      com/sun/jdi/OomDebugTest.java  (new test introduced in the 
>>>> webrev)
>>> I suggest to run InvokeTest, InvokeHangTest and InterfaceMethodsTest in
>>> a loop. Those never failed for me in such a scenario.
>>>
>>>> The OomDebugTest.java failed with a timeout (most likely, a deadlock).
>>>> Please, find the OomDebugTest.jtr file in attachments.
>>> Correct. This is what I was seeing. See the last comment of the bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8153711
>>
>> Need to check if it is the same as I'm seeing.
>>
>>>
>>> It has the jstack output of the hung OomDebugTestTarg JVM. I'm not
>>> convinced this is the same failure we were seeing in JDK-4858370 since
>>> the stack suggests it's doing a GC upon a newInstance of a primitive
>>> array. It also does not seem like the same issue as the deadlocks
>>> exposed by locking during an invoke, because it was reproducible fairly
>>> consistently with InvokeHangTest.
>>
>> Agreed.
>>
>>
>>>
>>> It looks to me like a new issue. Probably one which was there before,
>>> but is only exposed by the new test. The new test stress-tests the GC
>>> with a debugger attached. Of course, this is going to be hard to prove
>>> since it would just run out of memory prior the patch.
>>
>> Thanks.
>> Your analysis seems to be reasonable.
>> I tend to agree with it but need more time to convince myself.
>>
>>
>>>   Thoughts?
>>
>> Let me double check the above.
>> If the analysis is correct then I'd suggest to file new bug and push 
>> your fix as is.
>>
>>
>> Thanks,
>> Serguei
>>
>>
>>>
>>> Cheers,
>>> Severin
>>>
>>>
>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>>
>>>>> Thanks for your help!
>>>>>
>>>>> Cheers,
>>>>> Severin
>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>
>>>>>> On 4/28/16 02:00, Severin Gehwolf wrote:
>>>>>>> On Tue, 2016-04-19 at 19:32 -0700, serguei.spitsyn at oracle.com 
>>>>>>> wrote:
>>>>>>>>> Hi Severin,
>>>>>>>>>
>>>>>>>>> I postpone a push for this fix.
>>>>>>>>>
>>>>>>>>> There are two nsk.jdi test failures (they look like deadlocks):
>>>>>>>>> nsk/jdi/ObjectReference/invokeMethod/invokemethod012 
>>>>>>>>> FAIL(TIMEOUT)
>>>>>>>>>    nsk/jdi/Scenarios/invokeMethod/popframes001 FAIL(TIMEOUT)
>>>>>>>>>
>>>>>>>>> and one jtreg sun/com/jdi failure (it looks like a deadlock too):
>>>>>>>>>    com/sun/jdi/InvokeHangTes.java
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'll double check if these failures are regressions caused by 
>>>>>>>>> your fix
>>>>>>>>> or not.
>>>>>>>> I confirm, the failures above are new regressions introduced by 
>>>>>>>> the fix.
>>>>>>>> The tests fail consistently with the fix and do not fail 
>>>>>>>> without the fix.
>>>>>>> OK this was caused by the locking done in invoker_doInvoke(). 
>>>>>>> Note that
>>>>>>> holding either of them invoker lock or event handler lock causes 
>>>>>>> this.
>>>>>>>
>>>>>>> Here is a new webrev with those hunks removed. It's sufficient 
>>>>>>> to grab
>>>>>>> the locks again in invoke_completeInvokeRequest() when clearing the
>>>>>>> global references in order to not get those failures we've seen 
>>>>>>> when
>>>>>>> the fix for JDK-4858370 was in place.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8153711/webrev.02/
>>>>>>>
>>>>>>> Testing done:
>>>>>>>   - com/sun/jdi/InvokeTest.java com/sun/jdi/InvokeHangTest.java and
>>>>>>>     sun/jdi/InterfaceMethodsTest.java does not fail in 1500 runs.
>>>>>>>   - regular com/sun/jdi test set: no regressions
>>>>>>>
>>>>>>> Note that there are some rare cases where OomDebugTest times out 
>>>>>>> which
>>>>>>> seems to be caused by the GC, though. See the bug for details. 
>>>>>>> Since
>>>>>>> this problem is rare, it's still worthwhile having that test 
>>>>>>> included.
>>>>>>> If it turns out a problem in practice we could consider 
>>>>>>> disabling the
>>>>>>> test.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Severin
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20160508/20d1ed3f/attachment-0001.html>
-------------- next part --------------
An embedded message was scrubbed...
From: jprt_admin_ww_grp at oracle.com
Subject: JPRT: [west] job notification - FAILED with job 2016-05-08-100525.sspitsyn.8153711
Date: Sun, 8 May 2016 10:33:29 GMT
Size: 83229
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20160508/20d1ed3f/AttachedMessage-0001.mht>


More information about the serviceability-dev mailing list