RFR(s) 8153711: [REDO] JDWP: Memory Leak: GlobalRefs never deleted when processing invokeMethod command

serguei.spitsyn at oracle.com serguei.spitsyn at oracle.com
Mon May 9 00:25:47 UTC 2016


On 5/8/16 03:58, serguei.spitsyn at oracle.com wrote:
> Severin,
>
> The JPRT job failed: 2016-05-08-100525.sspitsyn.8153711 with a 
> NullPointerException.
> The log link is (you can also find the most relevant fragment of the 
> log in the bug report):
> http://scaaa637.us.oracle.com//archive/2016/05/2016-05-08-100525.sspitsyn.8153711//logs/solaris_x64_5.11-fastdebug-c2-jdk_svc_sanity.log

Sorry for providing the link that is not accessible for you.
But, please, find a fragment of the failure log in my bug report comment.


Thanks,
Serguei

>
> Please, find the email notification in the attachment.
>
> Thanks,
> Serguei
>
>
> On 5/8/16 00:59, serguei.spitsyn at oracle.com wrote:
>> Hi Severin,
>>
>> I filed two new bugs to cover the discovered issues:
>>  8156498: more places in the invoke.c that need protection with the 
>> invokerLock
>> 8156500: deadlock provoked by new stress test 
>> com/sun/jdi/OomDebugTest.java
>>
>>
>> Will try to push your two fixes today or tomorrow.
>> I know, you've just got a committer status.
>> But I'm not sure you are comfortable to push the fixes yourself at 
>> this time.
>>
>> Thanks,
>> Serguei
>>
>>
>>
>> On 5/3/16 03:21, serguei.spitsyn at oracle.com wrote:
>>> Hi Severin,
>>>
>>> Please, find my comments below.
>>>
>>>
>>> On 5/2/16 01:44, Severin Gehwolf wrote:
>>>> On Fri, 2016-04-29 at 12:33 -0700, serguei.spitsyn at oracle.com wrote:
>>>>> On 4/29/16 01:56, Severin Gehwolf wrote:
>>>>>> Hi Serguei,
>>>>>>
>>>>>> On Fri, 2016-04-29 at 01:34 -0700, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Severin,
>>>>>>>
>>>>>>> The fix looks good in general.
>>>>>>> I'm testing both fixes together at the moment.
>>>>>> That is JDK-8154529 and JDK-8153711? Yes, that's what I've done too.
>>>>>>
>>>>>>> A couple of questions...
>>>>>>>
>>>>>>> It seems, there are more places where an invokerLock critical 
>>>>>>> section is missed.
>>>>>> Right.
>>>>>>
>>>>>>> The following functions:
>>>>>>>    - invoker_enableInvokeRequests
>>>>>> This should be fixed with the patch for JDK-8154529
>>>>>>
>>>>>>>    - invokeConstructor
>>>>>>>    - invokeStatic
>>>>>>>    - invokeNonvirtual
>>>>>>>    - invokeVirtual
>>>>>>>    - saveGlobalRef
>>>>>> Correct. The intent would be to fix the callsites of 
>>>>>> saveGlobalRef. If
>>>>>> we fix invoke* then saveGlobalRef should not be an issue. I 
>>>>>> didn't want
>>>>>> to include this in this fix since it's pretty hairy and would 
>>>>>> like to
>>>>>> fix this in incremental steps.
>>>>>>
>>>>>>> The first function is easy to fix.
>>>>>>> The last 5 functions are called from the invoker_doInvoke() that 
>>>>>>> we already had a problem with.
>>>>>>> I'm puzzled with the question: How to synchronize and avoid 
>>>>>>> deadlocks at the same time?
>>>>>> I'm not sure yet. Perhaps locking only while saveGlobalRef is being
>>>>>> called in invoke* functions will help.
>>>>>>
>>>>>>> I'm inclined to let your fix go (if the testing is Ok) and file 
>>>>>>> one more bug on the remaining sync issues.
>>>>>> Please keep me in the loop about your test results.
>>>>>   Both the JTreg com/sun/jdi and the co-located nsk.jdi tests are 
>>>>> all passed.
>>>>>
>>>>> I also ran the 4 previously failed tests in big loops of 1000 runs:
>>>>>      com/sun/jdi/InvokeTest.java
>>>>>      com/sun/jdi/InvokeHangTest.java
>>>>>      com/sun/jdi/InterfaceMethodsTest.java
>>>>>      com/sun/jdi/OomDebugTest.java  (new test introduced in the 
>>>>> webrev)
>>>> I suggest to run InvokeTest, InvokeHangTest and 
>>>> InterfaceMethodsTest in
>>>> a loop. Those never failed for me in such a scenario.
>>>>
>>>>> The OomDebugTest.java failed with a timeout (most likely, a 
>>>>> deadlock).
>>>>> Please, find the OomDebugTest.jtr file in attachments.
>>>> Correct. This is what I was seeing. See the last comment of the bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8153711
>>>
>>> Need to check if it is the same as I'm seeing.
>>>
>>>>
>>>> It has the jstack output of the hung OomDebugTestTarg JVM. I'm not
>>>> convinced this is the same failure we were seeing in JDK-4858370 since
>>>> the stack suggests it's doing a GC upon a newInstance of a primitive
>>>> array. It also does not seem like the same issue as the deadlocks
>>>> exposed by locking during an invoke, because it was reproducible 
>>>> fairly
>>>> consistently with InvokeHangTest.
>>>
>>> Agreed.
>>>
>>>
>>>>
>>>> It looks to me like a new issue. Probably one which was there before,
>>>> but is only exposed by the new test. The new test stress-tests the GC
>>>> with a debugger attached. Of course, this is going to be hard to prove
>>>> since it would just run out of memory prior the patch.
>>>
>>> Thanks.
>>> Your analysis seems to be reasonable.
>>> I tend to agree with it but need more time to convince myself.
>>>
>>>
>>>>   Thoughts?
>>>
>>> Let me double check the above.
>>> If the analysis is correct then I'd suggest to file new bug and push 
>>> your fix as is.
>>>
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>>>
>>>> Cheers,
>>>> Severin
>>>>
>>>>
>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>>
>>>>>> Thanks for your help!
>>>>>>
>>>>>> Cheers,
>>>>>> Severin
>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>
>>>>>>> On 4/28/16 02:00, Severin Gehwolf wrote:
>>>>>>>> On Tue, 2016-04-19 at 19:32 -0700, serguei.spitsyn at oracle.com 
>>>>>>>> wrote:
>>>>>>>>>> Hi Severin,
>>>>>>>>>>
>>>>>>>>>> I postpone a push for this fix.
>>>>>>>>>>
>>>>>>>>>> There are two nsk.jdi test failures (they look like deadlocks):
>>>>>>>>>> nsk/jdi/ObjectReference/invokeMethod/invokemethod012 
>>>>>>>>>> FAIL(TIMEOUT)
>>>>>>>>>>    nsk/jdi/Scenarios/invokeMethod/popframes001 FAIL(TIMEOUT)
>>>>>>>>>>
>>>>>>>>>> and one jtreg sun/com/jdi failure (it looks like a deadlock 
>>>>>>>>>> too):
>>>>>>>>>>    com/sun/jdi/InvokeHangTes.java
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'll double check if these failures are regressions caused by 
>>>>>>>>>> your fix
>>>>>>>>>> or not.
>>>>>>>>> I confirm, the failures above are new regressions introduced 
>>>>>>>>> by the fix.
>>>>>>>>> The tests fail consistently with the fix and do not fail 
>>>>>>>>> without the fix.
>>>>>>>> OK this was caused by the locking done in invoker_doInvoke(). 
>>>>>>>> Note that
>>>>>>>> holding either of them invoker lock or event handler lock 
>>>>>>>> causes this.
>>>>>>>>
>>>>>>>> Here is a new webrev with those hunks removed. It's sufficient 
>>>>>>>> to grab
>>>>>>>> the locks again in invoke_completeInvokeRequest() when clearing 
>>>>>>>> the
>>>>>>>> global references in order to not get those failures we've seen 
>>>>>>>> when
>>>>>>>> the fix for JDK-4858370 was in place.
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8153711/webrev.02/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> Testing done:
>>>>>>>>   - com/sun/jdi/InvokeTest.java com/sun/jdi/InvokeHangTest.java 
>>>>>>>> and
>>>>>>>>     sun/jdi/InterfaceMethodsTest.java does not fail in 1500 runs.
>>>>>>>>   - regular com/sun/jdi test set: no regressions
>>>>>>>>
>>>>>>>> Note that there are some rare cases where OomDebugTest times 
>>>>>>>> out which
>>>>>>>> seems to be caused by the GC, though. See the bug for details. 
>>>>>>>> Since
>>>>>>>> this problem is rare, it's still worthwhile having that test 
>>>>>>>> included.
>>>>>>>> If it turns out a problem in practice we could consider 
>>>>>>>> disabling the
>>>>>>>> test.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Severin
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20160508/428193e6/attachment-0001.html>


More information about the serviceability-dev mailing list