RFR(s) 8153711: [REDO] JDWP: Memory Leak: GlobalRefs never deleted when processing invokeMethod command

serguei.spitsyn at oracle.com serguei.spitsyn at oracle.com
Sun May 8 07:59:19 UTC 2016


Hi Severin,

I filed two new bugs to cover the discovered issues:
  8156498: more places in the invoke.c that need protection with the 
invokerLock
8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java


Will try to push your two fixes today or tomorrow.
I know, you've just got a committer status.
But I'm not sure you are comfortable to push the fixes yourself at this 
time.

Thanks,
Serguei



On 5/3/16 03:21, serguei.spitsyn at oracle.com wrote:
> Hi Severin,
>
> Please, find my comments below.
>
>
> On 5/2/16 01:44, Severin Gehwolf wrote:
>> On Fri, 2016-04-29 at 12:33 -0700, serguei.spitsyn at oracle.com wrote:
>>> On 4/29/16 01:56, Severin Gehwolf wrote:
>>>> Hi Serguei,
>>>>
>>>> On Fri, 2016-04-29 at 01:34 -0700, serguei.spitsyn at oracle.com wrote:
>>>>> Hi Severin,
>>>>>
>>>>> The fix looks good in general.
>>>>> I'm testing both fixes together at the moment.
>>>> That is JDK-8154529 and JDK-8153711? Yes, that's what I've done too.
>>>>
>>>>> A couple of questions...
>>>>>
>>>>> It seems, there are more places where an invokerLock critical 
>>>>> section is missed.
>>>> Right.
>>>>
>>>>> The following functions:
>>>>>    - invoker_enableInvokeRequests
>>>> This should be fixed with the patch for JDK-8154529
>>>>
>>>>>    - invokeConstructor
>>>>>    - invokeStatic
>>>>>    - invokeNonvirtual
>>>>>    - invokeVirtual
>>>>>    - saveGlobalRef
>>>> Correct. The intent would be to fix the callsites of saveGlobalRef. If
>>>> we fix invoke* then saveGlobalRef should not be an issue. I didn't 
>>>> want
>>>> to include this in this fix since it's pretty hairy and would like to
>>>> fix this in incremental steps.
>>>>
>>>>> The first function is easy to fix.
>>>>> The last 5 functions are called from the invoker_doInvoke() that 
>>>>> we already had a problem with.
>>>>> I'm puzzled with the question: How to synchronize and avoid 
>>>>> deadlocks at the same time?
>>>> I'm not sure yet. Perhaps locking only while saveGlobalRef is being
>>>> called in invoke* functions will help.
>>>>
>>>>> I'm inclined to let your fix go (if the testing is Ok) and file 
>>>>> one more bug on the remaining sync issues.
>>>> Please keep me in the loop about your test results.
>>>   Both the JTreg com/sun/jdi and the co-located nsk.jdi tests are 
>>> all passed.
>>>
>>> I also ran the 4 previously failed tests in big loops of 1000 runs:
>>>      com/sun/jdi/InvokeTest.java
>>>      com/sun/jdi/InvokeHangTest.java
>>>      com/sun/jdi/InterfaceMethodsTest.java
>>>      com/sun/jdi/OomDebugTest.java  (new test introduced in the webrev)
>> I suggest to run InvokeTest, InvokeHangTest and InterfaceMethodsTest in
>> a loop. Those never failed for me in such a scenario.
>>
>>> The OomDebugTest.java failed with a timeout (most likely, a deadlock).
>>> Please, find the OomDebugTest.jtr file in attachments.
>> Correct. This is what I was seeing. See the last comment of the bug:
>> https://bugs.openjdk.java.net/browse/JDK-8153711
>
> Need to check if it is the same as I'm seeing.
>
>>
>> It has the jstack output of the hung OomDebugTestTarg JVM. I'm not
>> convinced this is the same failure we were seeing in JDK-4858370 since
>> the stack suggests it's doing a GC upon a newInstance of a primitive
>> array. It also does not seem like the same issue as the deadlocks
>> exposed by locking during an invoke, because it was reproducible fairly
>> consistently with InvokeHangTest.
>
> Agreed.
>
>
>>
>> It looks to me like a new issue. Probably one which was there before,
>> but is only exposed by the new test. The new test stress-tests the GC
>> with a debugger attached. Of course, this is going to be hard to prove
>> since it would just run out of memory prior the patch.
>
> Thanks.
> Your analysis seems to be reasonable.
> I tend to agree with it but need more time to convince myself.
>
>
>>   Thoughts?
>
> Let me double check the above.
> If the analysis is correct then I'd suggest to file new bug and push 
> your fix as is.
>
>
> Thanks,
> Serguei
>
>
>>
>> Cheers,
>> Severin
>>
>>
>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>>
>>>> Thanks for your help!
>>>>
>>>> Cheers,
>>>> Severin
>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>> On 4/28/16 02:00, Severin Gehwolf wrote:
>>>>>> On Tue, 2016-04-19 at 19:32 -0700, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Hi Severin,
>>>>>>>>
>>>>>>>> I postpone a push for this fix.
>>>>>>>>
>>>>>>>> There are two nsk.jdi test failures (they look like deadlocks):
>>>>>>>> nsk/jdi/ObjectReference/invokeMethod/invokemethod012 FAIL(TIMEOUT)
>>>>>>>>    nsk/jdi/Scenarios/invokeMethod/popframes001 FAIL(TIMEOUT)
>>>>>>>>
>>>>>>>> and one jtreg sun/com/jdi failure (it looks like a deadlock too):
>>>>>>>>    com/sun/jdi/InvokeHangTes.java
>>>>>>>>
>>>>>>>>
>>>>>>>> I'll double check if these failures are regressions caused by 
>>>>>>>> your fix
>>>>>>>> or not.
>>>>>>> I confirm, the failures above are new regressions introduced by 
>>>>>>> the fix.
>>>>>>> The tests fail consistently with the fix and do not fail without 
>>>>>>> the fix.
>>>>>> OK this was caused by the locking done in invoker_doInvoke(). 
>>>>>> Note that
>>>>>> holding either of them invoker lock or event handler lock causes 
>>>>>> this.
>>>>>>
>>>>>> Here is a new webrev with those hunks removed. It's sufficient to 
>>>>>> grab
>>>>>> the locks again in invoke_completeInvokeRequest() when clearing the
>>>>>> global references in order to not get those failures we've seen when
>>>>>> the fix for JDK-4858370 was in place.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8153711/webrev.02/
>>>>>>
>>>>>> Testing done:
>>>>>>   - com/sun/jdi/InvokeTest.java com/sun/jdi/InvokeHangTest.java and
>>>>>>     sun/jdi/InterfaceMethodsTest.java does not fail in 1500 runs.
>>>>>>   - regular com/sun/jdi test set: no regressions
>>>>>>
>>>>>> Note that there are some rare cases where OomDebugTest times out 
>>>>>> which
>>>>>> seems to be caused by the GC, though. See the bug for details. Since
>>>>>> this problem is rare, it's still worthwhile having that test 
>>>>>> included.
>>>>>> If it turns out a problem in practice we could consider disabling 
>>>>>> the
>>>>>> test.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Cheers,
>>>>>> Severin
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20160508/0561188b/attachment.html>


More information about the serviceability-dev mailing list