RFR 8203174: [Graal] JDI tests fail with Unexpected exception: com.sun.jdi.ObjectCollectedException

Chris Plummer chris.plummer at oracle.com
Fri Nov 9 01:11:51 UTC 2018


Hi Daniil,

On 11/8/18 3:12 PM, Daniil Titov wrote:
> Hi Chris,
>
> You are right. There are different cases here. Please find them listed below.
>
> 1. Test vmTestbase/nsk/jdi/stress/MonitorEvents/MonitorEvents002
> 	This test was actually fixed with JDK-8206086 changes. As far as I remember it was not removed from ProblemList-graal.txt due to some sporadic failures (not related to ObjectCollectedException), however with the latest  builds these failures are no longer reproduced .
OK.
>
> 2. "NewInstance" tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java, test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance002.java, test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance003.java)
>
> The typical failure for these tests Is  that the test creates a new array with arrayType.newInstance(arraylength) but this object becomes immediately GC'ed. Calling disableCollection() on the newly created instance doesn't help since by that time the object might be already GC'ed.
>
>
>                com.sun.jdi.ObjectCollectedException at jdk.jdi/com.sun.tools.jdi.JDWPException.toJDIException(JDWPException.java:55)
> 	at jdk.jdi/com.sun.tools.jdi.ArrayReferenceImpl.getValues(ArrayReferenceImpl.java:126)
> 	at jdk.jdi/com.sun.tools.jdi.ArrayReferenceImpl.getValue(ArrayReferenceImpl.java:77)
> 	at nsk.jdi.ArrayType.newInstance.newinstance001.runThis(newinstance001.java:316)
> 	at nsk.jdi.ArrayType.newInstance.newinstance001.run(newinstance001.java:84)
> 	at nsk.jdi.ArrayType.newInstance.newinstance001.main(newinstance001.java:79)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
> 	at PropertyResolvingWrapper.main(PropertyResolvingWrapper.java:104)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
> 	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
> 	at java.base/java.lang.Thread.run(Thread.java:835)
>
> The proposed fix for this case is to suspend VM while the body of the "for" loop (lines 150 - 356) is executed while ensuring that for line 151 ( line = pipe.printlln()) VM is temporary resumed ( line numbers are given for an original version of  test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance001.java)
>   
>   147     //------------------------------------------------------  testing section
>   148         log1("      TESTING BEGINS");
>   149
>   150         for (int i = 0; ; i++) {
>   151         pipe.println("newcheck");
>   152             line = pipe.readln();
>   153
>   	<skipped>		
>   
> 356         }
>   357         log1("      TESTING ENDS");
>   358
Ok. I get it now. It was confusing because there is a resume before the 
suspend, but I see now why it was done that way.  I assume you 
originally tried just doing the suspend outside of the loop, but found 
that caused problems for pipe.readln(), so you moved it to inside the 
loop and after the pipe.readln().
>
> 3. Test test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances002/instances002.java
> 	 There were 2 issues here:
>                    a) The same issue as above.  On line 125 calling disableCollection ()  on newly created array object (arrayType.newInstance()) resulted in ObjectCollectedException was thrown
>                 	The fix here is to suspend VM while the testing code is executed.
> 	   b) IndexOutOfBoundException  on line 130 (checkDebugeeAnswer_instances(className, createInstanceCount + baseInstances)  since some of base instances that were created before VM was suspended  become collected if GC was triggered before VM was suspended.
> 		The fix here is to call disableCollection() an all base instances and filter out ones that were already collected.
Ok.
>
> 4. Test test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java
> 	The problem here was ObjectCollectedException thrown on line 141 while iterating over newly created instances  since these instances also include ones created by other java threads and some of them become GC'ed before disableCollection() is called on them. In this test the new instances are created by debuggee itself (the test sends the commands to debuggee to create or delete instances using pipe)  so here we cannot use suspend VM approach.
> 	The fix is to filter out already collected instances  ( these are the instances created outside of the test) and ensure that instances created by the test are not GC'ed before we call disableCollection() on them. For the latter the fix just extends the approach used by the test for WEAK references ( to ask debuggee to create a temporary strong reference and then delete them after we call disableCollection() ) to other reference types.
Ok.
> 	
> 5. Stress tests vmTestbase/nsk/jdi/stress/serial/heapwalking001/TestDescription.java, vmTestbase/nsk/jdi/stress/serial/heapwalking002/TestDescription.java, vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java
>     Two problems here:
>     1)  These tests use primitive arrays as test classes and these classes might be created by other java threads outside the test . Changes in useStrictCheck() fix this.
>     2)  These tests include test/hotspot/jtreg/vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java test that is fixed at 4 above
Ok.

Thanks for describing the separate issues. Can you make sure the CR is 
updated with this info?

thanks,

Chris
>
>
> Thanks,
> Daniil
>
> On 11/8/18, 1:10 PM, "Chris Plummer" <chris.plummer at oracle.com> wrote:
>
>      Hi Daniil,
>      
>      Hi Daniil,
>      
>      I was wondering why instance003 did not include the same fix as
>      instance002. In fact I was wondering why all the tests did not include
>      this fix. It looks like you have now fixed instance003 the same way as
>      instance002, but I'm unsure of how all the other tests are being fixed.
>      
>      What is the fix now for MonitorEvents002, heapwalking001,
>      heapwalking002, and mixed002. Are these all covered by the
>      useStrictCheck() fix?
>      
>      For the newinstance tests, I understand what you are saying about
>      needing to resume the VM to read the response, but I'm not sure how this
>      manifested itself as a problem before the fix and how it relates to the
>      reported failures, since as far as I can tell you did not add a suspend,
>      so it must have been in place already.
>      
>      Maybe what's going on here is that there are actually 3 separate issues,
>      which is making it hard to figure out what the failure mode is for each
>      test, and how the changes are addressing it.
>      
>      thanks,
>      
>      Chris
>      
>      On 11/7/18 5:11 PM, Daniil Titov wrote:
>      > Hi Chris,
>      >
>      > Thank you for your comments.
>      >
>      > Excluding of  java.lang.String is not really required. It seems as it is sufficient just to expect that the object created by other threads and included in the result of referenceType.instances() and might be collected before we are able to call disableCollection() on them. The new version of the patch includes such change in  vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java .
>      
>      
>      
>      
>      >
>      > The changes in useStrictCheck() are required since some tests use array of primitives aa a test classes ( e.g. boolean[] ) and objects of these classes might be created by other (e.g. JVMC compiler) threads.
>      > The useStrictCheck()   defines how the number of instances the test created should be compared to the actual number of instance of this class in the target VM. If useStrictCheck() returns true these numbers should be equal, otherwise the test passes if the number of the instances the test created is no greater than the number of  instances found.
>      >
>      > The changes below are required since  we need to suspend the target VM while the test is executed (to ensure no GC is triggered). At the same time we need to ensure the target VM is temporary resumed when we read its output from the pipe. These tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance00{1,2,3}.java) are written in a such way that the test code is executed as a body of the loop and on each iteration it reads a response from the target VM and then runs the tests, so on the first iteration vm.resume() is not required at the beginning of the loop. I added comments through the code and also updated existing comments per your suggestions.
>      >
>      >        153             if (i > 0) {
>      >        154                 debuggee.resume();
>      >        155             }
>      >        156
>      >        157             line = pipe.readln();
>      >        158             debuggee.suspend();
>      >
>      >
>      > Please review a new version of the fix with the changes mentioned above.
>      >
>      > Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.03/
>      > Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
>      >
>      > Thanks,
>      > Daniil
>      >
>      > On 11/7/18, 11:33 AM, "Chris Plummer" <chris.plummer at oracle.com> wrote:
>      >
>      >      Hi Danil,
>      >
>      >      So this is the crux of the issue:
>      >
>      >        112         debuggee.suspend();
>      >        113
>      >        114         List<ObjectReference> baseReferences = new LinkedList<>();
>      >        115         // We need to ensure that the base references count is not
>      >      changed during the test.
>      >        116         // The instances of the class could be already created by
>      >      other (e.g. JVMCI) threads
>      >        117         // and if GC was scheduled before VM was suspended some of
>      >      these instances might
>      >        118         // become collected.
>      >        119         for (ObjectReference objRef : referenceType.instances(0)) {
>      >        120             try {
>      >        121                 objRef.disableCollection();
>      >        122                 baseReferences.add(objRef);
>      >        123             } catch (ObjectCollectedException e) {
>      >        124                 // skip this reference
>      >        125             }
>      >        126         }
>      >        127         baseInstances = baseReferences.size();
>      >
>      >      First it is possible for a GC to be triggered even if the debuggee is
>      >      not executing any code because of JVMCI threads. So this is why
>      >      debuggee.suspend() is needed. Second, even after calling
>      >      debuggee.suspend(), it is still possible for a GC to happen since it may
>      >      have been triggered before the debuggee.suspend() call, but not yet
>      >      completed, and debuggee.suspend() does not prevent GC from completing in
>      >      that case. So that is why you need to disableCollecion() on each object,
>      >      and accept that some of them may have already been collected. Therefore
>      >      baseInstances needs to be updated to only reflect the count of instances
>      >      that have not been collected, and will not be collected because
>      >      disableCollection() has been called on them. This all makes sense and
>      >      seems fine.
>      >
>      >      I do think the comments could use some updating. The debuggee.suspend()
>      >      call should be explained as being needed because there are potentially
>      >      other non-test java threads allocating objects and triggering GC's,
>      >      JVMCI being the main culprit here. The comment before the loop should
>      >      focus on dealing with a GC that was triggered before the suspend,
>      >      requiring disableCollection() be called each object returned by
>      >      referenceType.instances() since it can potentially be collected
>      >      otherwise. This code is not really related to the JVMCI threads.
>      >
>      >      I don't understand the reason for excluding java.lang.String from testing.
>      >
>      >      I don't understand the reason for the useStrictCheck() changes.
>      >
>      >      I don't understand what the following is fixing, and the impact on test
>      >      execution that the resume() might have:
>      >
>      >        153             if (i > 0) {
>      >        154                 debuggee.resume();
>      >        155             }
>      >        156
>      >        157             line = pipe.readln();
>      >        158             debuggee.suspend();
>      >
>      >
>      >      thanks,
>      >
>      >      Chris
>      >
>      >      On 11/7/18 8:58 AM, Daniil Titov wrote:
>      >      > Hi Chris and Serguei,
>      >      >
>      >      > With recent builds I can no longer see any GC triggered by C1 compiler thread due to running out of metaspace and JDK-8193126 seems as solved this particular problem. Currently all observed GCs are triggered by JVMCI Compiler threads.
>      >      >
>      >      > Please review a new version of the fix that no longer keeps the main thread in the target VM running while other threads are suspended (since as Dean mentioned it might be not safe). Instead, the target VM is fully suspended when required and resumed afterwards. The new webrev also excludes java.lang.String class from the list of classes used by some of these tests.
>      >      >
>      >      > Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.02/index.html
>      >      > Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
>      >      >
>      >      > Thanks,
>      >      > Daniil
>      >      >
>      >      > On 11/5/18, 10:46 AM, "Chris Plummer" <chris.plummer at oracle.com> wrote:
>      >      >
>      >      >      On 11/4/18 11:45 PM, serguei.spitsyn at oracle.com wrote:
>      >      >      > On 11/4/18 23:40, serguei.spitsyn at oracle.com wrote:
>      >      >      >> Hi Daniil,
>      >      >      >>
>      >      >      >> I agree with Chris below.
>      >      >      >> Will tell more on reply to your reply.
>      >      >      >>
>      >      >      >> Thanks,
>      >      >      >> Serguei
>      >      >      >>
>      >      >      >> On 11/2/18 20:59, Chris Plummer wrote:
>      >      >      >>> Hi Daniil,
>      >      >      >>>
>      >      >      >>> I thought the issue was that C2 was doing metadata allocations, and
>      >      >      >>> when it ran out of metaspace it did a GC (one that was not triggered
>      >      >      >>> by a failed object
>      >      >      >
>      >      >      > Forgot to comment on the below.
>      >      >      > It is probably a typo.
>      >      >      > Should it be about the Graal, not the C2?
>      >      >      > As I understand we have no issue with the C2.
>      >      >      Actually it was the C1 Compiler Thread that was the problem, although it
>      >      >      only turned up during Graal testing.
>      >      >
>      >      >      Chris
>      >      >      >
>      >      >      > Thanks,
>      >      >      > Serguei
>      >      >      >
>      >      >      >
>      >      >      >>> allocation). As a results we were getting objects GCs before the
>      >      >      >>> test ever got the chance to disable collection on them. I thought we
>      >      >      >>> also concluded other than this metaspace issue, if the test is
>      >      >      >>> properly disabling collection on the objects it cared about, it
>      >      >      >>> shouldn't matter if there are GC's triggered by excessive object
>      >      >      >>> allocations.
>      >      >      >>>
>      >      >      >>> I don't think the following check will always be valid. It may be on
>      >      >      >>> by default at some point:
>      >      >      >>>
>      >      >      >>>  651     public boolean isJVMCICompilerOn() {
>      >      >      >>>  652         String opts = argumentHandler.getLaunchOptions();
>      >      >      >>>  653         return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
>      >      >      >>>  654     }
>      >      >      >>>
>      >      >      >>> thanks,
>      >      >      >>>
>      >      >      >>> Chris
>      >      >      >>>
>      >      >      >>> On 11/2/18 4:29 PM, Daniil Titov wrote:
>      >      >      >>>> Please review the change that fixes several tests failing with
>      >      >      >>>> com.sun.jdi.ObjectCollectedException when running with Graal.
>      >      >      >>>>
>      >      >      >>>> There main problem here is that with Graal on JVMCI Compiler
>      >      >      >>>> threads in the target VM create a lot of objects and sporadically
>      >      >      >>>> trigger GC that results in the objects created in the target VM in
>      >      >      >>>> the tests being GCed before the tests complete. The other problem
>      >      >      >>>> is that for some classes the tests use, e.g. "java.lang.String",
>      >      >      >>>> there is already a huge number of instances created by JVMCI threads.
>      >      >      >>>>
>      >      >      >>>> The fix suspends target VM to prevent JVMCI compiler threads from
>      >      >      >>>> creating new objects during the tests execution. In cases when
>      >      >      >>>> IOPipe is used for communication between the test and the debuggee
>      >      >      >>>> the fix suspends all threads but the main rather than suspending
>      >      >      >>>> the VM. The fix also filters out the checks for some test classes
>      >      >      >>>> (e.g. "java.lang.String") in cases when the target VM is run with
>      >      >      >>>> JVMCI compiler options on.
>      >      >      >>>>
>      >      >      >>>> Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
>      >      >      >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
>      >      >      >>>>
>      >      >      >>>> Thanks,
>      >      >      >>>> Daniil
>      >      >      >>>>
>      >      >      >>>>
>      >      >      >>>>
>      >      >      >>>
>      >      >      >>>
>      >      >      >>
>      >      >      >
>      >      >
>      >      >
>      >      >
>      >      >
>      >      >
>      >
>      >
>      >
>      >
>      >
>      
>      
>      
>
>




More information about the serviceability-dev mailing list