RFR 8203174: [Graal] JDI tests fail with Unexpected exception: com.sun.jdi.ObjectCollectedException

Daniil Titov daniil.x.titov at oracle.com
Thu Nov 8 01:11:36 UTC 2018


Hi Chris,

Thank you for your comments.

Excluding of  java.lang.String is not really required. It seems as it is sufficient just to expect that the object created by other threads and included in the result of referenceType.instances() and might be collected before we are able to call disableCollection() on them. The new version of the patch includes such change in  vmTestbase/nsk/jdi/ReferenceType/instances/instances003/instances003.java .

The changes in useStrictCheck() are required since some tests use array of primitives aa a test classes ( e.g. boolean[] ) and objects of these classes might be created by other (e.g. JVMC compiler) threads.
The useStrictCheck()   defines how the number of instances the test created should be compared to the actual number of instance of this class in the target VM. If useStrictCheck() returns true these numbers should be equal, otherwise the test passes if the number of the instances the test created is no greater than the number of  instances found.

The changes below are required since  we need to suspend the target VM while the test is executed (to ensure no GC is triggered). At the same time we need to ensure the target VM is temporary resumed when we read its output from the pipe. These tests (test/hotspot/jtreg/vmTestbase/nsk/jdi/ArrayType/newInstance/newinstance00{1,2,3}.java) are written in a such way that the test code is executed as a body of the loop and on each iteration it reads a response from the target VM and then runs the tests, so on the first iteration vm.resume() is not required at the beginning of the loop. I added comments through the code and also updated existing comments per your suggestions.

      153             if (i > 0) {
      154                 debuggee.resume();
      155             }
      156
      157             line = pipe.readln();
      158             debuggee.suspend();
    

Please review a new version of the fix with the changes mentioned above.

Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.03/
Bug: https://bugs.openjdk.java.net/browse/JDK-8203174 

Thanks,
Daniil

On 11/7/18, 11:33 AM, "Chris Plummer" <chris.plummer at oracle.com> wrote:

    Hi Danil,
    
    So this is the crux of the issue:
    
      112         debuggee.suspend();
      113
      114         List<ObjectReference> baseReferences = new LinkedList<>();
      115         // We need to ensure that the base references count is not 
    changed during the test.
      116         // The instances of the class could be already created by 
    other (e.g. JVMCI) threads
      117         // and if GC was scheduled before VM was suspended some of 
    these instances might
      118         // become collected.
      119         for (ObjectReference objRef : referenceType.instances(0)) {
      120             try {
      121                 objRef.disableCollection();
      122                 baseReferences.add(objRef);
      123             } catch (ObjectCollectedException e) {
      124                 // skip this reference
      125             }
      126         }
      127         baseInstances = baseReferences.size();
    
    First it is possible for a GC to be triggered even if the debuggee is 
    not executing any code because of JVMCI threads. So this is why 
    debuggee.suspend() is needed. Second, even after calling 
    debuggee.suspend(), it is still possible for a GC to happen since it may 
    have been triggered before the debuggee.suspend() call, but not yet 
    completed, and debuggee.suspend() does not prevent GC from completing in 
    that case. So that is why you need to disableCollecion() on each object, 
    and accept that some of them may have already been collected. Therefore 
    baseInstances needs to be updated to only reflect the count of instances 
    that have not been collected, and will not be collected because 
    disableCollection() has been called on them. This all makes sense and 
    seems fine.
    
    I do think the comments could use some updating. The debuggee.suspend() 
    call should be explained as being needed because there are potentially 
    other non-test java threads allocating objects and triggering GC's, 
    JVMCI being the main culprit here. The comment before the loop should 
    focus on dealing with a GC that was triggered before the suspend, 
    requiring disableCollection() be called each object returned by 
    referenceType.instances() since it can potentially be collected 
    otherwise. This code is not really related to the JVMCI threads.
    
    I don't understand the reason for excluding java.lang.String from testing.
    
    I don't understand the reason for the useStrictCheck() changes.
    
    I don't understand what the following is fixing, and the impact on test 
    execution that the resume() might have:
    
      153             if (i > 0) {
      154                 debuggee.resume();
      155             }
      156
      157             line = pipe.readln();
      158             debuggee.suspend();
    
    
    thanks,
    
    Chris
    
    On 11/7/18 8:58 AM, Daniil Titov wrote:
    > Hi Chris and Serguei,
    >
    > With recent builds I can no longer see any GC triggered by C1 compiler thread due to running out of metaspace and JDK-8193126 seems as solved this particular problem. Currently all observed GCs are triggered by JVMCI Compiler threads.
    >
    > Please review a new version of the fix that no longer keeps the main thread in the target VM running while other threads are suspended (since as Dean mentioned it might be not safe). Instead, the target VM is fully suspended when required and resumed afterwards. The new webrev also excludes java.lang.String class from the list of classes used by some of these tests.
    >
    > Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.02/index.html
    > Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
    >
    > Thanks,
    > Daniil
    >
    > On 11/5/18, 10:46 AM, "Chris Plummer" <chris.plummer at oracle.com> wrote:
    >
    >      On 11/4/18 11:45 PM, serguei.spitsyn at oracle.com wrote:
    >      > On 11/4/18 23:40, serguei.spitsyn at oracle.com wrote:
    >      >> Hi Daniil,
    >      >>
    >      >> I agree with Chris below.
    >      >> Will tell more on reply to your reply.
    >      >>
    >      >> Thanks,
    >      >> Serguei
    >      >>
    >      >> On 11/2/18 20:59, Chris Plummer wrote:
    >      >>> Hi Daniil,
    >      >>>
    >      >>> I thought the issue was that C2 was doing metadata allocations, and
    >      >>> when it ran out of metaspace it did a GC (one that was not triggered
    >      >>> by a failed object
    >      >
    >      > Forgot to comment on the below.
    >      > It is probably a typo.
    >      > Should it be about the Graal, not the C2?
    >      > As I understand we have no issue with the C2.
    >      Actually it was the C1 Compiler Thread that was the problem, although it
    >      only turned up during Graal testing.
    >      
    >      Chris
    >      >
    >      > Thanks,
    >      > Serguei
    >      >
    >      >
    >      >>> allocation). As a results we were getting objects GCs before the
    >      >>> test ever got the chance to disable collection on them. I thought we
    >      >>> also concluded other than this metaspace issue, if the test is
    >      >>> properly disabling collection on the objects it cared about, it
    >      >>> shouldn't matter if there are GC's triggered by excessive object
    >      >>> allocations.
    >      >>>
    >      >>> I don't think the following check will always be valid. It may be on
    >      >>> by default at some point:
    >      >>>
    >      >>>  651     public boolean isJVMCICompilerOn() {
    >      >>>  652         String opts = argumentHandler.getLaunchOptions();
    >      >>>  653         return opts.indexOf("-XX:+UseJVMCICompiler") >= 0;
    >      >>>  654     }
    >      >>>
    >      >>> thanks,
    >      >>>
    >      >>> Chris
    >      >>>
    >      >>> On 11/2/18 4:29 PM, Daniil Titov wrote:
    >      >>>> Please review the change that fixes several tests failing with
    >      >>>> com.sun.jdi.ObjectCollectedException when running with Graal.
    >      >>>>
    >      >>>> There main problem here is that with Graal on JVMCI Compiler
    >      >>>> threads in the target VM create a lot of objects and sporadically
    >      >>>> trigger GC that results in the objects created in the target VM in
    >      >>>> the tests being GCed before the tests complete. The other problem
    >      >>>> is that for some classes the tests use, e.g. "java.lang.String",
    >      >>>> there is already a huge number of instances created by JVMCI threads.
    >      >>>>
    >      >>>> The fix suspends target VM to prevent JVMCI compiler threads from
    >      >>>> creating new objects during the tests execution. In cases when
    >      >>>> IOPipe is used for communication between the test and the debuggee
    >      >>>> the fix suspends all threads but the main rather than suspending
    >      >>>> the VM. The fix also filters out the checks for some test classes
    >      >>>> (e.g. "java.lang.String") in cases when the target VM is run with
    >      >>>> JVMCI compiler options on.
    >      >>>>
    >      >>>> Webrev: http://cr.openjdk.java.net/~dtitov/8203174/webrev.01/
    >      >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8203174
    >      >>>>
    >      >>>> Thanks,
    >      >>>> Daniil
    >      >>>>
    >      >>>>
    >      >>>>
    >      >>>
    >      >>>
    >      >>
    >      >
    >      
    >      
    >      
    >
    >
    
    
    




More information about the serviceability-dev mailing list