3-nd round RFR (XS) 6988950: JDWP exit error JVMTI_ERROR_WRONG_PHASE(112)

Mon Nov 3 18:41:33 UTC 2014

On 10/31/14 3:07 PM, serguei.spitsyn at oracle.com wrote:
>
> It is 3-rd round of review for:
> https://bugs.openjdk.java.net/browse/JDK-6988950
>
> New webrev:
> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.3/

Thumbs up on the code. I have comment suggestions below...

src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
     line 149: debugMonitorEnter(vmDeathLock);
         Perhaps this comment would help:
         /*
          * We grab the vmDeathLock here to prevent the cbVMDeath()
          * event handler from tearing things down while we're
          * asynchronously processing a command.
          */

src/jdk.jdwp.agent/share/native/libjdwp/eventHandler.c
     line 71: jrawMonitorID vmDeathLock;
         A nice blurb about the new global lock's protocol would be
         good here. Something like:

         /*
          * Coordinates the cbVMDeath() event handler and the
          * debugLoop_run() thread.
          */

     line 1236: debugMonitorEnter(vmDeathLock);
         Perhaps this comment would help:
         /*
          * We grab the vmDeathLock here to prevent the debugLoop_run()
          * thread from asynchronously dispatching another command.
          */

     This block caught my eye:

     1295     /*
     1296      * The VM will die soon after the completion of this 
callback - we
     1297      * may need to do a final synchronization with the command 
loop to
     1298      * avoid the VM terminating with replying to the final 
(resume)
     1299      * command.
     1300      */
     1301     debugLoop_sync();

     The above comment implies that debugLoop_sync() does something to
     coordinate this code (cbVMDeath()) with the debugLoop code, but
     clearly something is incomplete. It's entirely possible that this
     debugLoop_sync() is solving a different problem that happens after
     the debugLoop has left its command processing loop and realizes
     that the VM is shutting down.

     Don't know for sure. I haven't been in this code for quite a while...

Dan

>
>
> Summary
>
>   For failing scenario, please, refer to the 1-st round RFR below.
>
>   I've found what is missed in the jdwp agent shutdown and decided to 
> switch from a workaround to a real fix.
>
>   The agent VM_DEATH callback sets the gdata field: gdata->vmDead = 1.
>   The agent debugLoop_run() has a guard against the VM shutdown:
>   165             } else if (gdata->vmDead &&
>   166              ((cmd->cmdSet) != JDWP_COMMAND_SET(VirtualMachine))) {
>   167                 /* Protect the VM from calls while dead.
>   168                  * VirtualMachine cmdSet quietly ignores some cmds
>   169                  * after VM death, so, it sends it's own errors.
>   170                  */
>   171                 outStream_setError(&out, JDWP_ERROR(VM_DEAD));
>
>   However, the guard above does not help much if the VM_DEATH event 
> happens in the middle of a command execution.
>   There is a lack of synchronization here.
>
>   The fix introduces new lock (vmDeathLock) which does not allow to 
> execute the commands
>   and the VM_DEATH event callback concurrently.
>   It should work well for any function that is used in implementation 
> of the JDWP_COMMAND_SET(VirtualMachine) .
>
>
> Testing:
>   Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests
>
>
> Thanks,
> Serguei
>
>
> On 10/29/14 6:05 PM, serguei.spitsyn at oracle.com wrote:
>> The updated webrev:
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.2/ 
>>
>>
>> The changes are:
>>   - added a comment recommended by Staffan
>>   - removed the ignore_wrong_phase() call from function classSignature()
>>
>> The classSignature() function is called in 16 places.
>> Most of them do not tolerate the NULL in place of returned signature 
>> and will crash.
>> I'm not comfortable to fix all the occurrences now and suggest to 
>> return to this
>> issue after gaining experience with more failure cases that are still 
>> expected.
>> The failure with the classSignature() involved was observed only once 
>> in the nightly
>> and should be extremely rare reproducible.
>> I'll file a placeholder bug if necessary.
>>
>> Thanks,
>> Serguei
>>
>> On 10/28/14 6:11 PM, serguei.spitsyn at oracle.com wrote:
>>> Please, review the fix for:
>>> https://bugs.openjdk.java.net/browse/JDK-6988950
>>>
>>>
>>> Open webrev:
>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.1/ 
>>>
>>>
>>>
>>> Summary:
>>>
>>>    The failing scenario:
>>>      The debugger and the debuggee are well aware a VM shutdown has 
>>> been started in the target process.
>>>      The debugger at this point is not expected to send any commands 
>>> to the JDWP agent.
>>>      However, the JDI layer (debugger side) and the jdwp agent 
>>> (debuggee side)
>>>      are not in sync with the consumer layers.
>>>
>>>      One reason is because the test debugger does not invoke the JDI 
>>> method VirtualMachine.dispose().
>>>      Another reason is that the Debugger and the debuggee processes 
>>> are uneasy to sync in general.
>>>
>>>      As a result the following steps are possible:
>>>        - The test debugger sends a 'quit' command to the test debuggee
>>>        - The debuggee is normally exiting
>>>        - The jdwp backend reports (over the jdwp protocol) an 
>>> anonymous class unload event
>>>        - The JDI InternalEventHandler thread handles the 
>>> ClassUnloadEvent event
>>>        - The InternalEventHandler wants to uncache the matching 
>>> reference type.
>>>          If there is more than one class with the same host class 
>>> signature, it can't distinguish them,
>>>          and so, deletes all references and re-retrieves them again 
>>> (see tracing below):
>>>            MY_TRACE: JDI: 
>>> VirtualMachineImpl.retrieveClassesBySignature: 
>>> sig=Ljava/lang/invoke/LambdaForm$DMH;
>>>        - The jdwp backend debugLoop_run() gets the command from JDI 
>>> and calls the functions
>>>          classesForSignature() and classStatus() recursively.
>>>        - The classStatus() makes a call to the JVMTI 
>>> GetClassStatus() and gets the JVMTI_ERROR_WRONG_PHASE
>>>        - As a result the jdwp backend reports the JVMTI error to the 
>>> JDI, and so, the test fails
>>>
>>>      For details, see the analysis in bug report closed as a dup of 
>>> the bug 6988950:
>>> https://bugs.openjdk.java.net/browse/JDK-8024865
>>>
>>>      Some similar cases can be found in the two bug reports (6988950 
>>> and 8024865) describing this issue.
>>>
>>>      The fix is to skip reporting the JVMTI_ERROR_WRONG_PHASE error 
>>> as it is normal at the VM shutdown.
>>>      The original jdwp backend implementation had a similar approach 
>>> for the raw monitor functions.
>>>      Threy use the ignore_vm_death() to workaround the 
>>> JVMTI_ERROR_WRONG_PHASE errors.
>>>      For reference, please, see the file: src/share/back/util.c
>>>
>>>
>>> Testing:
>>>   Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests
>>>
>>>
>>> Thanks,
>>> Serguei
>>>
>>
>