3-nd round RFR (XS) 6988950: JDWP exit error JVMTI_ERROR_WRONG_PHASE(112)

Mon Nov 3 19:35:38 UTC 2014

On 11/3/14 10:41 AM, Daniel D. Daugherty wrote:
> On 10/31/14 3:07 PM, serguei.spitsyn at oracle.com wrote:
>>
>> It is 3-rd round of review for:
>> https://bugs.openjdk.java.net/browse/JDK-6988950
>>
>> New webrev:
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.3/
>
> Thumbs up on the code. 

Thanks, Dan!

> I have comment suggestions below...
>
> src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
>     line 149: debugMonitorEnter(vmDeathLock);
>         Perhaps this comment would help:
>         /*
>          * We grab the vmDeathLock here to prevent the cbVMDeath()
>          * event handler from tearing things down while we're
>          * asynchronously processing a command.
>          */

Done

>
> src/jdk.jdwp.agent/share/native/libjdwp/eventHandler.c
>     line 71: jrawMonitorID vmDeathLock;
>         A nice blurb about the new global lock's protocol would be
>         good here. Something like:
>
>         /*
>          * Coordinates the cbVMDeath() event handler and the
>          * debugLoop_run() thread.
>          */
>

Done

>     line 1236: debugMonitorEnter(vmDeathLock);
>         Perhaps this comment would help:
>         /*
>          * We grab the vmDeathLock here to prevent the debugLoop_run()
>          * thread from asynchronously dispatching another command.
>          */
>

Done
I also think, it'd be enough to narrow the scope of synchronization
around the event_callback() call:

         /*
          * Coordinates the cbVMDeath() event handler and the
          * debugLoop_run() thread.
          */
         debugMonitorEnter(vmDeathLock);

         /* Only now should we actually process the VM death event */
         (void)memset(&info,0,sizeof(info));
         info.ei                 = EI_VM_DEATH;
         event_callback(env, &info);

         debugMonitorExit(vmDeathLock);

>     This block caught my eye:
>
>     1295     /*
>     1296      * The VM will die soon after the completion of this 
> callback - we
>     1297      * may need to do a final synchronization with the 
> command loop to
>     1298      * avoid the VM terminating with replying to the final 
> (resume)
>     1299      * command.
>     1300      */
>     1301     debugLoop_sync();
>
>     The above comment implies that debugLoop_sync() does something to
>     coordinate this code (cbVMDeath()) with the debugLoop code, but
>     clearly something is incomplete. It's entirely possible that this
>     debugLoop_sync() is solving a different problem that happens after
>     the debugLoop has left its command processing loop and realizes
>     that the VM is shutting down.

This block caught my eye too.
I agree, it looks incomplete.
The cbVMDeath() callback waits until a resume command finishes if it has 
been started.
Not sure how useful it is.

Thanks!
Serguei

>
>     Don't know for sure. I haven't been in this code for quite a while...
>
> Dan
>
>
>>
>>
>> Summary
>>
>>   For failing scenario, please, refer to the 1-st round RFR below.
>>
>>   I've found what is missed in the jdwp agent shutdown and decided to 
>> switch from a workaround to a real fix.
>>
>>   The agent VM_DEATH callback sets the gdata field: gdata->vmDead = 1.
>>   The agent debugLoop_run() has a guard against the VM shutdown:
>>   165             } else if (gdata->vmDead &&
>>   166              ((cmd->cmdSet) != JDWP_COMMAND_SET(VirtualMachine))) {
>>   167                 /* Protect the VM from calls while dead.
>>   168                  * VirtualMachine cmdSet quietly ignores some cmds
>>   169                  * after VM death, so, it sends it's own errors.
>>   170                  */
>>   171                 outStream_setError(&out, JDWP_ERROR(VM_DEAD));
>>
>>   However, the guard above does not help much if the VM_DEATH event 
>> happens in the middle of a command execution.
>>   There is a lack of synchronization here.
>>
>>   The fix introduces new lock (vmDeathLock) which does not allow to 
>> execute the commands
>>   and the VM_DEATH event callback concurrently.
>>   It should work well for any function that is used in implementation 
>> of the JDWP_COMMAND_SET(VirtualMachine) .
>>
>>
>> Testing:
>>   Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests
>>
>>
>> Thanks,
>> Serguei
>>
>>
>> On 10/29/14 6:05 PM, serguei.spitsyn at oracle.com wrote:
>>> The updated webrev:
>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.2/ 
>>>
>>>
>>> The changes are:
>>>   - added a comment recommended by Staffan
>>>   - removed the ignore_wrong_phase() call from function 
>>> classSignature()
>>>
>>> The classSignature() function is called in 16 places.
>>> Most of them do not tolerate the NULL in place of returned signature 
>>> and will crash.
>>> I'm not comfortable to fix all the occurrences now and suggest to 
>>> return to this
>>> issue after gaining experience with more failure cases that are 
>>> still expected.
>>> The failure with the classSignature() involved was observed only 
>>> once in the nightly
>>> and should be extremely rare reproducible.
>>> I'll file a placeholder bug if necessary.
>>>
>>> Thanks,
>>> Serguei
>>>
>>> On 10/28/14 6:11 PM, serguei.spitsyn at oracle.com wrote:
>>>> Please, review the fix for:
>>>> https://bugs.openjdk.java.net/browse/JDK-6988950
>>>>
>>>>
>>>> Open webrev:
>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.1/ 
>>>>
>>>>
>>>>
>>>> Summary:
>>>>
>>>>    The failing scenario:
>>>>      The debugger and the debuggee are well aware a VM shutdown has 
>>>> been started in the target process.
>>>>      The debugger at this point is not expected to send any 
>>>> commands to the JDWP agent.
>>>>      However, the JDI layer (debugger side) and the jdwp agent 
>>>> (debuggee side)
>>>>      are not in sync with the consumer layers.
>>>>
>>>>      One reason is because the test debugger does not invoke the 
>>>> JDI method VirtualMachine.dispose().
>>>>      Another reason is that the Debugger and the debuggee processes 
>>>> are uneasy to sync in general.
>>>>
>>>>      As a result the following steps are possible:
>>>>        - The test debugger sends a 'quit' command to the test debuggee
>>>>        - The debuggee is normally exiting
>>>>        - The jdwp backend reports (over the jdwp protocol) an 
>>>> anonymous class unload event
>>>>        - The JDI InternalEventHandler thread handles the 
>>>> ClassUnloadEvent event
>>>>        - The InternalEventHandler wants to uncache the matching 
>>>> reference type.
>>>>          If there is more than one class with the same host class 
>>>> signature, it can't distinguish them,
>>>>          and so, deletes all references and re-retrieves them again 
>>>> (see tracing below):
>>>>            MY_TRACE: JDI: 
>>>> VirtualMachineImpl.retrieveClassesBySignature: 
>>>> sig=Ljava/lang/invoke/LambdaForm$DMH;
>>>>        - The jdwp backend debugLoop_run() gets the command from JDI 
>>>> and calls the functions
>>>>          classesForSignature() and classStatus() recursively.
>>>>        - The classStatus() makes a call to the JVMTI 
>>>> GetClassStatus() and gets the JVMTI_ERROR_WRONG_PHASE
>>>>        - As a result the jdwp backend reports the JVMTI error to 
>>>> the JDI, and so, the test fails
>>>>
>>>>      For details, see the analysis in bug report closed as a dup of 
>>>> the bug 6988950:
>>>> https://bugs.openjdk.java.net/browse/JDK-8024865
>>>>
>>>>      Some similar cases can be found in the two bug reports 
>>>> (6988950 and 8024865) describing this issue.
>>>>
>>>>      The fix is to skip reporting the JVMTI_ERROR_WRONG_PHASE error 
>>>> as it is normal at the VM shutdown.
>>>>      The original jdwp backend implementation had a similar 
>>>> approach for the raw monitor functions.
>>>>      Threy use the ignore_vm_death() to workaround the 
>>>> JVMTI_ERROR_WRONG_PHASE errors.
>>>>      For reference, please, see the file: src/share/back/util.c
>>>>
>>>>
>>>> Testing:
>>>>   Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests
>>>>
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>
>>
>