3-nd round RFR (XS) 6988950: JDWP exit error JVMTI_ERROR_WRONG_PHASE(112)

Mon Nov 3 06:16:03 UTC 2014

On 11/1/14 3:13 AM, Dmitry Samersoff wrote:
> Serguei,
>
> Thank you for good finding. This approach looks much better for me.
>
> The fix looks good.
>
> Is it necessary to release vmDeathLock locks at
> eventHandler.c:1244 before call
>
> EXIT_ERROR(error,"Can't clear event callbacks on vm death"); ?
Nice catch.
Yes, it is better to release the lock in that case.

Thanks, Dmitry!
Serguei

>
> -Dmitry
>
>
>
> On 2014-11-01 00:07, serguei.spitsyn at oracle.com wrote:
>> It is 3-rd round of review for:
>>    https://bugs.openjdk.java.net/browse/JDK-6988950
>>
>> New webrev:
>>   
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.3/
>>
>>
>> Summary
>>
>>    For failing scenario, please, refer to the 1-st round RFR below.
>>
>>    I've found what is missed in the jdwp agent shutdown and decided to
>> switch from a workaround to a real fix.
>>
>>    The agent VM_DEATH callback sets the gdata field: gdata->vmDead = 1.
>>    The agent debugLoop_run() has a guard against the VM shutdown:
>>
>>   165             } else if (gdata->vmDead &&
>>   166              ((cmd->cmdSet) != JDWP_COMMAND_SET(VirtualMachine))) {
>>   167                 /* Protect the VM from calls while dead.
>>   168                  * VirtualMachine cmdSet quietly ignores some cmds
>>   169                  * after VM death, so, it sends it's own errors.
>>   170                  */
>>   171                 outStream_setError(&out, JDWP_ERROR(VM_DEAD));
>>
>>
>>    However, the guard above does not help much if the VM_DEATH event
>> happens in the middle of a command execution.
>>    There is a lack of synchronization here.
>>
>>    The fix introduces new lock (vmDeathLock) which does not allow to
>> execute the commands
>>    and the VM_DEATH event callback concurrently.
>>    It should work well for any function that is used in implementation of
>> the JDWP_COMMAND_SET(VirtualMachine) .
>>
>>
>> Testing:
>>    Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests
>>
>>
>> Thanks,
>> Serguei
>>
>>
>> On 10/29/14 6:05 PM, serguei.spitsyn at oracle.com wrote:
>>> The updated webrev:
>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.2/
>>>
>>>
>>> The changes are:
>>>    - added a comment recommended by Staffan
>>>    - removed the ignore_wrong_phase() call from function classSignature()
>>>
>>> The classSignature() function is called in 16 places.
>>> Most of them do not tolerate the NULL in place of returned signature
>>> and will crash.
>>> I'm not comfortable to fix all the occurrences now and suggest to
>>> return to this
>>> issue after gaining experience with more failure cases that are still
>>> expected.
>>> The failure with the classSignature() involved was observed only once
>>> in the nightly
>>> and should be extremely rare reproducible.
>>> I'll file a placeholder bug if necessary.
>>>
>>> Thanks,
>>> Serguei
>>>
>>> On 10/28/14 6:11 PM, serguei.spitsyn at oracle.com wrote:
>>>> Please, review the fix for:
>>>>    https://bugs.openjdk.java.net/browse/JDK-6988950
>>>>
>>>>
>>>> Open webrev:
>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.1/
>>>>
>>>>
>>>>
>>>> Summary:
>>>>
>>>>     The failing scenario:
>>>>       The debugger and the debuggee are well aware a VM shutdown has
>>>> been started in the target process.
>>>>       The debugger at this point is not expected to send any commands
>>>> to the JDWP agent.
>>>>       However, the JDI layer (debugger side) and the jdwp agent
>>>> (debuggee side)
>>>>       are not in sync with the consumer layers.
>>>>
>>>>       One reason is because the test debugger does not invoke the JDI
>>>> method VirtualMachine.dispose().
>>>>       Another reason is that the Debugger and the debuggee processes
>>>> are uneasy to sync in general.
>>>>
>>>>       As a result the following steps are possible:
>>>>         - The test debugger sends a 'quit' command to the test debuggee
>>>>         - The debuggee is normally exiting
>>>>         - The jdwp backend reports (over the jdwp protocol) an
>>>> anonymous class unload event
>>>>         - The JDI InternalEventHandler thread handles the
>>>> ClassUnloadEvent event
>>>>         - The InternalEventHandler wants to uncache the matching
>>>> reference type.
>>>>           If there is more than one class with the same host class
>>>> signature, it can't distinguish them,
>>>>           and so, deletes all references and re-retrieves them again
>>>> (see tracing below):
>>>>             MY_TRACE: JDI:
>>>> VirtualMachineImpl.retrieveClassesBySignature:
>>>> sig=Ljava/lang/invoke/LambdaForm$DMH;
>>>>         - The jdwp backend debugLoop_run() gets the command from JDI
>>>> and calls the functions
>>>>           classesForSignature() and classStatus() recursively.
>>>>         - The classStatus() makes a call to the JVMTI GetClassStatus()
>>>> and gets the JVMTI_ERROR_WRONG_PHASE
>>>>         - As a result the jdwp backend reports the JVMTI error to the
>>>> JDI, and so, the test fails
>>>>
>>>>       For details, see the analysis in bug report closed as a dup of
>>>> the bug 6988950:
>>>>          https://bugs.openjdk.java.net/browse/JDK-8024865
>>>>
>>>>       Some similar cases can be found in the two bug reports (6988950
>>>> and 8024865) describing this issue.
>>>>
>>>>       The fix is to skip reporting the JVMTI_ERROR_WRONG_PHASE error
>>>> as it is normal at the VM shutdown.
>>>>       The original jdwp backend implementation had a similar approach
>>>> for the raw monitor functions.
>>>>       Threy use the ignore_vm_death() to workaround the
>>>> JVMTI_ERROR_WRONG_PHASE errors.
>>>>       For reference, please, see the file: src/share/back/util.c
>>>>
>>>>
>>>> Testing:
>>>>    Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests
>>>>
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>