RFR(S): 8134103: JVMTI_ERROR_WRONG_PHASE(112): on checking for an interface

Thu Mar 2 22:55:29 UTC 2017

On 3/2/17 3:19 PM, serguei.spitsyn at oracle.com wrote:
> On 3/2/17 14:06, Daniel D. Daugherty wrote:
>> On 3/2/17 2:24 PM, serguei.spitsyn at oracle.com wrote:
>>> The updated webrev:
>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2017/hotspot/8134103-jdi-wrong-phase.2/ 
>>>
>>>
>>> The change at L152-L153 has been reverted.
>>> Just one sanity check is needed.
>>
>> This comment from earlier review is still unresolved:
>>
>> src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
>>     L243:          * Immediately close out any commands enqueued from a
>>     L244:          * previously attached debugger.
>>
>>         Perhaps L244 can be change to:
>>
>>                    * dead VM or a previously attached debugger.
>>
>>         to match the changed line of code:
>>
>>             L246         if (gdata->vmDead || command->sessionID != 
>> currentSessionID) {
>>
>> Sorry I didn't notice that you didn't reply to it earlier...
>
> Sorry, I missed it - fixed now.
> I guess, it can be pushed now.
>
>
>>
>> I presume you are retesting... How often does
>> hotspot/test/closed/compiler/c1/6507107/HeapwalkingTest.java reproduce
>> the problem?
>
> The repro-rate is pretty high it is about 1/3 - 1/2.
> It is non-reproducible with the fix.

So just to be clear, with just these code changes in place:

- grab the vmDeathLock for all JDWP command sets instead of just
   the VirtualMachine command set in the debugLoop thread
- ignore/close out cmds when gdata->vmDead is true in addition
   to the existing old session check in the CommandLoop thread

the HeapwalkingTest goes from failing 1/3 -> 1/2 the time to
not failing in 200 runs so far...

I'd say you nailed this one nicely!

Dan

> The test is pretty slow.
> At this point I've got about 200 clean runs.
> All the jdi/jdwp tests are clean too.
>
> Thanks!
> Serguei
>
>>
>> Dan
>>
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 3/2/17 11:21, serguei.spitsyn at oracle.com wrote:
>>>> Dan,
>>>>
>>>> Thank you for reviewing!
>>>> I was waiting for your comments.
>>>>
>>>>
>>>> On 3/2/17 06:59, Daniel D. Daugherty wrote:
>>>>> On 3/1/17 8:49 PM, serguei.spitsyn at oracle.com wrote:
>>>>>> Please, review the JDK 10 fix for:
>>>>>>   https://bugs.openjdk.java.net/browse/JDK-8134103
>>>>>>
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2017/hotspot/8134103-jdi-wrong-phase.1/ 
>>>>>>
>>>>>
>>>>> src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
>>>>>     old L152:             } else if (gdata->vmDead &&
>>>>>     old L153:              ((cmd->cmdSet) != 
>>>>> JDWP_COMMAND_SET(VirtualMachine))) {
>>>>>
>>>>>         The old code used to set the error condition when the VM
>>>>>         is dead and the command was not in the VirtualMachine
>>>>>         command set. With the new code:
>>>>>
>>>>>             L150:             } else if (gdata->vmDead) {
>>>>>
>>>>>         The error condition is now set for all command sets
>>>>>         including the VirtualMachine command set. Minimally
>>>>>         that means that this comment needs work:
>>>>>
>>>>>             L152:                  * VirtualMachine cmdSet quietly 
>>>>> ignores some cmds
>>>>>             L153:                  * after VM death, so, it sends 
>>>>> it's own errors.
>>>>>
>>>>>         since you are no longer letting the VirtualMachine cmdSet 
>>>>> send
>>>>>         its own errors.
>>>>
>>>> Agreed, good catch, thanks!
>>>>
>>>>
>>>>>
>>>>>         It's not clear to me why you are now setting the error
>>>>>         condition for VirtualMachine cmds instead of letting those
>>>>>         cmds send their own errors.
>>>>>
>>>>>         See more comments below for your summary.
>>>>
>>>> Most likely, you are right here.
>>>> I specifically looked at the VM commands but overlooked the ones 
>>>> that are silently ignored.
>>>> For instance: Resume (9) or exit (10).
>>>>
>>>> So, I will need to restore the L152+L153.
>>>> Please, let me re-test this update and then I'll send another webrev.
>>>> Interesting that no test has caught this as there is a very tiny 
>>>> gap for such an intermittent issue to appear.
>>>>
>>>>
>>>>>
>>>>> src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
>>>>>     L243:          * Immediately close out any commands enqueued 
>>>>> from a
>>>>>     L244:          * previously attached debugger.
>>>>>
>>>>>         Perhaps L244 can be change to:
>>>>>
>>>>>                    * dead VM or a previously attached debugger.
>>>>>
>>>>>         to match the changed line of code:
>>>>>
>>>>>             L246         if (gdata->vmDead || command->sessionID 
>>>>> != currentSessionID) {
>>>>>
>>>>>>
>>>>>>
>>>>>> Summary:
>>>>>>   This is an intermittent issue in the debugger back-end (JDWP 
>>>>>> agent)
>>>>>>   that impacts the nightly testing stability.
>>>>>
>>>>> Congrats on tracking down this elusive bug!
>>>>
>>>> Thanks!
>>>>
>>>>>
>>>>>
>>>>>>   The fix adds check guards of gdata->vmDead condition to the:
>>>>>>    - debugLoop (JDWP Event Helper thread) and
>>>>>>    - commandLoop (JDWP Transport Listener thread)
>>>>>>   The commands are ignored in the DEAD phase.
>>>>>
>>>>> In the debugLoop, you aren't ignoring the commands, you
>>>>> are setting an error condition.
>>>>
>>>> Right, I meant: ignored == not executed. :)
>>>>
>>>>
>>>>>
>>>>>
>>>>>>   The check guard in the debugLoop already existed but only for 
>>>>>> VirtualMachine
>>>>>>   command set, so it has been extended to commands from all JDWP 
>>>>>> command sets.
>>>>>
>>>>> I'm reading the existing code exactly opposite of what you
>>>>> say here. The gdata->vmDead check applied to all command
>>>>> sets except for the VirtualMachine command set. I agree
>>>>> that you've extended it to all command sets, but we're
>>>>> not ignoring the commands. We are now setting the error
>>>>> condition for all command sets.
>>>>
>>>> Agreed.
>>>>
>>>>
>>>>>
>>>>> Maybe I'm missing something here. Perhaps I've been away
>>>>> from this code for too long... :-)
>>>>
>>>> No, it is that I'm still learning this code with your help. :-)
>>>>
>>>>
>>>> Thanks a lot!
>>>> Serguei
>>>>
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>>   I suspect, this bug could also cause some of the timeout and 
>>>>>> socket related issues.
>>>>>>
>>>>>>
>>>>>> Testing:
>>>>>>   The fix was tested with the nsk.jdi, jtreg com/sun/jdi and 100 
>>>>>> runs
>>>>>>   of the 
>>>>>> hotspot/test/closed/compiler/c1/6507107/HeapwalkingTest.java.
>>>>>>   The test results are very clean.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>
>>>>
>>>
>>
>