Codereview request: 8007710 runtime/7158988/FieldMonitor.java fails with com.sun.jdi.VMDisconnectedException: Connection closed

Fri Feb 14 00:57:10 PST 2014

Looks good!

Thanks,

-JB-

On 13.2.2014 21:15, shanliang wrote:
> Hi,
>
> Here is Version 4:
>     http://cr.openjdk.java.net/~sjiang/JDK-8007710/04/
>
> 1) remove the line
>     108   vm.resume()
> 2) call addClassWatch(vm) only when receiving VMStartEvent
> 3) make sure that the test receives ModificationWatchpointEvent
> 4) clean
>
> Thanks,
> Shanliang
>
> shanliang wrote:
>> Staffan,
>>
>> Very nice analysis!
>>
>> The fix must be very simple, just remove the line
>>     108   vm.resume
>> it is an error because here the test does not yet treat the events in
>> eventSet.
>>
>> the line
>>     136   eventSet.resume();
>> is the right place to resume the threads after event treatment.
>>
>> Here is the new webrev:
>> http://cr.openjdk.java.net/~sjiang/JDK-8007710/03/
>>
>> Thanks,
>> Shanliang
>>
>> Staffan Larsen wrote:
>>> I think I understand what happens now.
>>>
>>> The test code, simplified, looks like this (with the Thread.sleep()
>>> added that causes the test to fail):
>>>
>>>   launchTarget();
>>>   addClassWatch();
>>>   vm.resume();
>>>   Thread.sleep(1000);
>>>   while(connected) {
>>>       eventSet = eventQueue.remove()
>>>       for(event : eventQueue) {
>>>           if (event instanceof ClassPrepareEvent) {
>>>               addFieldWatch();
>>>           }
>>>       }
>>>       eventSet.resume();
>>>   }
>>>
>>> By default all events that happen will cause the debuggee to suspend
>>> (see EventRequest.setSuspendPolicy()). Thus when we get to
>>> addFieldWatch(), the vm should be suspended and we should be able to
>>> create the field watch without problem. But the VM isn’t suspended
>>> and that is why the test fail.
>>> Why isn’t the VM suspended? When we get to the “for(event :
>>> eventQueue)” the first time there are *two* events already in the
>>> queue: the VMStartEvent and a ClassPrepareEvent. At this point the VM
>>> is suspended and everything is good. We look at the first eventSet
>>> which only contains the VMStartEvent, we ignore the event, but we
>>> resume the VM. We then loop and look at the ClassPrepareEvent, but by
>>> now the VM is already running and has also terminated. Failure.
>>>
>>> Thus, we need to handle the VMStartEvent. I suggest a modification to
>>> my previous code:
>>>
>>>   launchTarget();
>>>   while(connected) {
>>>       eventSet = eventQueue.remove()
>>>       for(event : eventQueue) {
>>>           if (event instanceof VMStartEvent) {
>>>               addClassWatch();
>>>           }
>>>           if (event instanceof ClassPrepareEvent) {
>>>               addFieldWatch();
>>>           }
>>>       }
>>>       eventSet.resume();
>>>   }
>>>
>>> This will cause us to have complete control over the state of the
>>> debuggee. The first event we see will be the VMStartEvent. The VM
>>> will be suspended. We can add a class watch here. Then we resume the
>>> VM. The second event we see will be the ClassPrepareEvent with the VM
>>> suspended. We can add the field watch. Then we resume the VM and wait
>>> for the field watch events.
>>>
>>> Thanks,
>>> /Staffan
>>>
>>> On 13 feb 2014, at 11:36, shanliang <shanliang.jiang at oracle.com
>>> <mailto:shanliang.jiang at oracle.com>> wrote:
>>>
>>>> Staffan Larsen wrote:
>>>>> On 13 feb 2014, at 10:17, Jaroslav Bachorik
>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>
>>>>>> Hi Staffan,
>>>>>>
>>>>>> On 12.2.2014 18:27, Staffan Larsen wrote:
>>>>>>> I’m still not happy with this fix since I think the extra output
>>>>>>> stream synchronization logic is not needed - the debuggee should
>>>>>>> be suspended at all the interesting points. The fix I proposed is
>>>>>>> cleaner and (as far as I can tell) also fixes the problem. The
>>>>>>> only thing is that I can’t quite explain what goes wrong without
>>>>>>> the fix… I’d really like to understand that. I’ll try to dig
>>>>>>> deeper and see if I can understand exactly what happens.
>>>>>> Yes, bringing the VM to a stable state before calling other JDI
>>>>>> functions helps to stabilize the test even without the additional
>>>>>> synchronization via stdout/stdin.
>>>>>>
>>>>>> I just wonder whether this check should not be done inside
>>>>>> com.sun.jdi.connect.LaunchingConnector#launch() implementation.
>>>>>> Does it even make sense to hand off an unstable VM?
>>>>> Good question, but hard to change now - all implementations depend
>>>>> on the current functionality. The VMStartEvent also gives you a
>>>>> reference to the main thread.
>>>> The test failed when it received ClassPrepareEvent and did
>>>> addFieldWatch, that meant the test must receive already
>>>> VMStartEvent, because VMStartEvent must be the first event, if it
>>>> was true then the vm must be already stable when failing.
>>>>
>>>> Except that the test received ClassPrepareEvent before VMStartEvent
>>>> then it was doing addFieldWatch with a possibly unstable VM. in this
>>>> case we might have a serious bug in VirtualMachine implementation,
>>>> and if this is true the fix proposed to check "start" may make miss
>>>> ClassPrepareEvent, then the test would test nothing.
>>>>
>>>> Shanliang
>>>>> /S
>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>> /Staffan
>>>>>>>
>>>>>>> On 12 feb 2014, at 18:04, shanliang <shanliang.jiang at oracle.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Staffan Larsen wrote:
>>>>>>>>> I think what you need to do is wait for the VMStartEvent before
>>>>>>>>> you add requests to the VM. Note this paragraph from the
>>>>>>>>> VirtualMachine doc:
>>>>>>>>>
>>>>>>>>>  Note that a target VM launched by a launching connector is not
>>>>>>>>>  guaranteed to be stable until after the VMStartEvent has been
>>>>>>>>>  received.
>>>>>>>>>
>>>>>>>> I may miss something here, I believe VMStartEvent must be the
>>>>>>>> first event, when the test got ClassPrepareEvent, it must
>>>>>>>> already received VMStartEvent.
>>>>>>>>> I think adding code that looks something like this will make
>>>>>>>>> the test stable:
>>>>>>>>>
>>>>>>>>>     VirtualMachine vm = launchTarget(CLASS_NAME);
>>>>>>>>>     EventQueue eventQueue = vm.eventQueue();
>>>>>>>>>
>>>>>>>>>     boolean started = false;
>>>>>>>>>     while(!started) {
>>>>>>>>>       EventSet eventSet = eventQueue.remove();
>>>>>>>>>       for (Event event : eventSet) {
>>>>>>>>>         if (event instanceof VMStartEvent) {
>>>>>>>>>           started = true;
>>>>>>>>>         }
>>>>>>>>>         if (event instanceof VMDeathEvent
>>>>>>>>>             || event instanceof VMDisconnectEvent) {
>>>>>>>>>           throw new Error("VM died before it started...:"+event);
>>>>>>>>>         }
>>>>>>>>>       }
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     System.out.println("Vm launched");
>>>>>>>>>
>>>>>>>> The code you proposed could improve the test, it made sure that
>>>>>>>> TestPostFieldModification was started, but I am afraid that it
>>>>>>>> did not address the issue causing the failure, the issue I
>>>>>>>> believe was that TestPostFieldModification exited before or
>>>>>>>> during FieldMonitor called addFieldWatch(), that was why
>>>>>>>> addFieldWatch() received VMDisconnectedException. When the test
>>>>>>>> was treating ClassPrepareEvent, even if VMDeathEvent or
>>>>>>>> VMDisconnectEvent arrived, it must be still waiting in the
>>>>>>>> eventQueue because it arrived after ClassPrepareEvent.
>>>>>>>>
>>>>>>>> My fix was to not allow TestPostFieldModification to exit before
>>>>>>>> addFieldWatch() was done.
>>>>>>>>> There is also no reason to call addFieldWatch() before the
>>>>>>>>> ClassPrepareEvent has been received. The call to
>>>>>>>>> vm..classesByName() will just return an empty list anyway.
>>>>>>>>>
>>>>>>>> I do not know why the test called addFieldWatch before
>>>>>>>> ClassPrepareEvent had been received, but yes the returned list
>>>>>>>> was empty, so agree to remove it.
>>>>>>>>> While you are in there you can also remove the unused
>>>>>>>>> StringBuffer near the top of main().
>>>>>>>>>
>>>>>>>> Yes it was already removed in version 01
>>>>>>>>
>>>>>>>> Here is the new webrev:
>>>>>>>> http://cr.openjdk.java.net/~sjiang/JDK-8007710/02/
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shanliang
>>>>>>>>> Thanks,
>>>>>>>>> /Staffan
>>>>>>>>>
>>>>>>>>> On 11 feb 2014, at 18:30, shanliang
>>>>>>>>> <shanliang.jiang at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Here is the new fix in which FieldMonitor will write to
>>>>>>>>>> TestPostFieldModification, to inform the latter to quit, as
>>>>>>>>>> suggested bu Jaroslav
>>>>>>>>>>   http://cr.openjdk.java.net/~sjiang/JDK-8007710/01/
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Shanliang
>>>>>>>>>>
>>>>>>>>>> shanliang wrote:
>>>>>>>>>>
>>>>>>>>>>> shanliang wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 11.2.2014 16:31, shanliang wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Staffan Larsen wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Shanliang,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can’t quite see how the test can fail in this way. When
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> ClassPrepareEvent happens, the debuggee will be
>>>>>>>>>>>>>>> suspended. So when
>>>>>>>>>>>>>>> addFieldWatch() is called, the debuggee should not have
>>>>>>>>>>>>>>> moved.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not expert of jdi so I may miss something here. I
>>>>>>>>>>>>>> checked the
>>>>>>>>>>>>>> failure trace and saw the report exception happen when
>>>>>>>>>>>>>> FieldMonitor
>>>>>>>>>>>>>> received ClassPrepareEvent and was doing addFieldWatch.
>>>>>>>>>>>>>> FieldMonitor did
>>>>>>>>>>>>>> call "vm.resume()" before treating events.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> AFAICS, calling vm.resume() results in an almost immediate
>>>>>>>>>>>>> debuggee death. The gc() invoking thread "d" is flagged as
>>>>>>>>>>>>> a deamon and as such doesn't prevent the process from
>>>>>>>>>>>>> exiting. The other thread is not a daemon but will finish
>>>>>>>>>>>>> in only few cycles.
>>>>>>>>>>>>>
>>>>>>>>>>>> I looked at the class com.sun.jdi.VirtualMachine, here is
>>>>>>>>>>>> the Javadoc of the method "resume":
>>>>>>>>>>>>   /**
>>>>>>>>>>>>    * Continues the execution of the application running in this
>>>>>>>>>>>>    * virtual machine. All threads are resumed as documented in
>>>>>>>>>>>>    * {@link ThreadReference#resume}.
>>>>>>>>>>>>    *
>>>>>>>>>>>>    * @throws VMCannotBeModifiedException if the
>>>>>>>>>>>> VirtualMachine is read-only - see {@link
>>>>>>>>>>>> VirtualMachine#canBeModified()}.
>>>>>>>>>>>>    *
>>>>>>>>>>>>    * @see #suspend
>>>>>>>>>>>>    */
>>>>>>>>>>>>   void resume();
>>>>>>>>>>>> My understanding is that the debuggee resumes to work after
>>>>>>>>>>>> this call, instead to die?
>>>>>>>>>>>>
>>>>>>>>>>> In fact the problem is here, the vm
>>>>>>>>>>> (TestPostFieldModification) should not die before
>>>>>>>>>>> FieldMonitor finishes addFieldWatch.
>>>>>>>>>>>
>>>>>>>>>>> Shanliang
>>>>>>>>>>>
>>>>>>>>>>>>>> I reproduced the bug by add sleep(1000) after vm.resume()
>>>>>>>>>>>>>> but before
>>>>>>>>>>>>>> calling eventQueue.remove();
>>>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like some kind of synchronization between the
>>>>>>>>>>>>> debugger and the debuggee is necessary. But I wonder if you
>>>>>>>>>>>>> should better use the process.getOuptuptStream() to write
>>>>>>>>>>>>> and flush a message for the debugee indicating that it can
>>>>>>>>>>>>> exit. And in the debugee you would just do System.in.read()
>>>>>>>>>>>>> as the last statement in the main() method. Seems more
>>>>>>>>>>>>> robust than involving files.
>>>>>>>>>>>>>
>>>>>>>>>>>> It could work, but creating a file in the testing directory
>>>>>>>>>>>> should have no issue, but yes maybe less performance.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One problem I do see with the test is that it does not
>>>>>>>>>>>>>>> wait for a
>>>>>>>>>>>>>>> VMStartEvent before setting up requests. I’m not sure if
>>>>>>>>>>>>>>> that could
>>>>>>>>>>>>>>> cause the failure in the bug report, though.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /Staffan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 11 feb 2014, at 15:13, shanliang
>>>>>>>>>>>>>>> <shanliang.jiang at oracle.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi ,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The problem could be that FieldMonitor did not have
>>>>>>>>>>>>>>>> enough time to
>>>>>>>>>>>>>>>> "addFieldWatch" but the vm to monitor
>>>>>>>>>>>>>>>> (TestPostFieldModification) was
>>>>>>>>>>>>>>>> already ended.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So we should make sure that TestPostFieldModification
>>>>>>>>>>>>>>>> exits after
>>>>>>>>>>>>>>>> FieldMonitor has done necessary. The solution proposed
>>>>>>>>>>>>>>>> here is that
>>>>>>>>>>>>>>>> FieldMonitor creates a file after adding field watching,
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> TestPostFieldModification quits only after finding the
>>>>>>>>>>>>>>>> file.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> web:
>>>>>>>>>>>>>>>> http://icncweb.fr.oracle.com/~shjiang/webrev/8007710/00/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> bug:
>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8007710
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>>>>
>>>>
>>>>
>>>
>>
>