jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure

Jaroslav Bachorik jaroslav.bachorik at oracle.com
Wed Jan 9 01:45:51 PST 2013


On 01/09/2013 09:40 AM, shanliang wrote:
> I still have no idea why the test failed, but I do not see why a longer
> timeout can fix the test. Have you reproduced the problem and tested
> your fix? if yes then possible the long timeout hided a real problem.

Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp
switch) and verify that the fix makes the test pass.

The ClientNotifForwarder scans the notifications for
MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
appropriate notification listeners in a separate thread. Thus, calling
"removeNotificationListener" on the main thread is prone to racing.

> 
> The timeout you made longer was used to wait a notification which should
> never arrive.

Well, it can be used to allow more time to process the "unregister"
notification, too.

When I think more of this I am more inclined to fix the race condition.
An updated webrev will follow.

> 
> To remove a listener from a client side, we did:
> 1) at client side, check whether it was added in the client side
> 2) at server side, check whether the MBean in question was registered in
> the MBeanServer (!!!)
> 3) at server side, check whether the listener was added.
> 
> So 2) tells that we did not rely on a "unregister" notification. Anyway,
> if you use a SAME thread to call "unregister" operation to unregister an
> mbean, then any following call (without any time break) to use the mbean
> should fail, like "removeNotificationListener", "isRegistered" etc.
> 
> I do see a bug here, if we remove a listener from a non-existing MBeam,
> we get "ListenerNotFoundException" at a client side, but get
> "InstanceNotFoundException" at server side, I think we should create a
> bug, because both implemented the same interface MBeanServerConnection.

Yes, it is rather inconsistent.

-JB-

> 
> Shanliang
> 
> Jaroslav Bachorik wrote:
>> Looking for review and a sponsor.
>>
>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>
>> In this issue the timing is the problem. MBeanServer.unregisterMBean()
>> fires the "unregister" notification which is sent to the server
>> asynchronously. Thus it may happen that the "unregister" notification
>> has not been yet processed at the time of invoking
>> removeNotificationListener() and the notification listeners hasn't been
>> cleaned up leading to the test failure.
>>
>> There is no synchronization between the client and the server and such
>> race condition can occur occasionally. Normally, the execution is fast
>> enough to behave like the "unregister" notification is processed
>> synchronously with the unregisterMBean() operation but it seems that
>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>> enough to make this problem appear.
>>
>> There is no proper fix for this - the only thing that work is waiting a
>> bit longer in the main thread to give the notification processing thread
>> some time to clean up the listeners.
>>
>> Regards,
>>
>> -JB-
>>   
> 



More information about the serviceability-dev mailing list