From kelly.ohair at oracle.com  Fri Jan  4 14:37:46 2013
From: kelly.ohair at oracle.com (Kelly O'Hair)
Date: Fri, 4 Jan 2013 14:37:46 -0800
Subject: jmx-dev [PATCH] JDK-8005472:
	com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh
	failed on windows
In-Reply-To: <50E16BA8.40203@oracle.com>
References: <50E16BA8.40203@oracle.com>
Message-ID: <682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com>


On Dec 31, 2012, at 2:40 AM, Jaroslav Bachorik wrote:

> Looking for a review and a sponsor.
> 
> Webrev at:
> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.00/test/com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh.sdiff.html
> 
> JPRT run on windows targets:
> http://sthjprt.se.oracle.com/archives/2012/12/2012-12-28-123054.jbachorik.openjdk8-tl//JobStatus.txt
> 
> The issue is about a new test failing when run on windows machines. It
> seems that the cygwin really does not like removing a non-existent file
> - to the extent of hanging the script indefinitely.

I suspect it is not hanging because it does not exist, but that some other windows process has it's hands on it.
This is the stdout file from the server being started up right?
Could the server from a previous test run be still running?

Maybe a better answer might be to make the filename a bit more unique, like maybe foobar.$$  ???

> 
> The patch adds a pre-check for the existence of the file to be removed.
> It does not change the test in any other way.

This test doesn't make much sense to me. rm should never hang on a non existent file.

And by the way, it might be a good idea for scripts to always use 'rm -f', which is what the default is for Makefiles with $(RM)


-kto

> 
> 
> Thanks,
> 
> -JB-


From jaroslav.bachorik at oracle.com  Mon Jan  7 03:23:00 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Jan 2013 12:23:00 +0100
Subject: jmx-dev [PATCH] JDK-8005472:
 com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh
 failed on windows
In-Reply-To: <682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com>
References: <50E16BA8.40203@oracle.com>
	<682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com>
Message-ID: <50EAB014.30805@oracle.com>

On 01/04/2013 11:37 PM, Kelly O'Hair wrote:
> 
> On Dec 31, 2012, at 2:40 AM, Jaroslav Bachorik wrote:
> 
>> Looking for a review and a sponsor.
>>
>> Webrev at:
>> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.00/test/com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh.sdiff.html
>>
>> JPRT run on windows targets:
>> http://sthjprt.se.oracle.com/archives/2012/12/2012-12-28-123054.jbachorik.openjdk8-tl//JobStatus.txt
>>
>> The issue is about a new test failing when run on windows machines. It
>> seems that the cygwin really does not like removing a non-existent file
>> - to the extent of hanging the script indefinitely.
> 
> I suspect it is not hanging because it does not exist, but that some other windows process has it's hands on it.
> This is the stdout file from the server being started up right?
> Could the server from a previous test run be still running?

Exactly. Amy confirmed this and provided a patch which resolves the
hanging problem.

The update patch is at
http://cr.openjdk.java.net/~jbachorik/8005472/webrev.01

-JB-

> 
> Maybe a better answer might be to make the filename a bit more unique, like maybe foobar.$$  ???
> 
>>
>> The patch adds a pre-check for the existence of the file to be removed.
>> It does not change the test in any other way.
> 
> This test doesn't make much sense to me. rm should never hang on a non existent file.
> 
> And by the way, it might be a good idea for scripts to always use 'rm -f', which is what the default is for Makefiles with $(RM)
> 
> 
> -kto
> 
>>
>>
>> Thanks,
>>
>> -JB-
> 


From jaroslav.bachorik at oracle.com  Mon Jan  7 05:44:03 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Jan 2013 14:44:03 +0100
Subject: jmx-dev [PATCH] JDK-8005791: Remove java.beans.* imports from
	com.sun.jmx.mbeanserver.Introspector
Message-ID: <50EAD123.7040202@oracle.com>

Looking for reviewers and a sponsor.

This is a simple patch to remove unused java.beans.* imports from
com.sun.jmx.mbeanserver.Introspector. The actual usage of java.beans.*
classes was removed from the Introspector only the imports are left
dangling.

The webrev is at http://cr.openjdk.java.net/~jbachorik/8005791/webrev.00/

Thanks,

-JB-

From jaroslav.bachorik at oracle.com  Tue Jan  8 07:16:32 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 08 Jan 2013 16:16:32 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
	failure
Message-ID: <50EC3850.7080508@oracle.com>

Looking for review and a sponsor.

Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00

In this issue the timing is the problem. MBeanServer.unregisterMBean()
fires the "unregister" notification which is sent to the server
asynchronously. Thus it may happen that the "unregister" notification
has not been yet processed at the time of invoking
removeNotificationListener() and the notification listeners hasn't been
cleaned up leading to the test failure.

There is no synchronization between the client and the server and such
race condition can occur occasionally. Normally, the execution is fast
enough to behave like the "unregister" notification is processed
synchronously with the unregisterMBean() operation but it seems that
using fastdebug Server VM bits with the -Xcomp option strains the CPU
enough to make this problem appear.

There is no proper fix for this - the only thing that work is waiting a
bit longer in the main thread to give the notification processing thread
some time to clean up the listeners.

Regards,

-JB-

From shanliang.jiang at oracle.com  Wed Jan  9 00:40:42 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 09 Jan 2013 09:40:42 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50EC3850.7080508@oracle.com>
References: <50EC3850.7080508@oracle.com>
Message-ID: <50ED2D0A.5000509@oracle.com>

I still have no idea why the test failed, but I do not see why a longer 
timeout can fix the test. Have you reproduced the problem and tested 
your fix? if yes then possible the long timeout hided a real problem.

The timeout you made longer was used to wait a notification which should 
never arrive.

To remove a listener from a client side, we did:
1) at client side, check whether it was added in the client side
2) at server side, check whether the MBean in question was registered in 
the MBeanServer (!!!)
3) at server side, check whether the listener was added.

So 2) tells that we did not rely on a "unregister" notification. Anyway, 
if you use a SAME thread to call "unregister" operation to unregister an 
mbean, then any following call (without any time break) to use the mbean 
should fail, like "removeNotificationListener", "isRegistered" etc.

I do see a bug here, if we remove a listener from a non-existing MBeam, 
we get "ListenerNotFoundException" at a client side, but get 
"InstanceNotFoundException" at server side, I think we should create a 
bug, because both implemented the same interface MBeanServerConnection.

Shanliang

Jaroslav Bachorik wrote:
> Looking for review and a sponsor.
>
> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>
> In this issue the timing is the problem. MBeanServer.unregisterMBean()
> fires the "unregister" notification which is sent to the server
> asynchronously. Thus it may happen that the "unregister" notification
> has not been yet processed at the time of invoking
> removeNotificationListener() and the notification listeners hasn't been
> cleaned up leading to the test failure.
>
> There is no synchronization between the client and the server and such
> race condition can occur occasionally. Normally, the execution is fast
> enough to behave like the "unregister" notification is processed
> synchronously with the unregisterMBean() operation but it seems that
> using fastdebug Server VM bits with the -Xcomp option strains the CPU
> enough to make this problem appear.
>
> There is no proper fix for this - the only thing that work is waiting a
> bit longer in the main thread to give the notification processing thread
> some time to clean up the listeners.
>
> Regards,
>
> -JB-
>   


From jaroslav.bachorik at oracle.com  Wed Jan  9 01:45:51 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 09 Jan 2013 10:45:51 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50ED2D0A.5000509@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
Message-ID: <50ED3C4F.1070001@oracle.com>

On 01/09/2013 09:40 AM, shanliang wrote:
> I still have no idea why the test failed, but I do not see why a longer
> timeout can fix the test. Have you reproduced the problem and tested
> your fix? if yes then possible the long timeout hided a real problem.

Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp
switch) and verify that the fix makes the test pass.

The ClientNotifForwarder scans the notifications for
MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
appropriate notification listeners in a separate thread. Thus, calling
"removeNotificationListener" on the main thread is prone to racing.

> 
> The timeout you made longer was used to wait a notification which should
> never arrive.

Well, it can be used to allow more time to process the "unregister"
notification, too.

When I think more of this I am more inclined to fix the race condition.
An updated webrev will follow.

> 
> To remove a listener from a client side, we did:
> 1) at client side, check whether it was added in the client side
> 2) at server side, check whether the MBean in question was registered in
> the MBeanServer (!!!)
> 3) at server side, check whether the listener was added.
> 
> So 2) tells that we did not rely on a "unregister" notification. Anyway,
> if you use a SAME thread to call "unregister" operation to unregister an
> mbean, then any following call (without any time break) to use the mbean
> should fail, like "removeNotificationListener", "isRegistered" etc.
> 
> I do see a bug here, if we remove a listener from a non-existing MBeam,
> we get "ListenerNotFoundException" at a client side, but get
> "InstanceNotFoundException" at server side, I think we should create a
> bug, because both implemented the same interface MBeanServerConnection.

Yes, it is rather inconsistent.

-JB-

> 
> Shanliang
> 
> Jaroslav Bachorik wrote:
>> Looking for review and a sponsor.
>>
>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>
>> In this issue the timing is the problem. MBeanServer.unregisterMBean()
>> fires the "unregister" notification which is sent to the server
>> asynchronously. Thus it may happen that the "unregister" notification
>> has not been yet processed at the time of invoking
>> removeNotificationListener() and the notification listeners hasn't been
>> cleaned up leading to the test failure.
>>
>> There is no synchronization between the client and the server and such
>> race condition can occur occasionally. Normally, the execution is fast
>> enough to behave like the "unregister" notification is processed
>> synchronously with the unregisterMBean() operation but it seems that
>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>> enough to make this problem appear.
>>
>> There is no proper fix for this - the only thing that work is waiting a
>> bit longer in the main thread to give the notification processing thread
>> some time to clean up the listeners.
>>
>> Regards,
>>
>> -JB-
>>   
> 


From shanliang.jiang at oracle.com  Wed Jan  9 02:08:44 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 09 Jan 2013 11:08:44 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50ED3C4F.1070001@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com>
Message-ID: <50ED41AC.4010007@oracle.com>

Jaroslav Bachorik wrote:
> On 01/09/2013 09:40 AM, shanliang wrote:
>   
>> I still have no idea why the test failed, but I do not see why a longer
>> timeout can fix the test. Have you reproduced the problem and tested
>> your fix? if yes then possible the long timeout hided a real problem.
>>     
>
> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp
> switch) and verify that the fix makes the test pass.
>
> The ClientNotifForwarder scans the notifications for
> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
> appropriate notification listeners in a separate thread. Thus, calling
> "removeNotificationListener" on the main thread is prone to racing.
>   
It is true that ClientNotifForwarder scans the notifications for 
MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the 
appropriate notification listeners in a separate thread. This is for a 
client connection to do clean if a user never calls 
removeNotificationListener.

But calling directly removeNotificationListener from a client should 
still get exception if the clean has not been done. As I said, if the 
client checked and found the listener was still there, then the client 
sent a request to its server to remove the listener at server side, the 
server should find that the MBean in question was not registered, so the 
server should throw an exception. The bug might be here.

Shanliang
>   
>> The timeout you made longer was used to wait a notification which should
>> never arrive.
>>     
>
> Well, it can be used to allow more time to process the "unregister"
> notification, too.
>
> When I think more of this I am more inclined to fix the race condition.
> An updated webrev will follow.
>
>   
>> To remove a listener from a client side, we did:
>> 1) at client side, check whether it was added in the client side
>> 2) at server side, check whether the MBean in question was registered in
>> the MBeanServer (!!!)
>> 3) at server side, check whether the listener was added.
>>
>> So 2) tells that we did not rely on a "unregister" notification. Anyway,
>> if you use a SAME thread to call "unregister" operation to unregister an
>> mbean, then any following call (without any time break) to use the mbean
>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>
>> I do see a bug here, if we remove a listener from a non-existing MBeam,
>> we get "ListenerNotFoundException" at a client side, but get
>> "InstanceNotFoundException" at server side, I think we should create a
>> bug, because both implemented the same interface MBeanServerConnection.
>>     
>
> Yes, it is rather inconsistent.
>
> -JB-
>
>   
>> Shanliang
>>
>> Jaroslav Bachorik wrote:
>>     
>>> Looking for review and a sponsor.
>>>
>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>
>>> In this issue the timing is the problem. MBeanServer.unregisterMBean()
>>> fires the "unregister" notification which is sent to the server
>>> asynchronously. Thus it may happen that the "unregister" notification
>>> has not been yet processed at the time of invoking
>>> removeNotificationListener() and the notification listeners hasn't been
>>> cleaned up leading to the test failure.
>>>
>>> There is no synchronization between the client and the server and such
>>> race condition can occur occasionally. Normally, the execution is fast
>>> enough to behave like the "unregister" notification is processed
>>> synchronously with the unregisterMBean() operation but it seems that
>>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>>> enough to make this problem appear.
>>>
>>> There is no proper fix for this - the only thing that work is waiting a
>>> bit longer in the main thread to give the notification processing thread
>>> some time to clean up the listeners.
>>>
>>> Regards,
>>>
>>> -JB-
>>>   
>>>       
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/jmx-dev/attachments/20130109/72910e24/attachment.html 

From jaroslav.bachorik at oracle.com  Wed Jan  9 05:15:58 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 09 Jan 2013 14:15:58 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50ED41AC.4010007@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
Message-ID: <50ED6D8E.6070404@oracle.com>

On 01/09/2013 11:08 AM, shanliang wrote:
> Jaroslav Bachorik wrote:
>> On 01/09/2013 09:40 AM, shanliang wrote:
>>  
>>> I still have no idea why the test failed, but I do not see why a longer
>>> timeout can fix the test. Have you reproduced the problem and tested
>>> your fix? if yes then possible the long timeout hided a real problem.
>>>     
>>
>> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp
>> switch) and verify that the fix makes the test pass.
>>
>> The ClientNotifForwarder scans the notifications for
>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>> appropriate notification listeners in a separate thread. Thus, calling
>> "removeNotificationListener" on the main thread is prone to racing.
>>   
> It is true that ClientNotifForwarder scans the notifications for
> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
> appropriate notification listeners in a separate thread. This is for a
> client connection to do clean if a user never calls
> removeNotificationListener.
> 
> But calling directly removeNotificationListener from a client should
> still get exception if the clean has not been done. As I said, if the
> client checked and found the listener was still there, then the client
> sent a request to its server to remove the listener at server side, the
> server should find that the MBean in question was not registered, so the
> server should throw an exception. The bug might be here.

This won't work. The server side listeners are removed upon receiving
the "unregistered" notification which is delivered from the
ClientNotificationForwarder and it may have not run yet (since it runs
in a separate executor thread). The result is that the attempt to remove
the notification listener on the server will succeed as well failing the
test subsequently.

-JB-

> 
> Shanliang
>>  
>>> The timeout you made longer was used to wait a notification which should
>>> never arrive.
>>>     
>>
>> Well, it can be used to allow more time to process the "unregister"
>> notification, too.
>>
>> When I think more of this I am more inclined to fix the race condition.
>> An updated webrev will follow.
>>
>>  
>>> To remove a listener from a client side, we did:
>>> 1) at client side, check whether it was added in the client side
>>> 2) at server side, check whether the MBean in question was registered in
>>> the MBeanServer (!!!)
>>> 3) at server side, check whether the listener was added.
>>>
>>> So 2) tells that we did not rely on a "unregister" notification. Anyway,
>>> if you use a SAME thread to call "unregister" operation to unregister an
>>> mbean, then any following call (without any time break) to use the mbean
>>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>>
>>> I do see a bug here, if we remove a listener from a non-existing MBeam,
>>> we get "ListenerNotFoundException" at a client side, but get
>>> "InstanceNotFoundException" at server side, I think we should create a
>>> bug, because both implemented the same interface MBeanServerConnection.
>>>     
>>
>> Yes, it is rather inconsistent.
>>
>> -JB-
>>
>>  
>>> Shanliang
>>>
>>> Jaroslav Bachorik wrote:
>>>    
>>>> Looking for review and a sponsor.
>>>>
>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>
>>>> In this issue the timing is the problem. MBeanServer.unregisterMBean()
>>>> fires the "unregister" notification which is sent to the server
>>>> asynchronously. Thus it may happen that the "unregister" notification
>>>> has not been yet processed at the time of invoking
>>>> removeNotificationListener() and the notification listeners hasn't been
>>>> cleaned up leading to the test failure.
>>>>
>>>> There is no synchronization between the client and the server and such
>>>> race condition can occur occasionally. Normally, the execution is fast
>>>> enough to behave like the "unregister" notification is processed
>>>> synchronously with the unregisterMBean() operation but it seems that
>>>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>>>> enough to make this problem appear.
>>>>
>>>> There is no proper fix for this - the only thing that work is waiting a
>>>> bit longer in the main thread to give the notification processing
>>>> thread
>>>> some time to clean up the listeners.
>>>>
>>>> Regards,
>>>>
>>>> -JB-
>>>>         
>>
>>   
> 
> 


From shanliang.jiang at oracle.com  Wed Jan  9 05:44:22 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 09 Jan 2013 14:44:22 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50ED6D8E.6070404@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com>
Message-ID: <50ED7436.1020205@oracle.com>

Let's forget the JMX implementation at first. If an MBean is 
unregistered, a user at client side calls "removeNotificationListener" 
on the MBean, what should happen? if the user calls "isRegistered" on 
the MBean, what should happen?

I have done 2 tests, I used only one thread:

1)
......
localServer.unregisterMBean(myMBean);
boolean isRegistered = remoteClientServer.isRegistered(myMBean));

I got isRegistered = false;

2)
......
localServer.unregisterMBean(myMBean);
System.out.println("isRegistered = 
"+remoteClientServer.sRegistered(myMBean));
remoteClientServer.removeNotificationListener(myMBean, listener);

I did not get an exception.

The 1) told that the client could know the MBean was unregistered, then 
the client should throw an exception for the call of 
"removeNotificationListener" in 2).

The test "DeadListenerTest" got passed in some machines because of the 
timeout for waiting a notification. I think its failure just tells a new 
bug.

To set a longer timeout just hides the real bug, and the test might fail 
again one day if running condition is changed and you might need longer 
timeout again.

Shanliang

Jaroslav Bachorik wrote:
> On 01/09/2013 11:08 AM, shanliang wrote:
>   
>> Jaroslav Bachorik wrote:
>>     
>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>  
>>>       
>>>> I still have no idea why the test failed, but I do not see why a longer
>>>> timeout can fix the test. Have you reproduced the problem and tested
>>>> your fix? if yes then possible the long timeout hided a real problem.
>>>>     
>>>>         
>>> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp
>>> switch) and verify that the fix makes the test pass.
>>>
>>> The ClientNotifForwarder scans the notifications for
>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>> appropriate notification listeners in a separate thread. Thus, calling
>>> "removeNotificationListener" on the main thread is prone to racing.
>>>   
>>>       
>> It is true that ClientNotifForwarder scans the notifications for
>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>> appropriate notification listeners in a separate thread. This is for a
>> client connection to do clean if a user never calls
>> removeNotificationListener.
>>
>> But calling directly removeNotificationListener from a client should
>> still get exception if the clean has not been done. As I said, if the
>> client checked and found the listener was still there, then the client
>> sent a request to its server to remove the listener at server side, the
>> server should find that the MBean in question was not registered, so the
>> server should throw an exception. The bug might be here.
>>     
>
> This won't work. The server side listeners are removed upon receiving
> the "unregistered" notification which is delivered from the
> ClientNotificationForwarder and it may have not run yet (since it runs
> in a separate executor thread). The result is that the attempt to remove
> the notification listener on the server will succeed as well failing the
> test subsequently.
>
> -JB-
>
>   
>> Shanliang
>>     
>>>  
>>>       
>>>> The timeout you made longer was used to wait a notification which should
>>>> never arrive.
>>>>     
>>>>         
>>> Well, it can be used to allow more time to process the "unregister"
>>> notification, too.
>>>
>>> When I think more of this I am more inclined to fix the race condition.
>>> An updated webrev will follow.
>>>
>>>  
>>>       
>>>> To remove a listener from a client side, we did:
>>>> 1) at client side, check whether it was added in the client side
>>>> 2) at server side, check whether the MBean in question was registered in
>>>> the MBeanServer (!!!)
>>>> 3) at server side, check whether the listener was added.
>>>>
>>>> So 2) tells that we did not rely on a "unregister" notification. Anyway,
>>>> if you use a SAME thread to call "unregister" operation to unregister an
>>>> mbean, then any following call (without any time break) to use the mbean
>>>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>>>
>>>> I do see a bug here, if we remove a listener from a non-existing MBeam,
>>>> we get "ListenerNotFoundException" at a client side, but get
>>>> "InstanceNotFoundException" at server side, I think we should create a
>>>> bug, because both implemented the same interface MBeanServerConnection.
>>>>     
>>>>         
>>> Yes, it is rather inconsistent.
>>>
>>> -JB-
>>>
>>>  
>>>       
>>>> Shanliang
>>>>
>>>> Jaroslav Bachorik wrote:
>>>>    
>>>>         
>>>>> Looking for review and a sponsor.
>>>>>
>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>
>>>>> In this issue the timing is the problem. MBeanServer.unregisterMBean()
>>>>> fires the "unregister" notification which is sent to the server
>>>>> asynchronously. Thus it may happen that the "unregister" notification
>>>>> has not been yet processed at the time of invoking
>>>>> removeNotificationListener() and the notification listeners hasn't been
>>>>> cleaned up leading to the test failure.
>>>>>
>>>>> There is no synchronization between the client and the server and such
>>>>> race condition can occur occasionally. Normally, the execution is fast
>>>>> enough to behave like the "unregister" notification is processed
>>>>> synchronously with the unregisterMBean() operation but it seems that
>>>>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>>>>> enough to make this problem appear.
>>>>>
>>>>> There is no proper fix for this - the only thing that work is waiting a
>>>>> bit longer in the main thread to give the notification processing
>>>>> thread
>>>>> some time to clean up the listeners.
>>>>>
>>>>> Regards,
>>>>>
>>>>> -JB-
>>>>>         
>>>>>           
>>>   
>>>       
>>     
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/jmx-dev/attachments/20130109/77bcd5e4/attachment-0001.html 

From jaroslav.bachorik at oracle.com  Wed Jan  9 06:00:33 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 09 Jan 2013 15:00:33 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50ED7436.1020205@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
Message-ID: <50ED7801.8080704@oracle.com>

On 01/09/2013 02:44 PM, shanliang wrote:
> Let's forget the JMX implementation at first. If an MBean is
> unregistered, a user at client side calls "removeNotificationListener"
> on the MBean, what should happen? if the user calls "isRegistered" on
> the MBean, what should happen?
> 
> I have done 2 tests, I used only one thread:
> 
> 1)
> ......
> localServer.unregisterMBean(myMBean);
> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
> 
> I got isRegistered = false;
> 
> 2)
> ......
> localServer.unregisterMBean(myMBean);
> System.out.println("isRegistered =
> "+remoteClientServer.sRegistered(myMBean));
> remoteClientServer.removeNotificationListener(myMBean, listener);
> 
> I did not get an exception.
> 
> The 1) told that the client could know the MBean was unregistered, then
> the client should throw an exception for the call of
> "removeNotificationListener" in 2).

Yes, but then it would not test the listener leakage as it was supposed
to test but rather the fact that the client throws the appropriate
exception. The fact that the mbean was unregistered does not necessarily
mean that the listeners were released. The main problem remains - the
listeners are being cleaned-up asynchronously and the clean-up process
might race against the other uses of the JMX API.

> 
> The test "DeadListenerTest" got passed in some machines because of the
> timeout for waiting a notification. I think its failure just tells a new
> bug.
> 
> To set a longer timeout just hides the real bug, and the test might fail
> again one day if running condition is changed and you might need longer
> timeout again.

Yes, I agree with you that extending the timeout just lessens the
likelihood of the race condition and does not prevent it.

> 
> Shanliang
> 
> Jaroslav Bachorik wrote:
>> On 01/09/2013 11:08 AM, shanliang wrote:
>>  
>>> Jaroslav Bachorik wrote:
>>>    
>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>  
>>>>      
>>>>> I still have no idea why the test failed, but I do not see why a
>>>>> longer
>>>>> timeout can fix the test. Have you reproduced the problem and tested
>>>>> your fix? if yes then possible the long timeout hided a real problem.
>>>>>             
>>>> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp
>>>> switch) and verify that the fix makes the test pass.
>>>>
>>>> The ClientNotifForwarder scans the notifications for
>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>> appropriate notification listeners in a separate thread. Thus, calling
>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>         
>>> It is true that ClientNotifForwarder scans the notifications for
>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>> appropriate notification listeners in a separate thread. This is for a
>>> client connection to do clean if a user never calls
>>> removeNotificationListener.
>>>
>>> But calling directly removeNotificationListener from a client should
>>> still get exception if the clean has not been done. As I said, if the
>>> client checked and found the listener was still there, then the client
>>> sent a request to its server to remove the listener at server side, the
>>> server should find that the MBean in question was not registered, so the
>>> server should throw an exception. The bug might be here.
>>>     
>>
>> This won't work. The server side listeners are removed upon receiving
>> the "unregistered" notification which is delivered from the
>> ClientNotificationForwarder and it may have not run yet (since it runs
>> in a separate executor thread). The result is that the attempt to remove
>> the notification listener on the server will succeed as well failing the
>> test subsequently.
>>
>> -JB-
>>
>>  
>>> Shanliang
>>>    
>>>>  
>>>>      
>>>>> The timeout you made longer was used to wait a notification which
>>>>> should
>>>>> never arrive.
>>>>>             
>>>> Well, it can be used to allow more time to process the "unregister"
>>>> notification, too.
>>>>
>>>> When I think more of this I am more inclined to fix the race condition.
>>>> An updated webrev will follow.
>>>>
>>>>  
>>>>      
>>>>> To remove a listener from a client side, we did:
>>>>> 1) at client side, check whether it was added in the client side
>>>>> 2) at server side, check whether the MBean in question was
>>>>> registered in
>>>>> the MBeanServer (!!!)
>>>>> 3) at server side, check whether the listener was added.
>>>>>
>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>> Anyway,
>>>>> if you use a SAME thread to call "unregister" operation to
>>>>> unregister an
>>>>> mbean, then any following call (without any time break) to use the
>>>>> mbean
>>>>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>>>>
>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>> MBeam,
>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>> "InstanceNotFoundException" at server side, I think we should create a
>>>>> bug, because both implemented the same interface
>>>>> MBeanServerConnection.
>>>>>             
>>>> Yes, it is rather inconsistent.
>>>>
>>>> -JB-
>>>>
>>>>  
>>>>      
>>>>> Shanliang
>>>>>
>>>>> Jaroslav Bachorik wrote:
>>>>>           
>>>>>> Looking for review and a sponsor.
>>>>>>
>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>
>>>>>> In this issue the timing is the problem.
>>>>>> MBeanServer.unregisterMBean()
>>>>>> fires the "unregister" notification which is sent to the server
>>>>>> asynchronously. Thus it may happen that the "unregister" notification
>>>>>> has not been yet processed at the time of invoking
>>>>>> removeNotificationListener() and the notification listeners hasn't
>>>>>> been
>>>>>> cleaned up leading to the test failure.
>>>>>>
>>>>>> There is no synchronization between the client and the server and
>>>>>> such
>>>>>> race condition can occur occasionally. Normally, the execution is
>>>>>> fast
>>>>>> enough to behave like the "unregister" notification is processed
>>>>>> synchronously with the unregisterMBean() operation but it seems that
>>>>>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>>>>>> enough to make this problem appear.
>>>>>>
>>>>>> There is no proper fix for this - the only thing that work is
>>>>>> waiting a
>>>>>> bit longer in the main thread to give the notification processing
>>>>>> thread
>>>>>> some time to clean up the listeners.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> -JB-
>>>>>>                   
>>>>         
>>>     
>>
>>   
> 
> 


From shanliang.jiang at oracle.com  Wed Jan  9 06:25:51 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 09 Jan 2013 15:25:51 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50ED7801.8080704@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
	<50ED7801.8080704@oracle.com>
Message-ID: <50ED7DEF.9020108@oracle.com>

Jaroslav Bachorik wrote:
> On 01/09/2013 02:44 PM, shanliang wrote:
>   
>> Let's forget the JMX implementation at first. If an MBean is
>> unregistered, a user at client side calls "removeNotificationListener"
>> on the MBean, what should happen? if the user calls "isRegistered" on
>> the MBean, what should happen?
>>
>> I have done 2 tests, I used only one thread:
>>
>> 1)
>> ......
>> localServer.unregisterMBean(myMBean);
>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>
>> I got isRegistered = false;
>>
>> 2)
>> ......
>> localServer.unregisterMBean(myMBean);
>> System.out.println("isRegistered =
>> "+remoteClientServer.sRegistered(myMBean));
>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>
>> I did not get an exception.
>>
>> The 1) told that the client could know the MBean was unregistered, then
>> the client should throw an exception for the call of
>> "removeNotificationListener" in 2).
>>     
>
> Yes, but then it would not test the listener leakage as it was supposed
> to test but rather the fact that the client throws the appropriate
> exception. The fact that the mbean was unregistered does not necessarily
> mean that the listeners were released. The main problem remains - the
> listeners are being cleaned-up asynchronously and the clean-up process
> might race against the other uses of the JMX API.
>   
client.removeNotificationListener is not a right way here to test 
listener leak, we could use some other ways, for example we keep the 
listener in a weak reference, then after the mbean is removed, the weak 
reference should be empty after some time. Another way is like 
DeadListenerTest does to check whether clean has done at server side: 
use reflection to get the "listenerMap" at server side and make sure it 
is empty, but this need to add a private method to the class 
ClientNotifForwarder.

I think we have 3 things to do here:
1) modify the test to not use removeNotificationListener for testing 
listener leak
2) create a new bug about a client does not throw an exception after an 
mbean is unregistered
3) create a bug about a client does not throw a same exception as at 
server side.

I will do 2) and 3), if you like you can continue 1), it might need to 
do fix also in the JMX implementation.
 
Shanliang
>   
>> The test "DeadListenerTest" got passed in some machines because of the
>> timeout for waiting a notification. I think its failure just tells a new
>> bug.
>>
>> To set a longer timeout just hides the real bug, and the test might fail
>> again one day if running condition is changed and you might need longer
>> timeout again.
>>     
>
> Yes, I agree with you that extending the timeout just lessens the
> likelihood of the race condition and does not prevent it.
>
>   
>> Shanliang
>>
>> Jaroslav Bachorik wrote:
>>     
>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>  
>>>       
>>>> Jaroslav Bachorik wrote:
>>>>    
>>>>         
>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>  
>>>>>      
>>>>>           
>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>> longer
>>>>>> timeout can fix the test. Have you reproduced the problem and tested
>>>>>> your fix? if yes then possible the long timeout hided a real problem.
>>>>>>             
>>>>>>             
>>>>> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp
>>>>> switch) and verify that the fix makes the test pass.
>>>>>
>>>>> The ClientNotifForwarder scans the notifications for
>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>> appropriate notification listeners in a separate thread. Thus, calling
>>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>>         
>>>>>           
>>>> It is true that ClientNotifForwarder scans the notifications for
>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>> appropriate notification listeners in a separate thread. This is for a
>>>> client connection to do clean if a user never calls
>>>> removeNotificationListener.
>>>>
>>>> But calling directly removeNotificationListener from a client should
>>>> still get exception if the clean has not been done. As I said, if the
>>>> client checked and found the listener was still there, then the client
>>>> sent a request to its server to remove the listener at server side, the
>>>> server should find that the MBean in question was not registered, so the
>>>> server should throw an exception. The bug might be here.
>>>>     
>>>>         
>>> This won't work. The server side listeners are removed upon receiving
>>> the "unregistered" notification which is delivered from the
>>> ClientNotificationForwarder and it may have not run yet (since it runs
>>> in a separate executor thread). The result is that the attempt to remove
>>> the notification listener on the server will succeed as well failing the
>>> test subsequently.
>>>
>>> -JB-
>>>
>>>  
>>>       
>>>> Shanliang
>>>>    
>>>>         
>>>>>  
>>>>>      
>>>>>           
>>>>>> The timeout you made longer was used to wait a notification which
>>>>>> should
>>>>>> never arrive.
>>>>>>             
>>>>>>             
>>>>> Well, it can be used to allow more time to process the "unregister"
>>>>> notification, too.
>>>>>
>>>>> When I think more of this I am more inclined to fix the race condition.
>>>>> An updated webrev will follow.
>>>>>
>>>>>  
>>>>>      
>>>>>           
>>>>>> To remove a listener from a client side, we did:
>>>>>> 1) at client side, check whether it was added in the client side
>>>>>> 2) at server side, check whether the MBean in question was
>>>>>> registered in
>>>>>> the MBeanServer (!!!)
>>>>>> 3) at server side, check whether the listener was added.
>>>>>>
>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>> Anyway,
>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>> unregister an
>>>>>> mbean, then any following call (without any time break) to use the
>>>>>> mbean
>>>>>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>>>>>
>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>> MBeam,
>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>> "InstanceNotFoundException" at server side, I think we should create a
>>>>>> bug, because both implemented the same interface
>>>>>> MBeanServerConnection.
>>>>>>             
>>>>>>             
>>>>> Yes, it is rather inconsistent.
>>>>>
>>>>> -JB-
>>>>>
>>>>>  
>>>>>      
>>>>>           
>>>>>> Shanliang
>>>>>>
>>>>>> Jaroslav Bachorik wrote:
>>>>>>           
>>>>>>             
>>>>>>> Looking for review and a sponsor.
>>>>>>>
>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>
>>>>>>> In this issue the timing is the problem.
>>>>>>> MBeanServer.unregisterMBean()
>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>> asynchronously. Thus it may happen that the "unregister" notification
>>>>>>> has not been yet processed at the time of invoking
>>>>>>> removeNotificationListener() and the notification listeners hasn't
>>>>>>> been
>>>>>>> cleaned up leading to the test failure.
>>>>>>>
>>>>>>> There is no synchronization between the client and the server and
>>>>>>> such
>>>>>>> race condition can occur occasionally. Normally, the execution is
>>>>>>> fast
>>>>>>> enough to behave like the "unregister" notification is processed
>>>>>>> synchronously with the unregisterMBean() operation but it seems that
>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>>>>>>> enough to make this problem appear.
>>>>>>>
>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>> waiting a
>>>>>>> bit longer in the main thread to give the notification processing
>>>>>>> thread
>>>>>>> some time to clean up the listeners.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> -JB-
>>>>>>>                   
>>>>>>>               
>>>>>         
>>>>>           
>>>>     
>>>>         
>>>   
>>>       
>>     
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/jmx-dev/attachments/20130109/7e0c4d75/attachment.html 

From shanliang.jiang at oracle.com  Wed Jan  9 06:32:12 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 09 Jan 2013 15:32:12 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50ED7DEF.9020108@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
	<50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com>
Message-ID: <50ED7F6C.4010301@oracle.com>

shanliang wrote:
> I think we have 3 things to do here:
> 1) modify the test to not use removeNotificationListener for testing 
> listener leak
> 2) create a new bug about a client does not throw an exception after 
> an mbean is unregistered
> 3) create a bug about a client does not throw a same exception as at 
> server side.
>
> I will do 2) and 3), if you like you can continue 1), it might need to 
> do fix also in the JMX implementation.
Oh, 1) does not need to do fix in JMX implementation, just fix the test.

From stuart.marks at oracle.com  Thu Jan 10 00:52:10 2013
From: stuart.marks at oracle.com (Stuart Marks)
Date: Thu, 10 Jan 2013 00:52:10 -0800
Subject: jmx-dev [PATCH] JDK-8005472:
 com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh
 failed on windows
In-Reply-To: <50EAB014.30805@oracle.com>
References: <50E16BA8.40203@oracle.com>
	<682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com>
	<50EAB014.30805@oracle.com>
Message-ID: <50EE813A.1020501@oracle.com>

On 1/7/13 3:23 AM, Jaroslav Bachorik wrote:
> On 01/04/2013 11:37 PM, Kelly O'Hair wrote:
>> I suspect it is not hanging because it does not exist, but that some other windows process has it's hands on it.
>> This is the stdout file from the server being started up right?
>> Could the server from a previous test run be still running?
>
> Exactly. Amy confirmed this and provided a patch which resolves the
> hanging problem.
>
> The update patch is at
> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.01

Hi Jaroslav,

The change to remove the parentheses from around the server program looks 
right. It avoids forking an extra process (at least in some shells) and lets $! 
refer to the actual JVM, not an intermediate shell process. The rm -f from 
Kelly's suggestion is good too.

But there are other things wrong with the script. I don't think they could 
cause hanging, but they could cause the script to fail in unforeseen ways, or 
even to report success incorrectly.

One problem is introduced by the change, where the Server's stderr is also 
redirected into $URL_PATH along with stdout. This means that if the Server 
program reports any errors, they'll get mixed into the URL_PATH file instead of 
appearing in the test log. The URL_PATH file's contents is never reported, so 
these error messages will be invisible.

The exit status of some of the critical commands (such as the compilations) 
isn't checked, so if javac fails for some reason, the test might not report 
failure. Instead, some weird error might or might not be reported later (though 
one will still see the javac errors in the log).

I don't think the sleep at line 80 is necessary, since the client runs 
synchronously and should have exited by this point.

The wait loop checking for the existence of the URL_PATH file doesn't actually 
guarantee that the server is running or has initialized yet. The file is 
actually created by the shell before the Server JVM starts up. Thus, runClient 
might try to read from it before the server has written anything to it. Or, as 
mentioned above, the server might have written some error messages into the 
URL_PATH file instead of the expected contents. Thus, the contents of the 
JMXURL variable can quite possibly be incorrect.

If this occurs, what will happen when the client runs? It may emit some error 
message, and this will be filtered out by the grep pipeline. Thus, HAS_ERRORS 
might end up empty, and the test will report passing, even though everything 
has failed!

For this changeset I'd recommend at a minimum removing the redirection of 
stderr to URL_PATH. If the server fails we'll at least see errors in the test log.

For checking the notification message, is there a way to modify the client to 
report an exit status or throw an exception? Throwing an exception from main() 
will exit the JVM with a nonzero status, so this can be checked more easily 
from the script. I think this is less error-prone than grepping the output for 
a specific error message. The test should fail if there is *any* error; it 
should not succeed if an expected error is absent.

You might consider having jtreg build the client and server classes. This might 
simplify some of the setup. Also, jtreg is meticulous about aborting the test 
if any compilations fail, so it takes care of that for you.

It would be nice if there were a better way to have the client rendezvous with 
the server. I hate to suggest it, but sleeping unconditionally after starting 
the server is probably necessary. Anything more robust probably requires 
rearchitecting the test, though.

Sorry to dump all this on you. But one of the shell-based RMI tests suffers 
from *exactly* the same pathologies. (I have yet to fix it.) Unfortunately, I 
believe that there are a lot of other shell-based tests in the test suite that 
have similar problems. The lesson here is that writing reliable shell tests is 
a lot harder than it seems.

Thanks,

s'marks

From jaroslav.bachorik at oracle.com  Thu Jan 10 00:56:42 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Jan 2013 09:56:42 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50ED7DEF.9020108@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
	<50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com>
Message-ID: <50EE824A.8020106@oracle.com>

On 01/09/2013 03:25 PM, shanliang wrote:
> Jaroslav Bachorik wrote:
>> On 01/09/2013 02:44 PM, shanliang wrote:
>>  
>>> Let's forget the JMX implementation at first. If an MBean is
>>> unregistered, a user at client side calls "removeNotificationListener"
>>> on the MBean, what should happen? if the user calls "isRegistered" on
>>> the MBean, what should happen?
>>>
>>> I have done 2 tests, I used only one thread:
>>>
>>> 1)
>>> ......
>>> localServer.unregisterMBean(myMBean);
>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>
>>> I got isRegistered = false;
>>>
>>> 2)
>>> ......
>>> localServer.unregisterMBean(myMBean);
>>> System.out.println("isRegistered =
>>> "+remoteClientServer.sRegistered(myMBean));
>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>
>>> I did not get an exception.
>>>
>>> The 1) told that the client could know the MBean was unregistered, then
>>> the client should throw an exception for the call of
>>> "removeNotificationListener" in 2).
>>>     
>>
>> Yes, but then it would not test the listener leakage as it was supposed
>> to test but rather the fact that the client throws the appropriate
>> exception. The fact that the mbean was unregistered does not necessarily
>> mean that the listeners were released. The main problem remains - the
>> listeners are being cleaned-up asynchronously and the clean-up process
>> might race against the other uses of the JMX API.
>>   
> client.removeNotificationListener is not a right way here to test
> listener leak, we could use some other ways, for example we keep the
> listener in a weak reference, then after the mbean is removed, the weak
> reference should be empty after some time. Another way is like
> DeadListenerTest does to check whether clean has done at server side:
> use reflection to get the "listenerMap" at server side and make sure it
> is empty, but this need to add a private method to the class
> ClientNotifForwarder.

There will still be problems with timing. You need either to wait for
the GC to kick in to clean up the weak ref. And the listenerMap will not
be purged of the unregistered MBean listeners until the notification is
generated, processed on the ClientNotificationForwarder and forwarded to
the server. So there goes the timing issue again.

The problem is that the "unregisterMBean" operation does not guarantee
that the listeners have been unregistered at the time it returns. So,
one way or the other we will need to wait an arbitrary amount of time
before checking for the memory leak.

-JB-

> 
> I think we have 3 things to do here:
> 1) modify the test to not use removeNotificationListener for testing
> listener leak
> 2) create a new bug about a client does not throw an exception after an
> mbean is unregistered
> 3) create a bug about a client does not throw a same exception as at
> server side.
> 
> I will do 2) and 3), if you like you can continue 1), it might need to
> do fix also in the JMX implementation.
> 
> Shanliang
>>  
>>> The test "DeadListenerTest" got passed in some machines because of the
>>> timeout for waiting a notification. I think its failure just tells a new
>>> bug.
>>>
>>> To set a longer timeout just hides the real bug, and the test might fail
>>> again one day if running condition is changed and you might need longer
>>> timeout again.
>>>     
>>
>> Yes, I agree with you that extending the timeout just lessens the
>> likelihood of the race condition and does not prevent it.
>>
>>  
>>> Shanliang
>>>
>>> Jaroslav Bachorik wrote:
>>>    
>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>  
>>>>      
>>>>> Jaroslav Bachorik wrote:
>>>>>           
>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>  
>>>>>>               
>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>> longer
>>>>>>> timeout can fix the test. Have you reproduced the problem and tested
>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>> problem.
>>>>>>>                         
>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp
>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>
>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>> calling
>>>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>>>                   
>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>> appropriate notification listeners in a separate thread. This is for a
>>>>> client connection to do clean if a user never calls
>>>>> removeNotificationListener.
>>>>>
>>>>> But calling directly removeNotificationListener from a client should
>>>>> still get exception if the clean has not been done. As I said, if the
>>>>> client checked and found the listener was still there, then the client
>>>>> sent a request to its server to remove the listener at server side,
>>>>> the
>>>>> server should find that the MBean in question was not registered,
>>>>> so the
>>>>> server should throw an exception. The bug might be here.
>>>>>             
>>>> This won't work. The server side listeners are removed upon receiving
>>>> the "unregistered" notification which is delivered from the
>>>> ClientNotificationForwarder and it may have not run yet (since it runs
>>>> in a separate executor thread). The result is that the attempt to
>>>> remove
>>>> the notification listener on the server will succeed as well failing
>>>> the
>>>> test subsequently.
>>>>
>>>> -JB-
>>>>
>>>>  
>>>>      
>>>>> Shanliang
>>>>>           
>>>>>>  
>>>>>>               
>>>>>>> The timeout you made longer was used to wait a notification which
>>>>>>> should
>>>>>>> never arrive.
>>>>>>>                         
>>>>>> Well, it can be used to allow more time to process the "unregister"
>>>>>> notification, too.
>>>>>>
>>>>>> When I think more of this I am more inclined to fix the race
>>>>>> condition.
>>>>>> An updated webrev will follow.
>>>>>>
>>>>>>  
>>>>>>               
>>>>>>> To remove a listener from a client side, we did:
>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>> registered in
>>>>>>> the MBeanServer (!!!)
>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>
>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>> Anyway,
>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>> unregister an
>>>>>>> mbean, then any following call (without any time break) to use the
>>>>>>> mbean
>>>>>>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>>>>>>
>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>> MBeam,
>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>> create a
>>>>>>> bug, because both implemented the same interface
>>>>>>> MBeanServerConnection.
>>>>>>>                         
>>>>>> Yes, it is rather inconsistent.
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>  
>>>>>>               
>>>>>>> Shanliang
>>>>>>>
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>                      
>>>>>>>> Looking for review and a sponsor.
>>>>>>>>
>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>
>>>>>>>> In this issue the timing is the problem.
>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>> notification
>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>> removeNotificationListener() and the notification listeners hasn't
>>>>>>>> been
>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>
>>>>>>>> There is no synchronization between the client and the server and
>>>>>>>> such
>>>>>>>> race condition can occur occasionally. Normally, the execution is
>>>>>>>> fast
>>>>>>>> enough to behave like the "unregister" notification is processed
>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>> that
>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>> the CPU
>>>>>>>> enough to make this problem appear.
>>>>>>>>
>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>> waiting a
>>>>>>>> bit longer in the main thread to give the notification processing
>>>>>>>> thread
>>>>>>>> some time to clean up the listeners.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>                                 
>>>>>>                   
>>>>>             
>>>>         
>>>     
>>
>>   
> 
> 


From shanliang.jiang at oracle.com  Thu Jan 10 01:05:11 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Thu, 10 Jan 2013 10:05:11 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50EE824A.8020106@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
	<50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com>
	<50EE824A.8020106@oracle.com>
Message-ID: <50EE8447.50901@oracle.com>

Jaroslav Bachorik wrote:
> On 01/09/2013 03:25 PM, shanliang wrote:
>   
>> Jaroslav Bachorik wrote:
>>     
>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>  
>>>       
>>>> Let's forget the JMX implementation at first. If an MBean is
>>>> unregistered, a user at client side calls "removeNotificationListener"
>>>> on the MBean, what should happen? if the user calls "isRegistered" on
>>>> the MBean, what should happen?
>>>>
>>>> I have done 2 tests, I used only one thread:
>>>>
>>>> 1)
>>>> ......
>>>> localServer.unregisterMBean(myMBean);
>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>
>>>> I got isRegistered = false;
>>>>
>>>> 2)
>>>> ......
>>>> localServer.unregisterMBean(myMBean);
>>>> System.out.println("isRegistered =
>>>> "+remoteClientServer.sRegistered(myMBean));
>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>
>>>> I did not get an exception.
>>>>
>>>> The 1) told that the client could know the MBean was unregistered, then
>>>> the client should throw an exception for the call of
>>>> "removeNotificationListener" in 2).
>>>>     
>>>>         
>>> Yes, but then it would not test the listener leakage as it was supposed
>>> to test but rather the fact that the client throws the appropriate
>>> exception. The fact that the mbean was unregistered does not necessarily
>>> mean that the listeners were released. The main problem remains - the
>>> listeners are being cleaned-up asynchronously and the clean-up process
>>> might race against the other uses of the JMX API.
>>>   
>>>       
>> client.removeNotificationListener is not a right way here to test
>> listener leak, we could use some other ways, for example we keep the
>> listener in a weak reference, then after the mbean is removed, the weak
>> reference should be empty after some time. Another way is like
>> DeadListenerTest does to check whether clean has done at server side:
>> use reflection to get the "listenerMap" at server side and make sure it
>> is empty, but this need to add a private method to the class
>> ClientNotifForwarder.
>>     
>
> There will still be problems with timing. You need either to wait for
> the GC to kick in to clean up the weak ref. And the listenerMap will not
> be purged of the unregistered MBean listeners until the notification is
> generated, processed on the ClientNotificationForwarder and forwarded to
> the server. So there goes the timing issue again.
>
> The problem is that the "unregisterMBean" operation does not guarantee
> that the listeners have been unregistered at the time it returns. So,
> one way or the other we will need to wait an arbitrary amount of time
> before checking for the memory leak.
>   
Yes we need to wait, but you can use a cycle like:
        long maxWaitingTime = 3000;
        long startTime = System.currentTimeMillis();
        while ( weakReference.get != null
                && System.currentTimeMillis() < startTime + 
maxWaitingTime) {
            System.gc();
            Thread.sleep(100);
            System.gc();
        }

       if (weakReference.get != null) {
          // failed
       }

Shanliang
> -JB-
>
>   
>> I think we have 3 things to do here:
>> 1) modify the test to not use removeNotificationListener for testing
>> listener leak
>> 2) create a new bug about a client does not throw an exception after an
>> mbean is unregistered
>> 3) create a bug about a client does not throw a same exception as at
>> server side.
>>
>> I will do 2) and 3), if you like you can continue 1), it might need to
>> do fix also in the JMX implementation.
>>
>> Shanliang
>>     
>>>  
>>>       
>>>> The test "DeadListenerTest" got passed in some machines because of the
>>>> timeout for waiting a notification. I think its failure just tells a new
>>>> bug.
>>>>
>>>> To set a longer timeout just hides the real bug, and the test might fail
>>>> again one day if running condition is changed and you might need longer
>>>> timeout again.
>>>>     
>>>>         
>>> Yes, I agree with you that extending the timeout just lessens the
>>> likelihood of the race condition and does not prevent it.
>>>
>>>  
>>>       
>>>> Shanliang
>>>>
>>>> Jaroslav Bachorik wrote:
>>>>    
>>>>         
>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>  
>>>>>      
>>>>>           
>>>>>> Jaroslav Bachorik wrote:
>>>>>>           
>>>>>>             
>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>> longer
>>>>>>>> timeout can fix the test. Have you reproduced the problem and tested
>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>> problem.
>>>>>>>>                         
>>>>>>>>                 
>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp
>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>
>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>> calling
>>>>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>>>>                   
>>>>>>>               
>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>> appropriate notification listeners in a separate thread. This is for a
>>>>>> client connection to do clean if a user never calls
>>>>>> removeNotificationListener.
>>>>>>
>>>>>> But calling directly removeNotificationListener from a client should
>>>>>> still get exception if the clean has not been done. As I said, if the
>>>>>> client checked and found the listener was still there, then the client
>>>>>> sent a request to its server to remove the listener at server side,
>>>>>> the
>>>>>> server should find that the MBean in question was not registered,
>>>>>> so the
>>>>>> server should throw an exception. The bug might be here.
>>>>>>             
>>>>>>             
>>>>> This won't work. The server side listeners are removed upon receiving
>>>>> the "unregistered" notification which is delivered from the
>>>>> ClientNotificationForwarder and it may have not run yet (since it runs
>>>>> in a separate executor thread). The result is that the attempt to
>>>>> remove
>>>>> the notification listener on the server will succeed as well failing
>>>>> the
>>>>> test subsequently.
>>>>>
>>>>> -JB-
>>>>>
>>>>>  
>>>>>      
>>>>>           
>>>>>> Shanliang
>>>>>>           
>>>>>>             
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> The timeout you made longer was used to wait a notification which
>>>>>>>> should
>>>>>>>> never arrive.
>>>>>>>>                         
>>>>>>>>                 
>>>>>>> Well, it can be used to allow more time to process the "unregister"
>>>>>>> notification, too.
>>>>>>>
>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>> condition.
>>>>>>> An updated webrev will follow.
>>>>>>>
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>> registered in
>>>>>>>> the MBeanServer (!!!)
>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>
>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>> Anyway,
>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>> unregister an
>>>>>>>> mbean, then any following call (without any time break) to use the
>>>>>>>> mbean
>>>>>>>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>>>>>>>
>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>> MBeam,
>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>> create a
>>>>>>>> bug, because both implemented the same interface
>>>>>>>> MBeanServerConnection.
>>>>>>>>                         
>>>>>>>>                 
>>>>>>> Yes, it is rather inconsistent.
>>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> Shanliang
>>>>>>>>
>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>                      
>>>>>>>>                 
>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>
>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>
>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>> notification
>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>> removeNotificationListener() and the notification listeners hasn't
>>>>>>>>> been
>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>
>>>>>>>>> There is no synchronization between the client and the server and
>>>>>>>>> such
>>>>>>>>> race condition can occur occasionally. Normally, the execution is
>>>>>>>>> fast
>>>>>>>>> enough to behave like the "unregister" notification is processed
>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>> that
>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>> the CPU
>>>>>>>>> enough to make this problem appear.
>>>>>>>>>
>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>> waiting a
>>>>>>>>> bit longer in the main thread to give the notification processing
>>>>>>>>> thread
>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>                                 
>>>>>>>>>                   
>>>>>>>                   
>>>>>>>               
>>>>>>             
>>>>>>             
>>>>>         
>>>>>           
>>>>     
>>>>         
>>>   
>>>       
>>     
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/jmx-dev/attachments/20130110/c9203ef0/attachment-0001.html 

From stuart.marks at oracle.com  Thu Jan 10 01:15:42 2013
From: stuart.marks at oracle.com (Stuart Marks)
Date: Thu, 10 Jan 2013 01:15:42 -0800
Subject: jmx-dev [PATCH] JDK-8005472:
 com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh
 failed on windows
In-Reply-To: <50EE813A.1020501@oracle.com>
References: <50E16BA8.40203@oracle.com>
	<682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com>
	<50EAB014.30805@oracle.com> <50EE813A.1020501@oracle.com>
Message-ID: <50EE86BE.5090101@oracle.com>

On 1/10/13 12:52 AM, Stuart Marks wrote:
> The exit status of some of the critical commands (such as the compilations)
> isn't checked, so if javac fails for some reason, the test might not report
> failure. Instead, some weird error might or might not be reported later (though
> one will still see the javac errors in the log).

Adding

	set -e

near the top of the script will enable a feature where the script will exit if 
any command gives a nonzero exit status. This avoids having to do a lot of 
tedious error checking of commands that just "do stuff" (like mkdir, rm, javac) 
but beware, some commands give a non-zero exit status somewhat unexpectedly, 
like grep.

s'marks


From jaroslav.bachorik at oracle.com  Thu Jan 10 01:34:36 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Jan 2013 10:34:36 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50EE8447.50901@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
	<50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com>
	<50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com>
Message-ID: <50EE8B2C.9030900@oracle.com>

On 01/10/2013 10:05 AM, shanliang wrote:
> Jaroslav Bachorik wrote:
>> On 01/09/2013 03:25 PM, shanliang wrote:
>>  
>>> Jaroslav Bachorik wrote:
>>>    
>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>  
>>>>      
>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>> unregistered, a user at client side calls "removeNotificationListener"
>>>>> on the MBean, what should happen? if the user calls "isRegistered" on
>>>>> the MBean, what should happen?
>>>>>
>>>>> I have done 2 tests, I used only one thread:
>>>>>
>>>>> 1)
>>>>> ......
>>>>> localServer.unregisterMBean(myMBean);
>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>
>>>>> I got isRegistered = false;
>>>>>
>>>>> 2)
>>>>> ......
>>>>> localServer.unregisterMBean(myMBean);
>>>>> System.out.println("isRegistered =
>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>
>>>>> I did not get an exception.
>>>>>
>>>>> The 1) told that the client could know the MBean was unregistered,
>>>>> then
>>>>> the client should throw an exception for the call of
>>>>> "removeNotificationListener" in 2).
>>>>>             
>>>> Yes, but then it would not test the listener leakage as it was supposed
>>>> to test but rather the fact that the client throws the appropriate
>>>> exception. The fact that the mbean was unregistered does not
>>>> necessarily
>>>> mean that the listeners were released. The main problem remains - the
>>>> listeners are being cleaned-up asynchronously and the clean-up process
>>>> might race against the other uses of the JMX API.
>>>>         
>>> client.removeNotificationListener is not a right way here to test
>>> listener leak, we could use some other ways, for example we keep the
>>> listener in a weak reference, then after the mbean is removed, the weak
>>> reference should be empty after some time. Another way is like
>>> DeadListenerTest does to check whether clean has done at server side:
>>> use reflection to get the "listenerMap" at server side and make sure it
>>> is empty, but this need to add a private method to the class
>>> ClientNotifForwarder.
>>>     
>>
>> There will still be problems with timing. You need either to wait for
>> the GC to kick in to clean up the weak ref. And the listenerMap will not
>> be purged of the unregistered MBean listeners until the notification is
>> generated, processed on the ClientNotificationForwarder and forwarded to
>> the server. So there goes the timing issue again.
>>
>> The problem is that the "unregisterMBean" operation does not guarantee
>> that the listeners have been unregistered at the time it returns. So,
>> one way or the other we will need to wait an arbitrary amount of time
>> before checking for the memory leak.
>>   
> Yes we need to wait, but you can use a cycle like:
>        long maxWaitingTime = 3000;
>        long startTime = System.currentTimeMillis();
>        while ( weakReference.get != null
>                && System.currentTimeMillis() < startTime +
> maxWaitingTime) {
>            System.gc();
>            Thread.sleep(100);
>            System.gc();
>        }
> 
>       if (weakReference.get != null) {
>          // failed
>       }
> 

Sounds reasonable. I'll update the test.

-JB-

> Shanliang
>> -JB-
>>
>>  
>>> I think we have 3 things to do here:
>>> 1) modify the test to not use removeNotificationListener for testing
>>> listener leak
>>> 2) create a new bug about a client does not throw an exception after an
>>> mbean is unregistered
>>> 3) create a bug about a client does not throw a same exception as at
>>> server side.
>>>
>>> I will do 2) and 3), if you like you can continue 1), it might need to
>>> do fix also in the JMX implementation.
>>>
>>> Shanliang
>>>    
>>>>  
>>>>      
>>>>> The test "DeadListenerTest" got passed in some machines because of the
>>>>> timeout for waiting a notification. I think its failure just tells
>>>>> a new
>>>>> bug.
>>>>>
>>>>> To set a longer timeout just hides the real bug, and the test might
>>>>> fail
>>>>> again one day if running condition is changed and you might need
>>>>> longer
>>>>> timeout again.
>>>>>             
>>>> Yes, I agree with you that extending the timeout just lessens the
>>>> likelihood of the race condition and does not prevent it.
>>>>
>>>>  
>>>>      
>>>>> Shanliang
>>>>>
>>>>> Jaroslav Bachorik wrote:
>>>>>           
>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>  
>>>>>>               
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>                      
>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>>> longer
>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>> tested
>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>> problem.
>>>>>>>>>                                         
>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>> -Xcomp
>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>
>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>> calling
>>>>>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>>>>>                                 
>>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>> appropriate notification listeners in a separate thread. This is
>>>>>>> for a
>>>>>>> client connection to do clean if a user never calls
>>>>>>> removeNotificationListener.
>>>>>>>
>>>>>>> But calling directly removeNotificationListener from a client should
>>>>>>> still get exception if the clean has not been done. As I said, if
>>>>>>> the
>>>>>>> client checked and found the listener was still there, then the
>>>>>>> client
>>>>>>> sent a request to its server to remove the listener at server side,
>>>>>>> the
>>>>>>> server should find that the MBean in question was not registered,
>>>>>>> so the
>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>                         
>>>>>> This won't work. The server side listeners are removed upon receiving
>>>>>> the "unregistered" notification which is delivered from the
>>>>>> ClientNotificationForwarder and it may have not run yet (since it
>>>>>> runs
>>>>>> in a separate executor thread). The result is that the attempt to
>>>>>> remove
>>>>>> the notification listener on the server will succeed as well failing
>>>>>> the
>>>>>> test subsequently.
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>  
>>>>>>               
>>>>>>> Shanliang
>>>>>>>                      
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> The timeout you made longer was used to wait a notification which
>>>>>>>>> should
>>>>>>>>> never arrive.
>>>>>>>>>                                         
>>>>>>>> Well, it can be used to allow more time to process the "unregister"
>>>>>>>> notification, too.
>>>>>>>>
>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>> condition.
>>>>>>>> An updated webrev will follow.
>>>>>>>>
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>> registered in
>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>
>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>>> Anyway,
>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>> unregister an
>>>>>>>>> mbean, then any following call (without any time break) to use the
>>>>>>>>> mbean
>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered"
>>>>>>>>> etc.
>>>>>>>>>
>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>>> MBeam,
>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>> create a
>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>> MBeanServerConnection.
>>>>>>>>>                                         
>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> Shanliang
>>>>>>>>>
>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>                                     
>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>
>>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>
>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>> notification
>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>> hasn't
>>>>>>>>>> been
>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>
>>>>>>>>>> There is no synchronization between the client and the server and
>>>>>>>>>> such
>>>>>>>>>> race condition can occur occasionally. Normally, the execution is
>>>>>>>>>> fast
>>>>>>>>>> enough to behave like the "unregister" notification is processed
>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>>> that
>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>>> the CPU
>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>
>>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>>> waiting a
>>>>>>>>>> bit longer in the main thread to give the notification processing
>>>>>>>>>> thread
>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>                                                   
>>>>>>>>                                 
>>>>>>>                         
>>>>>>                   
>>>>>             
>>>>         
>>>     
>>
>>   
> 
> 


From jaroslav.bachorik at oracle.com  Thu Jan 10 03:41:44 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Jan 2013 12:41:44 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50EE8447.50901@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
	<50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com>
	<50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com>
Message-ID: <50EEA8F8.7090007@oracle.com>

On 01/10/2013 10:05 AM, shanliang wrote:
> Jaroslav Bachorik wrote:
>> On 01/09/2013 03:25 PM, shanliang wrote:
>>  
>>> Jaroslav Bachorik wrote:
>>>    
>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>  
>>>>      
>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>> unregistered, a user at client side calls "removeNotificationListener"
>>>>> on the MBean, what should happen? if the user calls "isRegistered" on
>>>>> the MBean, what should happen?
>>>>>
>>>>> I have done 2 tests, I used only one thread:
>>>>>
>>>>> 1)
>>>>> ......
>>>>> localServer.unregisterMBean(myMBean);
>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>
>>>>> I got isRegistered = false;
>>>>>
>>>>> 2)
>>>>> ......
>>>>> localServer.unregisterMBean(myMBean);
>>>>> System.out.println("isRegistered =
>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>
>>>>> I did not get an exception.
>>>>>
>>>>> The 1) told that the client could know the MBean was unregistered,
>>>>> then
>>>>> the client should throw an exception for the call of
>>>>> "removeNotificationListener" in 2).
>>>>>             
>>>> Yes, but then it would not test the listener leakage as it was supposed
>>>> to test but rather the fact that the client throws the appropriate
>>>> exception. The fact that the mbean was unregistered does not
>>>> necessarily
>>>> mean that the listeners were released. The main problem remains - the
>>>> listeners are being cleaned-up asynchronously and the clean-up process
>>>> might race against the other uses of the JMX API.
>>>>         
>>> client.removeNotificationListener is not a right way here to test
>>> listener leak, we could use some other ways, for example we keep the
>>> listener in a weak reference, then after the mbean is removed, the weak
>>> reference should be empty after some time. Another way is like
>>> DeadListenerTest does to check whether clean has done at server side:
>>> use reflection to get the "listenerMap" at server side and make sure it
>>> is empty, but this need to add a private method to the class
>>> ClientNotifForwarder.
>>>     
>>
>> There will still be problems with timing. You need either to wait for
>> the GC to kick in to clean up the weak ref. And the listenerMap will not
>> be purged of the unregistered MBean listeners until the notification is
>> generated, processed on the ClientNotificationForwarder and forwarded to
>> the server. So there goes the timing issue again.
>>
>> The problem is that the "unregisterMBean" operation does not guarantee
>> that the listeners have been unregistered at the time it returns. So,
>> one way or the other we will need to wait an arbitrary amount of time
>> before checking for the memory leak.
>>   
> Yes we need to wait, but you can use a cycle like:
>        long maxWaitingTime = 3000;
>        long startTime = System.currentTimeMillis();
>        while ( weakReference.get != null
>                && System.currentTimeMillis() < startTime +
> maxWaitingTime) {
>            System.gc();
>            Thread.sleep(100);
>            System.gc();
>        }
> 
>       if (weakReference.get != null) {
>          // failed
>       }

Still you need an arbitrary timeout which might be reached under extreme
conditions making this test to fail intermittently. But I'd say that's
the nature of tests for memory leak fixes, due to the unpredictable
nature of the GC runs. Unless you take a heap dump and do a reachability
analysis you can not be sure whether a reference is dangling somwehwere
or it just hasn't been collected yet :/

-JB-

> 
> Shanliang
>> -JB-
>>
>>  
>>> I think we have 3 things to do here:
>>> 1) modify the test to not use removeNotificationListener for testing
>>> listener leak
>>> 2) create a new bug about a client does not throw an exception after an
>>> mbean is unregistered
>>> 3) create a bug about a client does not throw a same exception as at
>>> server side.
>>>
>>> I will do 2) and 3), if you like you can continue 1), it might need to
>>> do fix also in the JMX implementation.
>>>
>>> Shanliang
>>>    
>>>>  
>>>>      
>>>>> The test "DeadListenerTest" got passed in some machines because of the
>>>>> timeout for waiting a notification. I think its failure just tells
>>>>> a new
>>>>> bug.
>>>>>
>>>>> To set a longer timeout just hides the real bug, and the test might
>>>>> fail
>>>>> again one day if running condition is changed and you might need
>>>>> longer
>>>>> timeout again.
>>>>>             
>>>> Yes, I agree with you that extending the timeout just lessens the
>>>> likelihood of the race condition and does not prevent it.
>>>>
>>>>  
>>>>      
>>>>> Shanliang
>>>>>
>>>>> Jaroslav Bachorik wrote:
>>>>>           
>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>  
>>>>>>               
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>                      
>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>>> longer
>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>> tested
>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>> problem.
>>>>>>>>>                                         
>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>> -Xcomp
>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>
>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>> calling
>>>>>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>>>>>                                 
>>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>> appropriate notification listeners in a separate thread. This is
>>>>>>> for a
>>>>>>> client connection to do clean if a user never calls
>>>>>>> removeNotificationListener.
>>>>>>>
>>>>>>> But calling directly removeNotificationListener from a client should
>>>>>>> still get exception if the clean has not been done. As I said, if
>>>>>>> the
>>>>>>> client checked and found the listener was still there, then the
>>>>>>> client
>>>>>>> sent a request to its server to remove the listener at server side,
>>>>>>> the
>>>>>>> server should find that the MBean in question was not registered,
>>>>>>> so the
>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>                         
>>>>>> This won't work. The server side listeners are removed upon receiving
>>>>>> the "unregistered" notification which is delivered from the
>>>>>> ClientNotificationForwarder and it may have not run yet (since it
>>>>>> runs
>>>>>> in a separate executor thread). The result is that the attempt to
>>>>>> remove
>>>>>> the notification listener on the server will succeed as well failing
>>>>>> the
>>>>>> test subsequently.
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>  
>>>>>>               
>>>>>>> Shanliang
>>>>>>>                      
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> The timeout you made longer was used to wait a notification which
>>>>>>>>> should
>>>>>>>>> never arrive.
>>>>>>>>>                                         
>>>>>>>> Well, it can be used to allow more time to process the "unregister"
>>>>>>>> notification, too.
>>>>>>>>
>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>> condition.
>>>>>>>> An updated webrev will follow.
>>>>>>>>
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>> registered in
>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>
>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>>> Anyway,
>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>> unregister an
>>>>>>>>> mbean, then any following call (without any time break) to use the
>>>>>>>>> mbean
>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered"
>>>>>>>>> etc.
>>>>>>>>>
>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>>> MBeam,
>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>> create a
>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>> MBeanServerConnection.
>>>>>>>>>                                         
>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> Shanliang
>>>>>>>>>
>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>                                     
>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>
>>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>
>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>> notification
>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>> hasn't
>>>>>>>>>> been
>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>
>>>>>>>>>> There is no synchronization between the client and the server and
>>>>>>>>>> such
>>>>>>>>>> race condition can occur occasionally. Normally, the execution is
>>>>>>>>>> fast
>>>>>>>>>> enough to behave like the "unregister" notification is processed
>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>>> that
>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>>> the CPU
>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>
>>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>>> waiting a
>>>>>>>>>> bit longer in the main thread to give the notification processing
>>>>>>>>>> thread
>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>                                                   
>>>>>>>>                                 
>>>>>>>                         
>>>>>>                   
>>>>>             
>>>>         
>>>     
>>
>>   
> 
> 


From shanliang.jiang at oracle.com  Thu Jan 10 03:53:10 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Thu, 10 Jan 2013 12:53:10 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50EEA8F8.7090007@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
	<50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com>
	<50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com>
	<50EEA8F8.7090007@oracle.com>
Message-ID: <50EEABA6.6010203@oracle.com>

Instead to wait GC, you can also to wait the 
MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive 
it, then your listener must be removed too. Of course this solution is 
implementation dependent, but the test is already implementation dependent.

Shanliang


Jaroslav Bachorik wrote:
> On 01/10/2013 10:05 AM, shanliang wrote:
>   
>> Jaroslav Bachorik wrote:
>>     
>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>  
>>>       
>>>> Jaroslav Bachorik wrote:
>>>>    
>>>>         
>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>  
>>>>>      
>>>>>           
>>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>>> unregistered, a user at client side calls "removeNotificationListener"
>>>>>> on the MBean, what should happen? if the user calls "isRegistered" on
>>>>>> the MBean, what should happen?
>>>>>>
>>>>>> I have done 2 tests, I used only one thread:
>>>>>>
>>>>>> 1)
>>>>>> ......
>>>>>> localServer.unregisterMBean(myMBean);
>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>>
>>>>>> I got isRegistered = false;
>>>>>>
>>>>>> 2)
>>>>>> ......
>>>>>> localServer.unregisterMBean(myMBean);
>>>>>> System.out.println("isRegistered =
>>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>
>>>>>> I did not get an exception.
>>>>>>
>>>>>> The 1) told that the client could know the MBean was unregistered,
>>>>>> then
>>>>>> the client should throw an exception for the call of
>>>>>> "removeNotificationListener" in 2).
>>>>>>             
>>>>>>             
>>>>> Yes, but then it would not test the listener leakage as it was supposed
>>>>> to test but rather the fact that the client throws the appropriate
>>>>> exception. The fact that the mbean was unregistered does not
>>>>> necessarily
>>>>> mean that the listeners were released. The main problem remains - the
>>>>> listeners are being cleaned-up asynchronously and the clean-up process
>>>>> might race against the other uses of the JMX API.
>>>>>         
>>>>>           
>>>> client.removeNotificationListener is not a right way here to test
>>>> listener leak, we could use some other ways, for example we keep the
>>>> listener in a weak reference, then after the mbean is removed, the weak
>>>> reference should be empty after some time. Another way is like
>>>> DeadListenerTest does to check whether clean has done at server side:
>>>> use reflection to get the "listenerMap" at server side and make sure it
>>>> is empty, but this need to add a private method to the class
>>>> ClientNotifForwarder.
>>>>     
>>>>         
>>> There will still be problems with timing. You need either to wait for
>>> the GC to kick in to clean up the weak ref. And the listenerMap will not
>>> be purged of the unregistered MBean listeners until the notification is
>>> generated, processed on the ClientNotificationForwarder and forwarded to
>>> the server. So there goes the timing issue again.
>>>
>>> The problem is that the "unregisterMBean" operation does not guarantee
>>> that the listeners have been unregistered at the time it returns. So,
>>> one way or the other we will need to wait an arbitrary amount of time
>>> before checking for the memory leak.
>>>   
>>>       
>> Yes we need to wait, but you can use a cycle like:
>>        long maxWaitingTime = 3000;
>>        long startTime = System.currentTimeMillis();
>>        while ( weakReference.get != null
>>                && System.currentTimeMillis() < startTime +
>> maxWaitingTime) {
>>            System.gc();
>>            Thread.sleep(100);
>>            System.gc();
>>        }
>>
>>       if (weakReference.get != null) {
>>          // failed
>>       }
>>     
>
> Still you need an arbitrary timeout which might be reached under extreme
> conditions making this test to fail intermittently. But I'd say that's
> the nature of tests for memory leak fixes, due to the unpredictable
> nature of the GC runs. Unless you take a heap dump and do a reachability
> analysis you can not be sure whether a reference is dangling somwehwere
> or it just hasn't been collected yet :/
>
> -JB-
>
>   
>> Shanliang
>>     
>>> -JB-
>>>
>>>  
>>>       
>>>> I think we have 3 things to do here:
>>>> 1) modify the test to not use removeNotificationListener for testing
>>>> listener leak
>>>> 2) create a new bug about a client does not throw an exception after an
>>>> mbean is unregistered
>>>> 3) create a bug about a client does not throw a same exception as at
>>>> server side.
>>>>
>>>> I will do 2) and 3), if you like you can continue 1), it might need to
>>>> do fix also in the JMX implementation.
>>>>
>>>> Shanliang
>>>>    
>>>>         
>>>>>  
>>>>>      
>>>>>           
>>>>>> The test "DeadListenerTest" got passed in some machines because of the
>>>>>> timeout for waiting a notification. I think its failure just tells
>>>>>> a new
>>>>>> bug.
>>>>>>
>>>>>> To set a longer timeout just hides the real bug, and the test might
>>>>>> fail
>>>>>> again one day if running condition is changed and you might need
>>>>>> longer
>>>>>> timeout again.
>>>>>>             
>>>>>>             
>>>>> Yes, I agree with you that extending the timeout just lessens the
>>>>> likelihood of the race condition and does not prevent it.
>>>>>
>>>>>  
>>>>>      
>>>>>           
>>>>>> Shanliang
>>>>>>
>>>>>> Jaroslav Bachorik wrote:
>>>>>>           
>>>>>>             
>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>                      
>>>>>>>>                 
>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>  
>>>>>>>>>                            
>>>>>>>>>                   
>>>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>>>> longer
>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>>> tested
>>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>>> problem.
>>>>>>>>>>                                         
>>>>>>>>>>                     
>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>>> -Xcomp
>>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>>
>>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>>> calling
>>>>>>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>>>>>>                                 
>>>>>>>>>                   
>>>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>>> appropriate notification listeners in a separate thread. This is
>>>>>>>> for a
>>>>>>>> client connection to do clean if a user never calls
>>>>>>>> removeNotificationListener.
>>>>>>>>
>>>>>>>> But calling directly removeNotificationListener from a client should
>>>>>>>> still get exception if the clean has not been done. As I said, if
>>>>>>>> the
>>>>>>>> client checked and found the listener was still there, then the
>>>>>>>> client
>>>>>>>> sent a request to its server to remove the listener at server side,
>>>>>>>> the
>>>>>>>> server should find that the MBean in question was not registered,
>>>>>>>> so the
>>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>>                         
>>>>>>>>                 
>>>>>>> This won't work. The server side listeners are removed upon receiving
>>>>>>> the "unregistered" notification which is delivered from the
>>>>>>> ClientNotificationForwarder and it may have not run yet (since it
>>>>>>> runs
>>>>>>> in a separate executor thread). The result is that the attempt to
>>>>>>> remove
>>>>>>> the notification listener on the server will succeed as well failing
>>>>>>> the
>>>>>>> test subsequently.
>>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> Shanliang
>>>>>>>>                      
>>>>>>>>                 
>>>>>>>>>  
>>>>>>>>>                            
>>>>>>>>>                   
>>>>>>>>>> The timeout you made longer was used to wait a notification which
>>>>>>>>>> should
>>>>>>>>>> never arrive.
>>>>>>>>>>                                         
>>>>>>>>>>                     
>>>>>>>>> Well, it can be used to allow more time to process the "unregister"
>>>>>>>>> notification, too.
>>>>>>>>>
>>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>>> condition.
>>>>>>>>> An updated webrev will follow.
>>>>>>>>>
>>>>>>>>>  
>>>>>>>>>                            
>>>>>>>>>                   
>>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>>> registered in
>>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>>
>>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>>>> Anyway,
>>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>>> unregister an
>>>>>>>>>> mbean, then any following call (without any time break) to use the
>>>>>>>>>> mbean
>>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered"
>>>>>>>>>> etc.
>>>>>>>>>>
>>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>>>> MBeam,
>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>>> create a
>>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>>> MBeanServerConnection.
>>>>>>>>>>                                         
>>>>>>>>>>                     
>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>>  
>>>>>>>>>                            
>>>>>>>>>                   
>>>>>>>>>> Shanliang
>>>>>>>>>>
>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>                                     
>>>>>>>>>>                     
>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>
>>>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>
>>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>>> notification
>>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>>> hasn't
>>>>>>>>>>> been
>>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>>
>>>>>>>>>>> There is no synchronization between the client and the server and
>>>>>>>>>>> such
>>>>>>>>>>> race condition can occur occasionally. Normally, the execution is
>>>>>>>>>>> fast
>>>>>>>>>>> enough to behave like the "unregister" notification is processed
>>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>>>> that
>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>>>> the CPU
>>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>>
>>>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>>>> waiting a
>>>>>>>>>>> bit longer in the main thread to give the notification processing
>>>>>>>>>>> thread
>>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>                                                   
>>>>>>>>>>>                       
>>>>>>>>>                                 
>>>>>>>>>                   
>>>>>>>>                         
>>>>>>>>                 
>>>>>>>                   
>>>>>>>               
>>>>>>             
>>>>>>             
>>>>>         
>>>>>           
>>>>     
>>>>         
>>>   
>>>       
>>     
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/jmx-dev/attachments/20130110/c2e94c85/attachment-0001.html 

From jaroslav.bachorik at oracle.com  Thu Jan 10 04:09:04 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Jan 2013 13:09:04 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50EEABA6.6010203@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
	<50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com>
	<50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com>
	<50EEA8F8.7090007@oracle.com> <50EEABA6.6010203@oracle.com>
Message-ID: <50EEAF60.7040801@oracle.com>

On 01/10/2013 12:53 PM, shanliang wrote:
> Instead to wait GC, you can also to wait the
> MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive
> it, then your listener must be removed too. Of course this solution is

The problem is that the *NotificationForwarder implementations swallow
this kind of notification and just perform the cleanup. No other
listener will ever receive this notification.

The "unregisterMBean" operation's semantics is not clearly defined.
Intuitively, when unregistering an MBean all the associated listeners
should be gone before the method returns. But this is not the case -
currently the listeners are sanitized some time after the
"unregisterMBean" operation started, eventually. There is no easy way to
notify the API user that the listeners were removed. I am afraid that in
order to resolve these problems new APIs would need to be introduced and
the whole mechanism of delivering notification should be revisited (as
it was planned for JMX 2.0, anyway).

As for fixing the test - checking the weak references works fine as well
as increasing the timeout. They both can fail when the system is
extremely busy but the GC based solution will be in general faster than
the one with increased timeout.

-JB-

> implementation dependent, but the test is already implementation dependent.
> 
> Shanliang
> 
> 
> Jaroslav Bachorik wrote:
>> On 01/10/2013 10:05 AM, shanliang wrote:
>>  
>>> Jaroslav Bachorik wrote:
>>>    
>>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>>  
>>>>      
>>>>> Jaroslav Bachorik wrote:
>>>>>           
>>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>>  
>>>>>>               
>>>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>>>> unregistered, a user at client side calls
>>>>>>> "removeNotificationListener"
>>>>>>> on the MBean, what should happen? if the user calls
>>>>>>> "isRegistered" on
>>>>>>> the MBean, what should happen?
>>>>>>>
>>>>>>> I have done 2 tests, I used only one thread:
>>>>>>>
>>>>>>> 1)
>>>>>>> ......
>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>>>
>>>>>>> I got isRegistered = false;
>>>>>>>
>>>>>>> 2)
>>>>>>> ......
>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>> System.out.println("isRegistered =
>>>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>>
>>>>>>> I did not get an exception.
>>>>>>>
>>>>>>> The 1) told that the client could know the MBean was unregistered,
>>>>>>> then
>>>>>>> the client should throw an exception for the call of
>>>>>>> "removeNotificationListener" in 2).
>>>>>>>                         
>>>>>> Yes, but then it would not test the listener leakage as it was
>>>>>> supposed
>>>>>> to test but rather the fact that the client throws the appropriate
>>>>>> exception. The fact that the mbean was unregistered does not
>>>>>> necessarily
>>>>>> mean that the listeners were released. The main problem remains - the
>>>>>> listeners are being cleaned-up asynchronously and the clean-up
>>>>>> process
>>>>>> might race against the other uses of the JMX API.
>>>>>>                   
>>>>> client.removeNotificationListener is not a right way here to test
>>>>> listener leak, we could use some other ways, for example we keep the
>>>>> listener in a weak reference, then after the mbean is removed, the
>>>>> weak
>>>>> reference should be empty after some time. Another way is like
>>>>> DeadListenerTest does to check whether clean has done at server side:
>>>>> use reflection to get the "listenerMap" at server side and make
>>>>> sure it
>>>>> is empty, but this need to add a private method to the class
>>>>> ClientNotifForwarder.
>>>>>             
>>>> There will still be problems with timing. You need either to wait for
>>>> the GC to kick in to clean up the weak ref. And the listenerMap will
>>>> not
>>>> be purged of the unregistered MBean listeners until the notification is
>>>> generated, processed on the ClientNotificationForwarder and
>>>> forwarded to
>>>> the server. So there goes the timing issue again.
>>>>
>>>> The problem is that the "unregisterMBean" operation does not guarantee
>>>> that the listeners have been unregistered at the time it returns. So,
>>>> one way or the other we will need to wait an arbitrary amount of time
>>>> before checking for the memory leak.
>>>>         
>>> Yes we need to wait, but you can use a cycle like:
>>>        long maxWaitingTime = 3000;
>>>        long startTime = System.currentTimeMillis();
>>>        while ( weakReference.get != null
>>>                && System.currentTimeMillis() < startTime +
>>> maxWaitingTime) {
>>>            System.gc();
>>>            Thread.sleep(100);
>>>            System.gc();
>>>        }
>>>
>>>       if (weakReference.get != null) {
>>>          // failed
>>>       }
>>>     
>>
>> Still you need an arbitrary timeout which might be reached under extreme
>> conditions making this test to fail intermittently. But I'd say that's
>> the nature of tests for memory leak fixes, due to the unpredictable
>> nature of the GC runs. Unless you take a heap dump and do a reachability
>> analysis you can not be sure whether a reference is dangling somwehwere
>> or it just hasn't been collected yet :/
>>
>> -JB-
>>
>>  
>>> Shanliang
>>>    
>>>> -JB-
>>>>
>>>>  
>>>>      
>>>>> I think we have 3 things to do here:
>>>>> 1) modify the test to not use removeNotificationListener for testing
>>>>> listener leak
>>>>> 2) create a new bug about a client does not throw an exception
>>>>> after an
>>>>> mbean is unregistered
>>>>> 3) create a bug about a client does not throw a same exception as at
>>>>> server side.
>>>>>
>>>>> I will do 2) and 3), if you like you can continue 1), it might need to
>>>>> do fix also in the JMX implementation.
>>>>>
>>>>> Shanliang
>>>>>           
>>>>>>  
>>>>>>               
>>>>>>> The test "DeadListenerTest" got passed in some machines because
>>>>>>> of the
>>>>>>> timeout for waiting a notification. I think its failure just tells
>>>>>>> a new
>>>>>>> bug.
>>>>>>>
>>>>>>> To set a longer timeout just hides the real bug, and the test might
>>>>>>> fail
>>>>>>> again one day if running condition is changed and you might need
>>>>>>> longer
>>>>>>> timeout again.
>>>>>>>                         
>>>>>> Yes, I agree with you that extending the timeout just lessens the
>>>>>> likelihood of the race condition and does not prevent it.
>>>>>>
>>>>>>  
>>>>>>               
>>>>>>> Shanliang
>>>>>>>
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>                      
>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>                                     
>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>>  
>>>>>>>>>>                                             
>>>>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>>>>> longer
>>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>>>> tested
>>>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>>>> problem.
>>>>>>>>>>>                                                             
>>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>>>> -Xcomp
>>>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>>>
>>>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and
>>>>>>>>>> removes the
>>>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>>>> calling
>>>>>>>>>> "removeNotificationListener" on the main thread is prone to
>>>>>>>>>> racing.
>>>>>>>>>>                                                   
>>>>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes
>>>>>>>>> the
>>>>>>>>> appropriate notification listeners in a separate thread. This is
>>>>>>>>> for a
>>>>>>>>> client connection to do clean if a user never calls
>>>>>>>>> removeNotificationListener.
>>>>>>>>>
>>>>>>>>> But calling directly removeNotificationListener from a client
>>>>>>>>> should
>>>>>>>>> still get exception if the clean has not been done. As I said, if
>>>>>>>>> the
>>>>>>>>> client checked and found the listener was still there, then the
>>>>>>>>> client
>>>>>>>>> sent a request to its server to remove the listener at server
>>>>>>>>> side,
>>>>>>>>> the
>>>>>>>>> server should find that the MBean in question was not registered,
>>>>>>>>> so the
>>>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>>>                                         
>>>>>>>> This won't work. The server side listeners are removed upon
>>>>>>>> receiving
>>>>>>>> the "unregistered" notification which is delivered from the
>>>>>>>> ClientNotificationForwarder and it may have not run yet (since it
>>>>>>>> runs
>>>>>>>> in a separate executor thread). The result is that the attempt to
>>>>>>>> remove
>>>>>>>> the notification listener on the server will succeed as well
>>>>>>>> failing
>>>>>>>> the
>>>>>>>> test subsequently.
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> Shanliang
>>>>>>>>>                                     
>>>>>>>>>>  
>>>>>>>>>>                                             
>>>>>>>>>>> The timeout you made longer was used to wait a notification
>>>>>>>>>>> which
>>>>>>>>>>> should
>>>>>>>>>>> never arrive.
>>>>>>>>>>>                                                             
>>>>>>>>>> Well, it can be used to allow more time to process the
>>>>>>>>>> "unregister"
>>>>>>>>>> notification, too.
>>>>>>>>>>
>>>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>>>> condition.
>>>>>>>>>> An updated webrev will follow.
>>>>>>>>>>
>>>>>>>>>>  
>>>>>>>>>>                                             
>>>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>>>> registered in
>>>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>>>
>>>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>>>>> Anyway,
>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>>>> unregister an
>>>>>>>>>>> mbean, then any following call (without any time break) to
>>>>>>>>>>> use the
>>>>>>>>>>> mbean
>>>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered"
>>>>>>>>>>> etc.
>>>>>>>>>>>
>>>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>>>>> MBeam,
>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>>>> create a
>>>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>>>> MBeanServerConnection.
>>>>>>>>>>>                                                             
>>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>>>  
>>>>>>>>>>                                             
>>>>>>>>>>> Shanliang
>>>>>>>>>>>
>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>                                                        
>>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>>
>>>>>>>>>>>> Webrev at
>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>>
>>>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>>>> notification
>>>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>>>> hasn't
>>>>>>>>>>>> been
>>>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>>>
>>>>>>>>>>>> There is no synchronization between the client and the
>>>>>>>>>>>> server and
>>>>>>>>>>>> such
>>>>>>>>>>>> race condition can occur occasionally. Normally, the
>>>>>>>>>>>> execution is
>>>>>>>>>>>> fast
>>>>>>>>>>>> enough to behave like the "unregister" notification is
>>>>>>>>>>>> processed
>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>>>>> that
>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>>>>> the CPU
>>>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>>>
>>>>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>>>>> waiting a
>>>>>>>>>>>> bit longer in the main thread to give the notification
>>>>>>>>>>>> processing
>>>>>>>>>>>> thread
>>>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>>>                                                  
>>>>>>>>>>>>                       
>>>>>>>>>>                                                   
>>>>>>>>>                                         
>>>>>>>>                                 
>>>>>>>                         
>>>>>>                   
>>>>>             
>>>>         
>>>     
>>
>>   
> 
> 


From jaroslav.bachorik at oracle.com  Thu Jan 10 04:31:45 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Jan 2013 13:31:45 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50EEAF60.7040801@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
	<50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
	<50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
	<50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com>
	<50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com>
	<50EEA8F8.7090007@oracle.com> <50EEABA6.6010203@oracle.com>
	<50EEAF60.7040801@oracle.com>
Message-ID: <50EEB4B1.8070101@oracle.com>

On 01/10/2013 01:09 PM, Jaroslav Bachorik wrote:
> On 01/10/2013 12:53 PM, shanliang wrote:
>> Instead to wait GC, you can also to wait the
>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive
>> it, then your listener must be removed too. Of course this solution is
> 
> The problem is that the *NotificationForwarder implementations swallow
> this kind of notification and just perform the cleanup. No other
> listener will ever receive this notification.
> 
> The "unregisterMBean" operation's semantics is not clearly defined.
> Intuitively, when unregistering an MBean all the associated listeners
> should be gone before the method returns. But this is not the case -
> currently the listeners are sanitized some time after the
> "unregisterMBean" operation started, eventually. There is no easy way to
> notify the API user that the listeners were removed. I am afraid that in
> order to resolve these problems new APIs would need to be introduced and
> the whole mechanism of delivering notification should be revisited (as
> it was planned for JMX 2.0, anyway).
> 
> As for fixing the test - checking the weak references works fine as well
> as increasing the timeout. They both can fail when the system is
> extremely busy but the GC based solution will be in general faster than
> the one with increased timeout.

Updated webrev: http://cr.openjdk.java.net/~jbachorik/7170447/webrev.01

> 
> -JB-
> 
>> implementation dependent, but the test is already implementation dependent.
>>
>> Shanliang
>>
>>
>> Jaroslav Bachorik wrote:
>>> On 01/10/2013 10:05 AM, shanliang wrote:
>>>  
>>>> Jaroslav Bachorik wrote:
>>>>    
>>>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>>>  
>>>>>      
>>>>>> Jaroslav Bachorik wrote:
>>>>>>           
>>>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>>>  
>>>>>>>               
>>>>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>>>>> unregistered, a user at client side calls
>>>>>>>> "removeNotificationListener"
>>>>>>>> on the MBean, what should happen? if the user calls
>>>>>>>> "isRegistered" on
>>>>>>>> the MBean, what should happen?
>>>>>>>>
>>>>>>>> I have done 2 tests, I used only one thread:
>>>>>>>>
>>>>>>>> 1)
>>>>>>>> ......
>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>>>>
>>>>>>>> I got isRegistered = false;
>>>>>>>>
>>>>>>>> 2)
>>>>>>>> ......
>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>> System.out.println("isRegistered =
>>>>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>>>
>>>>>>>> I did not get an exception.
>>>>>>>>
>>>>>>>> The 1) told that the client could know the MBean was unregistered,
>>>>>>>> then
>>>>>>>> the client should throw an exception for the call of
>>>>>>>> "removeNotificationListener" in 2).
>>>>>>>>                         
>>>>>>> Yes, but then it would not test the listener leakage as it was
>>>>>>> supposed
>>>>>>> to test but rather the fact that the client throws the appropriate
>>>>>>> exception. The fact that the mbean was unregistered does not
>>>>>>> necessarily
>>>>>>> mean that the listeners were released. The main problem remains - the
>>>>>>> listeners are being cleaned-up asynchronously and the clean-up
>>>>>>> process
>>>>>>> might race against the other uses of the JMX API.
>>>>>>>                   
>>>>>> client.removeNotificationListener is not a right way here to test
>>>>>> listener leak, we could use some other ways, for example we keep the
>>>>>> listener in a weak reference, then after the mbean is removed, the
>>>>>> weak
>>>>>> reference should be empty after some time. Another way is like
>>>>>> DeadListenerTest does to check whether clean has done at server side:
>>>>>> use reflection to get the "listenerMap" at server side and make
>>>>>> sure it
>>>>>> is empty, but this need to add a private method to the class
>>>>>> ClientNotifForwarder.
>>>>>>             
>>>>> There will still be problems with timing. You need either to wait for
>>>>> the GC to kick in to clean up the weak ref. And the listenerMap will
>>>>> not
>>>>> be purged of the unregistered MBean listeners until the notification is
>>>>> generated, processed on the ClientNotificationForwarder and
>>>>> forwarded to
>>>>> the server. So there goes the timing issue again.
>>>>>
>>>>> The problem is that the "unregisterMBean" operation does not guarantee
>>>>> that the listeners have been unregistered at the time it returns. So,
>>>>> one way or the other we will need to wait an arbitrary amount of time
>>>>> before checking for the memory leak.
>>>>>         
>>>> Yes we need to wait, but you can use a cycle like:
>>>>        long maxWaitingTime = 3000;
>>>>        long startTime = System.currentTimeMillis();
>>>>        while ( weakReference.get != null
>>>>                && System.currentTimeMillis() < startTime +
>>>> maxWaitingTime) {
>>>>            System.gc();
>>>>            Thread.sleep(100);
>>>>            System.gc();
>>>>        }
>>>>
>>>>       if (weakReference.get != null) {
>>>>          // failed
>>>>       }
>>>>     
>>>
>>> Still you need an arbitrary timeout which might be reached under extreme
>>> conditions making this test to fail intermittently. But I'd say that's
>>> the nature of tests for memory leak fixes, due to the unpredictable
>>> nature of the GC runs. Unless you take a heap dump and do a reachability
>>> analysis you can not be sure whether a reference is dangling somwehwere
>>> or it just hasn't been collected yet :/
>>>
>>> -JB-
>>>
>>>  
>>>> Shanliang
>>>>    
>>>>> -JB-
>>>>>
>>>>>  
>>>>>      
>>>>>> I think we have 3 things to do here:
>>>>>> 1) modify the test to not use removeNotificationListener for testing
>>>>>> listener leak
>>>>>> 2) create a new bug about a client does not throw an exception
>>>>>> after an
>>>>>> mbean is unregistered
>>>>>> 3) create a bug about a client does not throw a same exception as at
>>>>>> server side.
>>>>>>
>>>>>> I will do 2) and 3), if you like you can continue 1), it might need to
>>>>>> do fix also in the JMX implementation.
>>>>>>
>>>>>> Shanliang
>>>>>>           
>>>>>>>  
>>>>>>>               
>>>>>>>> The test "DeadListenerTest" got passed in some machines because
>>>>>>>> of the
>>>>>>>> timeout for waiting a notification. I think its failure just tells
>>>>>>>> a new
>>>>>>>> bug.
>>>>>>>>
>>>>>>>> To set a longer timeout just hides the real bug, and the test might
>>>>>>>> fail
>>>>>>>> again one day if running condition is changed and you might need
>>>>>>>> longer
>>>>>>>> timeout again.
>>>>>>>>                         
>>>>>>> Yes, I agree with you that extending the timeout just lessens the
>>>>>>> likelihood of the race condition and does not prevent it.
>>>>>>>
>>>>>>>  
>>>>>>>               
>>>>>>>> Shanliang
>>>>>>>>
>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>                      
>>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>>>  
>>>>>>>>>                            
>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>                                     
>>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>>>  
>>>>>>>>>>>                                             
>>>>>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>>>>>> longer
>>>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>>>>> tested
>>>>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>>>>> problem.
>>>>>>>>>>>>                                                             
>>>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>>>>> -Xcomp
>>>>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>>>>
>>>>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and
>>>>>>>>>>> removes the
>>>>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>>>>> calling
>>>>>>>>>>> "removeNotificationListener" on the main thread is prone to
>>>>>>>>>>> racing.
>>>>>>>>>>>                                                   
>>>>>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes
>>>>>>>>>> the
>>>>>>>>>> appropriate notification listeners in a separate thread. This is
>>>>>>>>>> for a
>>>>>>>>>> client connection to do clean if a user never calls
>>>>>>>>>> removeNotificationListener.
>>>>>>>>>>
>>>>>>>>>> But calling directly removeNotificationListener from a client
>>>>>>>>>> should
>>>>>>>>>> still get exception if the clean has not been done. As I said, if
>>>>>>>>>> the
>>>>>>>>>> client checked and found the listener was still there, then the
>>>>>>>>>> client
>>>>>>>>>> sent a request to its server to remove the listener at server
>>>>>>>>>> side,
>>>>>>>>>> the
>>>>>>>>>> server should find that the MBean in question was not registered,
>>>>>>>>>> so the
>>>>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>>>>                                         
>>>>>>>>> This won't work. The server side listeners are removed upon
>>>>>>>>> receiving
>>>>>>>>> the "unregistered" notification which is delivered from the
>>>>>>>>> ClientNotificationForwarder and it may have not run yet (since it
>>>>>>>>> runs
>>>>>>>>> in a separate executor thread). The result is that the attempt to
>>>>>>>>> remove
>>>>>>>>> the notification listener on the server will succeed as well
>>>>>>>>> failing
>>>>>>>>> the
>>>>>>>>> test subsequently.
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>>  
>>>>>>>>>                            
>>>>>>>>>> Shanliang
>>>>>>>>>>                                     
>>>>>>>>>>>  
>>>>>>>>>>>                                             
>>>>>>>>>>>> The timeout you made longer was used to wait a notification
>>>>>>>>>>>> which
>>>>>>>>>>>> should
>>>>>>>>>>>> never arrive.
>>>>>>>>>>>>                                                             
>>>>>>>>>>> Well, it can be used to allow more time to process the
>>>>>>>>>>> "unregister"
>>>>>>>>>>> notification, too.
>>>>>>>>>>>
>>>>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>>>>> condition.
>>>>>>>>>>> An updated webrev will follow.
>>>>>>>>>>>
>>>>>>>>>>>  
>>>>>>>>>>>                                             
>>>>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>>>>> registered in
>>>>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>>>>
>>>>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>>>>>> Anyway,
>>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>>>>> unregister an
>>>>>>>>>>>> mbean, then any following call (without any time break) to
>>>>>>>>>>>> use the
>>>>>>>>>>>> mbean
>>>>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered"
>>>>>>>>>>>> etc.
>>>>>>>>>>>>
>>>>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>>>>>> MBeam,
>>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>>>>> create a
>>>>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>>>>> MBeanServerConnection.
>>>>>>>>>>>>                                                             
>>>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>>>>>>  
>>>>>>>>>>>                                             
>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>
>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>                                                        
>>>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Webrev at
>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>>>
>>>>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>>>>> notification
>>>>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>>>>> hasn't
>>>>>>>>>>>>> been
>>>>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is no synchronization between the client and the
>>>>>>>>>>>>> server and
>>>>>>>>>>>>> such
>>>>>>>>>>>>> race condition can occur occasionally. Normally, the
>>>>>>>>>>>>> execution is
>>>>>>>>>>>>> fast
>>>>>>>>>>>>> enough to behave like the "unregister" notification is
>>>>>>>>>>>>> processed
>>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>>>>>> that
>>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>>>>>> the CPU
>>>>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>>>>>> waiting a
>>>>>>>>>>>>> bit longer in the main thread to give the notification
>>>>>>>>>>>>> processing
>>>>>>>>>>>>> thread
>>>>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>                                                  
>>>>>>>>>>>>>                       
>>>>>>>>>>>                                                   
>>>>>>>>>>                                         
>>>>>>>>>                                 
>>>>>>>>                         
>>>>>>>                   
>>>>>>             
>>>>>         
>>>>     
>>>
>>>   
>>
>>
> 


From shanliang.jiang at oracle.com  Thu Jan 10 05:18:32 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Thu, 10 Jan 2013 14:18:32 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50EEB4B1.8070101@oracle.com>
References: <50EC3850.7080508@oracle.com>
	<50ED2D0A.5000509@oracle.com>	<50ED3C4F.1070001@oracle.com>
	<50ED41AC.4010007@oracle.com>	<50ED6D8E.6070404@oracle.com>
	<50ED7436.1020205@oracle.com>	<50ED7801.8080704@oracle.com>
	<50ED7DEF.9020108@oracle.com>	<50EE824A.8020106@oracle.com>
	<50EE8447.50901@oracle.com>	<50EEA8F8.7090007@oracle.com>
	<50EEABA6.6010203@oracle.com>	<50EEAF60.7040801@oracle.com>
	<50EEB4B1.8070101@oracle.com>
Message-ID: <50EEBFA8.5010001@oracle.com>

The weakListener is unnecessary, the test does already the same 
verification:
 171         Set<?> setForUnreg = listenerMap.get(name);
 172         assertTrue("No trace of unregistered MBean: " + 
setForUnreg, setForUnreg == null);

All other are OK for me.

Shanliang


Jaroslav Bachorik wrote:
> On 01/10/2013 01:09 PM, Jaroslav Bachorik wrote:
>   
>> On 01/10/2013 12:53 PM, shanliang wrote:
>>     
>>> Instead to wait GC, you can also to wait the
>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive
>>> it, then your listener must be removed too. Of course this solution is
>>>       
>> The problem is that the *NotificationForwarder implementations swallow
>> this kind of notification and just perform the cleanup. No other
>> listener will ever receive this notification.
>>
>> The "unregisterMBean" operation's semantics is not clearly defined.
>> Intuitively, when unregistering an MBean all the associated listeners
>> should be gone before the method returns. But this is not the case -
>> currently the listeners are sanitized some time after the
>> "unregisterMBean" operation started, eventually. There is no easy way to
>> notify the API user that the listeners were removed. I am afraid that in
>> order to resolve these problems new APIs would need to be introduced and
>> the whole mechanism of delivering notification should be revisited (as
>> it was planned for JMX 2.0, anyway).
>>
>> As for fixing the test - checking the weak references works fine as well
>> as increasing the timeout. They both can fail when the system is
>> extremely busy but the GC based solution will be in general faster than
>> the one with increased timeout.
>>     
>
> Updated webrev: http://cr.openjdk.java.net/~jbachorik/7170447/webrev.01
>
>   
>> -JB-
>>
>>     
>>> implementation dependent, but the test is already implementation dependent.
>>>
>>> Shanliang
>>>
>>>
>>> Jaroslav Bachorik wrote:
>>>       
>>>> On 01/10/2013 10:05 AM, shanliang wrote:
>>>>  
>>>>         
>>>>> Jaroslav Bachorik wrote:
>>>>>    
>>>>>           
>>>>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>>>>  
>>>>>>      
>>>>>>             
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>           
>>>>>>>               
>>>>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>>>>  
>>>>>>>>               
>>>>>>>>                 
>>>>>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>>>>>> unregistered, a user at client side calls
>>>>>>>>> "removeNotificationListener"
>>>>>>>>> on the MBean, what should happen? if the user calls
>>>>>>>>> "isRegistered" on
>>>>>>>>> the MBean, what should happen?
>>>>>>>>>
>>>>>>>>> I have done 2 tests, I used only one thread:
>>>>>>>>>
>>>>>>>>> 1)
>>>>>>>>> ......
>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>>>>>
>>>>>>>>> I got isRegistered = false;
>>>>>>>>>
>>>>>>>>> 2)
>>>>>>>>> ......
>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>> System.out.println("isRegistered =
>>>>>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>>>>
>>>>>>>>> I did not get an exception.
>>>>>>>>>
>>>>>>>>> The 1) told that the client could know the MBean was unregistered,
>>>>>>>>> then
>>>>>>>>> the client should throw an exception for the call of
>>>>>>>>> "removeNotificationListener" in 2).
>>>>>>>>>                         
>>>>>>>>>                   
>>>>>>>> Yes, but then it would not test the listener leakage as it was
>>>>>>>> supposed
>>>>>>>> to test but rather the fact that the client throws the appropriate
>>>>>>>> exception. The fact that the mbean was unregistered does not
>>>>>>>> necessarily
>>>>>>>> mean that the listeners were released. The main problem remains - the
>>>>>>>> listeners are being cleaned-up asynchronously and the clean-up
>>>>>>>> process
>>>>>>>> might race against the other uses of the JMX API.
>>>>>>>>                   
>>>>>>>>                 
>>>>>>> client.removeNotificationListener is not a right way here to test
>>>>>>> listener leak, we could use some other ways, for example we keep the
>>>>>>> listener in a weak reference, then after the mbean is removed, the
>>>>>>> weak
>>>>>>> reference should be empty after some time. Another way is like
>>>>>>> DeadListenerTest does to check whether clean has done at server side:
>>>>>>> use reflection to get the "listenerMap" at server side and make
>>>>>>> sure it
>>>>>>> is empty, but this need to add a private method to the class
>>>>>>> ClientNotifForwarder.
>>>>>>>             
>>>>>>>               
>>>>>> There will still be problems with timing. You need either to wait for
>>>>>> the GC to kick in to clean up the weak ref. And the listenerMap will
>>>>>> not
>>>>>> be purged of the unregistered MBean listeners until the notification is
>>>>>> generated, processed on the ClientNotificationForwarder and
>>>>>> forwarded to
>>>>>> the server. So there goes the timing issue again.
>>>>>>
>>>>>> The problem is that the "unregisterMBean" operation does not guarantee
>>>>>> that the listeners have been unregistered at the time it returns. So,
>>>>>> one way or the other we will need to wait an arbitrary amount of time
>>>>>> before checking for the memory leak.
>>>>>>         
>>>>>>             
>>>>> Yes we need to wait, but you can use a cycle like:
>>>>>        long maxWaitingTime = 3000;
>>>>>        long startTime = System.currentTimeMillis();
>>>>>        while ( weakReference.get != null
>>>>>                && System.currentTimeMillis() < startTime +
>>>>> maxWaitingTime) {
>>>>>            System.gc();
>>>>>            Thread.sleep(100);
>>>>>            System.gc();
>>>>>        }
>>>>>
>>>>>       if (weakReference.get != null) {
>>>>>          // failed
>>>>>       }
>>>>>     
>>>>>           
>>>> Still you need an arbitrary timeout which might be reached under extreme
>>>> conditions making this test to fail intermittently. But I'd say that's
>>>> the nature of tests for memory leak fixes, due to the unpredictable
>>>> nature of the GC runs. Unless you take a heap dump and do a reachability
>>>> analysis you can not be sure whether a reference is dangling somwehwere
>>>> or it just hasn't been collected yet :/
>>>>
>>>> -JB-
>>>>
>>>>  
>>>>         
>>>>> Shanliang
>>>>>    
>>>>>           
>>>>>> -JB-
>>>>>>
>>>>>>  
>>>>>>      
>>>>>>             
>>>>>>> I think we have 3 things to do here:
>>>>>>> 1) modify the test to not use removeNotificationListener for testing
>>>>>>> listener leak
>>>>>>> 2) create a new bug about a client does not throw an exception
>>>>>>> after an
>>>>>>> mbean is unregistered
>>>>>>> 3) create a bug about a client does not throw a same exception as at
>>>>>>> server side.
>>>>>>>
>>>>>>> I will do 2) and 3), if you like you can continue 1), it might need to
>>>>>>> do fix also in the JMX implementation.
>>>>>>>
>>>>>>> Shanliang
>>>>>>>           
>>>>>>>               
>>>>>>>>  
>>>>>>>>               
>>>>>>>>                 
>>>>>>>>> The test "DeadListenerTest" got passed in some machines because
>>>>>>>>> of the
>>>>>>>>> timeout for waiting a notification. I think its failure just tells
>>>>>>>>> a new
>>>>>>>>> bug.
>>>>>>>>>
>>>>>>>>> To set a longer timeout just hides the real bug, and the test might
>>>>>>>>> fail
>>>>>>>>> again one day if running condition is changed and you might need
>>>>>>>>> longer
>>>>>>>>> timeout again.
>>>>>>>>>                         
>>>>>>>>>                   
>>>>>>>> Yes, I agree with you that extending the timeout just lessens the
>>>>>>>> likelihood of the race condition and does not prevent it.
>>>>>>>>
>>>>>>>>  
>>>>>>>>               
>>>>>>>>                 
>>>>>>>>> Shanliang
>>>>>>>>>
>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>                      
>>>>>>>>>                   
>>>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>>>>  
>>>>>>>>>>                            
>>>>>>>>>>                     
>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>                                     
>>>>>>>>>>>                       
>>>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>>>>  
>>>>>>>>>>>>                                             
>>>>>>>>>>>>                         
>>>>>>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>>>>>>> longer
>>>>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>>>>>> tested
>>>>>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>                                                             
>>>>>>>>>>>>>                           
>>>>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>>>>>> -Xcomp
>>>>>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>>>>>
>>>>>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and
>>>>>>>>>>>> removes the
>>>>>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>>>>>> calling
>>>>>>>>>>>> "removeNotificationListener" on the main thread is prone to
>>>>>>>>>>>> racing.
>>>>>>>>>>>>                                                   
>>>>>>>>>>>>                         
>>>>>>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes
>>>>>>>>>>> the
>>>>>>>>>>> appropriate notification listeners in a separate thread. This is
>>>>>>>>>>> for a
>>>>>>>>>>> client connection to do clean if a user never calls
>>>>>>>>>>> removeNotificationListener.
>>>>>>>>>>>
>>>>>>>>>>> But calling directly removeNotificationListener from a client
>>>>>>>>>>> should
>>>>>>>>>>> still get exception if the clean has not been done. As I said, if
>>>>>>>>>>> the
>>>>>>>>>>> client checked and found the listener was still there, then the
>>>>>>>>>>> client
>>>>>>>>>>> sent a request to its server to remove the listener at server
>>>>>>>>>>> side,
>>>>>>>>>>> the
>>>>>>>>>>> server should find that the MBean in question was not registered,
>>>>>>>>>>> so the
>>>>>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>>>>>                                         
>>>>>>>>>>>                       
>>>>>>>>>> This won't work. The server side listeners are removed upon
>>>>>>>>>> receiving
>>>>>>>>>> the "unregistered" notification which is delivered from the
>>>>>>>>>> ClientNotificationForwarder and it may have not run yet (since it
>>>>>>>>>> runs
>>>>>>>>>> in a separate executor thread). The result is that the attempt to
>>>>>>>>>> remove
>>>>>>>>>> the notification listener on the server will succeed as well
>>>>>>>>>> failing
>>>>>>>>>> the
>>>>>>>>>> test subsequently.
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>>>  
>>>>>>>>>>                            
>>>>>>>>>>                     
>>>>>>>>>>> Shanliang
>>>>>>>>>>>                                     
>>>>>>>>>>>                       
>>>>>>>>>>>>  
>>>>>>>>>>>>                                             
>>>>>>>>>>>>                         
>>>>>>>>>>>>> The timeout you made longer was used to wait a notification
>>>>>>>>>>>>> which
>>>>>>>>>>>>> should
>>>>>>>>>>>>> never arrive.
>>>>>>>>>>>>>                                                             
>>>>>>>>>>>>>                           
>>>>>>>>>>>> Well, it can be used to allow more time to process the
>>>>>>>>>>>> "unregister"
>>>>>>>>>>>> notification, too.
>>>>>>>>>>>>
>>>>>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>>>>>> condition.
>>>>>>>>>>>> An updated webrev will follow.
>>>>>>>>>>>>
>>>>>>>>>>>>  
>>>>>>>>>>>>                                             
>>>>>>>>>>>>                         
>>>>>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>>>>>> registered in
>>>>>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>>>>>>> Anyway,
>>>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>>>>>> unregister an
>>>>>>>>>>>>> mbean, then any following call (without any time break) to
>>>>>>>>>>>>> use the
>>>>>>>>>>>>> mbean
>>>>>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered"
>>>>>>>>>>>>> etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>>>>>>> MBeam,
>>>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>>>>>> create a
>>>>>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>>>>>> MBeanServerConnection.
>>>>>>>>>>>>>                                                             
>>>>>>>>>>>>>                           
>>>>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>>>
>>>>>>>>>>>>  
>>>>>>>>>>>>                                             
>>>>>>>>>>>>                         
>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>>                                                        
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Webrev at
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>>>>>> notification
>>>>>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>>>>>> hasn't
>>>>>>>>>>>>>> been
>>>>>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is no synchronization between the client and the
>>>>>>>>>>>>>> server and
>>>>>>>>>>>>>> such
>>>>>>>>>>>>>> race condition can occur occasionally. Normally, the
>>>>>>>>>>>>>> execution is
>>>>>>>>>>>>>> fast
>>>>>>>>>>>>>> enough to behave like the "unregister" notification is
>>>>>>>>>>>>>> processed
>>>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>>>>>>> the CPU
>>>>>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>>>>>>> waiting a
>>>>>>>>>>>>>> bit longer in the main thread to give the notification
>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>                                                  
>>>>>>>>>>>>>>                       
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>                                                   
>>>>>>>>>>>>                         
>>>>>>>>>>>                                         
>>>>>>>>>>>                       
>>>>>>>>>>                                 
>>>>>>>>>>                     
>>>>>>>>>                         
>>>>>>>>>                   
>>>>>>>>                   
>>>>>>>>                 
>>>>>>>             
>>>>>>>               
>>>>>>         
>>>>>>             
>>>>>     
>>>>>           
>>>>   
>>>>         
>>>       
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/jmx-dev/attachments/20130110/c9e6fdf9/attachment-0001.html 

From jaroslav.bachorik at oracle.com  Thu Jan 10 05:49:09 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Jan 2013 14:49:09 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50EEBFA8.5010001@oracle.com>
References: <50EC3850.7080508@oracle.com>
	<50ED2D0A.5000509@oracle.com>	<50ED3C4F.1070001@oracle.com>
	<50ED41AC.4010007@oracle.com>	<50ED6D8E.6070404@oracle.com>
	<50ED7436.1020205@oracle.com>	<50ED7801.8080704@oracle.com>
	<50ED7DEF.9020108@oracle.com>	<50EE824A.8020106@oracle.com>
	<50EE8447.50901@oracle.com>	<50EEA8F8.7090007@oracle.com>
	<50EEABA6.6010203@oracle.com>	<50EEAF60.7040801@oracle.com>
	<50EEB4B1.8070101@oracle.com> <50EEBFA8.5010001@oracle.com>
Message-ID: <50EEC6D5.2040400@oracle.com>

On 01/10/2013 02:18 PM, shanliang wrote:
> The weakListener is unnecessary, the test does already the same
> verification:
> 171         Set<?> setForUnreg = listenerMap.get(name);
> 172         assertTrue("No trace of unregistered MBean: " + setForUnreg,
> setForUnreg == null);

Addressed.

> 
> All other are OK for me.

So, http://cr.openjdk.java.net/~jbachorik/7170447/webrev.02 could be the
final version.

Thanks for the review!

-JB-

> 
> Shanliang
> 
> 
> Jaroslav Bachorik wrote:
>> On 01/10/2013 01:09 PM, Jaroslav Bachorik wrote:
>>  
>>> On 01/10/2013 12:53 PM, shanliang wrote:
>>>    
>>>> Instead to wait GC, you can also to wait the
>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive
>>>> it, then your listener must be removed too. Of course this solution is
>>>>       
>>> The problem is that the *NotificationForwarder implementations swallow
>>> this kind of notification and just perform the cleanup. No other
>>> listener will ever receive this notification.
>>>
>>> The "unregisterMBean" operation's semantics is not clearly defined.
>>> Intuitively, when unregistering an MBean all the associated listeners
>>> should be gone before the method returns. But this is not the case -
>>> currently the listeners are sanitized some time after the
>>> "unregisterMBean" operation started, eventually. There is no easy way to
>>> notify the API user that the listeners were removed. I am afraid that in
>>> order to resolve these problems new APIs would need to be introduced and
>>> the whole mechanism of delivering notification should be revisited (as
>>> it was planned for JMX 2.0, anyway).
>>>
>>> As for fixing the test - checking the weak references works fine as well
>>> as increasing the timeout. They both can fail when the system is
>>> extremely busy but the GC based solution will be in general faster than
>>> the one with increased timeout.
>>>     
>>
>> Updated webrev: http://cr.openjdk.java.net/~jbachorik/7170447/webrev.01
>>
>>  
>>> -JB-
>>>
>>>    
>>>> implementation dependent, but the test is already implementation
>>>> dependent.
>>>>
>>>> Shanliang
>>>>
>>>>
>>>> Jaroslav Bachorik wrote:
>>>>      
>>>>> On 01/10/2013 10:05 AM, shanliang wrote:
>>>>>  
>>>>>        
>>>>>> Jaroslav Bachorik wrote:
>>>>>>             
>>>>>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>>>>>  
>>>>>>>                 
>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>                        
>>>>>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>>>>>  
>>>>>>>>>                              
>>>>>>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>>>>>>> unregistered, a user at client side calls
>>>>>>>>>> "removeNotificationListener"
>>>>>>>>>> on the MBean, what should happen? if the user calls
>>>>>>>>>> "isRegistered" on
>>>>>>>>>> the MBean, what should happen?
>>>>>>>>>>
>>>>>>>>>> I have done 2 tests, I used only one thread:
>>>>>>>>>>
>>>>>>>>>> 1)
>>>>>>>>>> ......
>>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>>>>>>
>>>>>>>>>> I got isRegistered = false;
>>>>>>>>>>
>>>>>>>>>> 2)
>>>>>>>>>> ......
>>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>>> System.out.println("isRegistered =
>>>>>>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>>>>>
>>>>>>>>>> I did not get an exception.
>>>>>>>>>>
>>>>>>>>>> The 1) told that the client could know the MBean was
>>>>>>>>>> unregistered,
>>>>>>>>>> then
>>>>>>>>>> the client should throw an exception for the call of
>>>>>>>>>> "removeNotificationListener" in 2).
>>>>>>>>>>                                           
>>>>>>>>> Yes, but then it would not test the listener leakage as it was
>>>>>>>>> supposed
>>>>>>>>> to test but rather the fact that the client throws the appropriate
>>>>>>>>> exception. The fact that the mbean was unregistered does not
>>>>>>>>> necessarily
>>>>>>>>> mean that the listeners were released. The main problem remains
>>>>>>>>> - the
>>>>>>>>> listeners are being cleaned-up asynchronously and the clean-up
>>>>>>>>> process
>>>>>>>>> might race against the other uses of the JMX API.
>>>>>>>>>                                   
>>>>>>>> client.removeNotificationListener is not a right way here to test
>>>>>>>> listener leak, we could use some other ways, for example we keep
>>>>>>>> the
>>>>>>>> listener in a weak reference, then after the mbean is removed, the
>>>>>>>> weak
>>>>>>>> reference should be empty after some time. Another way is like
>>>>>>>> DeadListenerTest does to check whether clean has done at server
>>>>>>>> side:
>>>>>>>> use reflection to get the "listenerMap" at server side and make
>>>>>>>> sure it
>>>>>>>> is empty, but this need to add a private method to the class
>>>>>>>> ClientNotifForwarder.
>>>>>>>>                           
>>>>>>> There will still be problems with timing. You need either to wait
>>>>>>> for
>>>>>>> the GC to kick in to clean up the weak ref. And the listenerMap will
>>>>>>> not
>>>>>>> be purged of the unregistered MBean listeners until the
>>>>>>> notification is
>>>>>>> generated, processed on the ClientNotificationForwarder and
>>>>>>> forwarded to
>>>>>>> the server. So there goes the timing issue again.
>>>>>>>
>>>>>>> The problem is that the "unregisterMBean" operation does not
>>>>>>> guarantee
>>>>>>> that the listeners have been unregistered at the time it returns.
>>>>>>> So,
>>>>>>> one way or the other we will need to wait an arbitrary amount of
>>>>>>> time
>>>>>>> before checking for the memory leak.
>>>>>>>                     
>>>>>> Yes we need to wait, but you can use a cycle like:
>>>>>>        long maxWaitingTime = 3000;
>>>>>>        long startTime = System.currentTimeMillis();
>>>>>>        while ( weakReference.get != null
>>>>>>                && System.currentTimeMillis() < startTime +
>>>>>> maxWaitingTime) {
>>>>>>            System.gc();
>>>>>>            Thread.sleep(100);
>>>>>>            System.gc();
>>>>>>        }
>>>>>>
>>>>>>       if (weakReference.get != null) {
>>>>>>          // failed
>>>>>>       }
>>>>>>               
>>>>> Still you need an arbitrary timeout which might be reached under
>>>>> extreme
>>>>> conditions making this test to fail intermittently. But I'd say that's
>>>>> the nature of tests for memory leak fixes, due to the unpredictable
>>>>> nature of the GC runs. Unless you take a heap dump and do a
>>>>> reachability
>>>>> analysis you can not be sure whether a reference is dangling
>>>>> somwehwere
>>>>> or it just hasn't been collected yet :/
>>>>>
>>>>> -JB-
>>>>>
>>>>>  
>>>>>        
>>>>>> Shanliang
>>>>>>             
>>>>>>> -JB-
>>>>>>>
>>>>>>>  
>>>>>>>                 
>>>>>>>> I think we have 3 things to do here:
>>>>>>>> 1) modify the test to not use removeNotificationListener for
>>>>>>>> testing
>>>>>>>> listener leak
>>>>>>>> 2) create a new bug about a client does not throw an exception
>>>>>>>> after an
>>>>>>>> mbean is unregistered
>>>>>>>> 3) create a bug about a client does not throw a same exception
>>>>>>>> as at
>>>>>>>> server side.
>>>>>>>>
>>>>>>>> I will do 2) and 3), if you like you can continue 1), it might
>>>>>>>> need to
>>>>>>>> do fix also in the JMX implementation.
>>>>>>>>
>>>>>>>> Shanliang
>>>>>>>>                        
>>>>>>>>>  
>>>>>>>>>                              
>>>>>>>>>> The test "DeadListenerTest" got passed in some machines because
>>>>>>>>>> of the
>>>>>>>>>> timeout for waiting a notification. I think its failure just
>>>>>>>>>> tells
>>>>>>>>>> a new
>>>>>>>>>> bug.
>>>>>>>>>>
>>>>>>>>>> To set a longer timeout just hides the real bug, and the test
>>>>>>>>>> might
>>>>>>>>>> fail
>>>>>>>>>> again one day if running condition is changed and you might need
>>>>>>>>>> longer
>>>>>>>>>> timeout again.
>>>>>>>>>>                                           
>>>>>>>>> Yes, I agree with you that extending the timeout just lessens the
>>>>>>>>> likelihood of the race condition and does not prevent it.
>>>>>>>>>
>>>>>>>>>  
>>>>>>>>>                              
>>>>>>>>>> Shanliang
>>>>>>>>>>
>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>                                       
>>>>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>>>>>  
>>>>>>>>>>>                                               
>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>                                                          
>>>>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>>>>>  
>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>                        
>>>>>>>>>>>>>> I still have no idea why the test failed, but I do not see
>>>>>>>>>>>>>> why a
>>>>>>>>>>>>>> longer
>>>>>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>>>>>>> tested
>>>>>>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>>>>>>> -Xcomp
>>>>>>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and
>>>>>>>>>>>>> removes the
>>>>>>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>>>>>>> calling
>>>>>>>>>>>>> "removeNotificationListener" on the main thread is prone to
>>>>>>>>>>>>> racing.
>>>>>>>>>>>>>                                                  
>>>>>>>>>>>>>                         
>>>>>>>>>>>> It is true that ClientNotifForwarder scans the notifications
>>>>>>>>>>>> for
>>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes
>>>>>>>>>>>> the
>>>>>>>>>>>> appropriate notification listeners in a separate thread.
>>>>>>>>>>>> This is
>>>>>>>>>>>> for a
>>>>>>>>>>>> client connection to do clean if a user never calls
>>>>>>>>>>>> removeNotificationListener.
>>>>>>>>>>>>
>>>>>>>>>>>> But calling directly removeNotificationListener from a client
>>>>>>>>>>>> should
>>>>>>>>>>>> still get exception if the clean has not been done. As I
>>>>>>>>>>>> said, if
>>>>>>>>>>>> the
>>>>>>>>>>>> client checked and found the listener was still there, then the
>>>>>>>>>>>> client
>>>>>>>>>>>> sent a request to its server to remove the listener at server
>>>>>>>>>>>> side,
>>>>>>>>>>>> the
>>>>>>>>>>>> server should find that the MBean in question was not
>>>>>>>>>>>> registered,
>>>>>>>>>>>> so the
>>>>>>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>>>>>>                                                               
>>>>>>>>>>> This won't work. The server side listeners are removed upon
>>>>>>>>>>> receiving
>>>>>>>>>>> the "unregistered" notification which is delivered from the
>>>>>>>>>>> ClientNotificationForwarder and it may have not run yet
>>>>>>>>>>> (since it
>>>>>>>>>>> runs
>>>>>>>>>>> in a separate executor thread). The result is that the
>>>>>>>>>>> attempt to
>>>>>>>>>>> remove
>>>>>>>>>>> the notification listener on the server will succeed as well
>>>>>>>>>>> failing
>>>>>>>>>>> the
>>>>>>>>>>> test subsequently.
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>>>>>>  
>>>>>>>>>>>                                               
>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>                                                          
>>>>>>>>>>>>>  
>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>                        
>>>>>>>>>>>>>> The timeout you made longer was used to wait a notification
>>>>>>>>>>>>>> which
>>>>>>>>>>>>>> should
>>>>>>>>>>>>>> never arrive.
>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, it can be used to allow more time to process the
>>>>>>>>>>>>> "unregister"
>>>>>>>>>>>>> notification, too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>>>>>>> condition.
>>>>>>>>>>>>> An updated webrev will follow.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  
>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>                        
>>>>>>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>>>>>>> 1) at client side, check whether it was added in the
>>>>>>>>>>>>>> client side
>>>>>>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>>>>>>> registered in
>>>>>>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So 2) tells that we did not rely on a "unregister"
>>>>>>>>>>>>>> notification.
>>>>>>>>>>>>>> Anyway,
>>>>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>>>>>>> unregister an
>>>>>>>>>>>>>> mbean, then any following call (without any time break) to
>>>>>>>>>>>>>> use the
>>>>>>>>>>>>>> mbean
>>>>>>>>>>>>>> should fail, like "removeNotificationListener",
>>>>>>>>>>>>>> "isRegistered"
>>>>>>>>>>>>>> etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do see a bug here, if we remove a listener from a
>>>>>>>>>>>>>> non-existing
>>>>>>>>>>>>>> MBeam,
>>>>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>>>>>>> MBeanServerConnection.
>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>
>>>>>>>>>>>>>  
>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>                        
>>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>                                                       
>>>>>>>>>>>>>>                          
>>>>>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Webrev at
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>>>>>>> fires the "unregister" notification which is sent to the
>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>>>>>>> notification
>>>>>>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>>>>>>> hasn't
>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is no synchronization between the client and the
>>>>>>>>>>>>>>> server and
>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>> race condition can occur occasionally. Normally, the
>>>>>>>>>>>>>>> execution is
>>>>>>>>>>>>>>> fast
>>>>>>>>>>>>>>> enough to behave like the "unregister" notification is
>>>>>>>>>>>>>>> processed
>>>>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it
>>>>>>>>>>>>>>> seems
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option
>>>>>>>>>>>>>>> strains
>>>>>>>>>>>>>>> the CPU
>>>>>>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is no proper fix for this - the only thing that
>>>>>>>>>>>>>>> work is
>>>>>>>>>>>>>>> waiting a
>>>>>>>>>>>>>>> bit longer in the main thread to give the notification
>>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>>                                                 
>>>>>>>>>>>>>>>                                                   
>>>>>>>>>>>>>                                                  
>>>>>>>>>>>>>                         
>>>>>>>>>>>>                                                               
>>>>>>>>>>>                                                     
>>>>>>>>>>                                           
>>>>>>>>>                                   
>>>>>>>>                           
>>>>>>>                     
>>>>>>               
>>>>>           
>>>>       
>>
>>   
> 
> 


From shanliang.jiang at oracle.com  Thu Jan 10 07:14:40 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Thu, 10 Jan 2013 16:14:40 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java
 failure
In-Reply-To: <50EEC6D5.2040400@oracle.com>
References: <50EC3850.7080508@oracle.com>
	<50ED2D0A.5000509@oracle.com>	<50ED3C4F.1070001@oracle.com>
	<50ED41AC.4010007@oracle.com>	<50ED6D8E.6070404@oracle.com>
	<50ED7436.1020205@oracle.com>	<50ED7801.8080704@oracle.com>
	<50ED7DEF.9020108@oracle.com>	<50EE824A.8020106@oracle.com>
	<50EE8447.50901@oracle.com>	<50EEA8F8.7090007@oracle.com>
	<50EEABA6.6010203@oracle.com>	<50EEAF60.7040801@oracle.com>
	<50EEB4B1.8070101@oracle.com> <50EEBFA8.5010001@oracle.com>
	<50EEC6D5.2040400@oracle.com>
Message-ID: <50EEDAE0.7040904@oracle.com>

It is OK for me, thanks for fixing the bug!

Shanliang

Jaroslav Bachorik wrote:
> On 01/10/2013 02:18 PM, shanliang wrote:
>   
>> The weakListener is unnecessary, the test does already the same
>> verification:
>> 171         Set<?> setForUnreg = listenerMap.get(name);
>> 172         assertTrue("No trace of unregistered MBean: " + setForUnreg,
>> setForUnreg == null);
>>     
>
> Addressed.
>
>   
>> All other are OK for me.
>>     
>
> So, http://cr.openjdk.java.net/~jbachorik/7170447/webrev.02 could be the
> final version.
>
> Thanks for the review!
>
> -JB-
>
>   
>> Shanliang
>>
>>
>> Jaroslav Bachorik wrote:
>>     
>>> On 01/10/2013 01:09 PM, Jaroslav Bachorik wrote:
>>>  
>>>       
>>>> On 01/10/2013 12:53 PM, shanliang wrote:
>>>>    
>>>>         
>>>>> Instead to wait GC, you can also to wait the
>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive
>>>>> it, then your listener must be removed too. Of course this solution is
>>>>>       
>>>>>           
>>>> The problem is that the *NotificationForwarder implementations swallow
>>>> this kind of notification and just perform the cleanup. No other
>>>> listener will ever receive this notification.
>>>>
>>>> The "unregisterMBean" operation's semantics is not clearly defined.
>>>> Intuitively, when unregistering an MBean all the associated listeners
>>>> should be gone before the method returns. But this is not the case -
>>>> currently the listeners are sanitized some time after the
>>>> "unregisterMBean" operation started, eventually. There is no easy way to
>>>> notify the API user that the listeners were removed. I am afraid that in
>>>> order to resolve these problems new APIs would need to be introduced and
>>>> the whole mechanism of delivering notification should be revisited (as
>>>> it was planned for JMX 2.0, anyway).
>>>>
>>>> As for fixing the test - checking the weak references works fine as well
>>>> as increasing the timeout. They both can fail when the system is
>>>> extremely busy but the GC based solution will be in general faster than
>>>> the one with increased timeout.
>>>>     
>>>>         
>>> Updated webrev: http://cr.openjdk.java.net/~jbachorik/7170447/webrev.01
>>>
>>>  
>>>       
>>>> -JB-
>>>>
>>>>    
>>>>         
>>>>> implementation dependent, but the test is already implementation
>>>>> dependent.
>>>>>
>>>>> Shanliang
>>>>>
>>>>>
>>>>> Jaroslav Bachorik wrote:
>>>>>      
>>>>>           
>>>>>> On 01/10/2013 10:05 AM, shanliang wrote:
>>>>>>  
>>>>>>        
>>>>>>             
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>             
>>>>>>>               
>>>>>>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>>>>>>  
>>>>>>>>                 
>>>>>>>>                 
>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>                        
>>>>>>>>>                   
>>>>>>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>>>>>>  
>>>>>>>>>>                              
>>>>>>>>>>                     
>>>>>>>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>>>>>>>> unregistered, a user at client side calls
>>>>>>>>>>> "removeNotificationListener"
>>>>>>>>>>> on the MBean, what should happen? if the user calls
>>>>>>>>>>> "isRegistered" on
>>>>>>>>>>> the MBean, what should happen?
>>>>>>>>>>>
>>>>>>>>>>> I have done 2 tests, I used only one thread:
>>>>>>>>>>>
>>>>>>>>>>> 1)
>>>>>>>>>>> ......
>>>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>>>>>>>
>>>>>>>>>>> I got isRegistered = false;
>>>>>>>>>>>
>>>>>>>>>>> 2)
>>>>>>>>>>> ......
>>>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>>>> System.out.println("isRegistered =
>>>>>>>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>>>>>>
>>>>>>>>>>> I did not get an exception.
>>>>>>>>>>>
>>>>>>>>>>> The 1) told that the client could know the MBean was
>>>>>>>>>>> unregistered,
>>>>>>>>>>> then
>>>>>>>>>>> the client should throw an exception for the call of
>>>>>>>>>>> "removeNotificationListener" in 2).
>>>>>>>>>>>                                           
>>>>>>>>>>>                       
>>>>>>>>>> Yes, but then it would not test the listener leakage as it was
>>>>>>>>>> supposed
>>>>>>>>>> to test but rather the fact that the client throws the appropriate
>>>>>>>>>> exception. The fact that the mbean was unregistered does not
>>>>>>>>>> necessarily
>>>>>>>>>> mean that the listeners were released. The main problem remains
>>>>>>>>>> - the
>>>>>>>>>> listeners are being cleaned-up asynchronously and the clean-up
>>>>>>>>>> process
>>>>>>>>>> might race against the other uses of the JMX API.
>>>>>>>>>>                                   
>>>>>>>>>>                     
>>>>>>>>> client.removeNotificationListener is not a right way here to test
>>>>>>>>> listener leak, we could use some other ways, for example we keep
>>>>>>>>> the
>>>>>>>>> listener in a weak reference, then after the mbean is removed, the
>>>>>>>>> weak
>>>>>>>>> reference should be empty after some time. Another way is like
>>>>>>>>> DeadListenerTest does to check whether clean has done at server
>>>>>>>>> side:
>>>>>>>>> use reflection to get the "listenerMap" at server side and make
>>>>>>>>> sure it
>>>>>>>>> is empty, but this need to add a private method to the class
>>>>>>>>> ClientNotifForwarder.
>>>>>>>>>                           
>>>>>>>>>                   
>>>>>>>> There will still be problems with timing. You need either to wait
>>>>>>>> for
>>>>>>>> the GC to kick in to clean up the weak ref. And the listenerMap will
>>>>>>>> not
>>>>>>>> be purged of the unregistered MBean listeners until the
>>>>>>>> notification is
>>>>>>>> generated, processed on the ClientNotificationForwarder and
>>>>>>>> forwarded to
>>>>>>>> the server. So there goes the timing issue again.
>>>>>>>>
>>>>>>>> The problem is that the "unregisterMBean" operation does not
>>>>>>>> guarantee
>>>>>>>> that the listeners have been unregistered at the time it returns.
>>>>>>>> So,
>>>>>>>> one way or the other we will need to wait an arbitrary amount of
>>>>>>>> time
>>>>>>>> before checking for the memory leak.
>>>>>>>>                     
>>>>>>>>                 
>>>>>>> Yes we need to wait, but you can use a cycle like:
>>>>>>>        long maxWaitingTime = 3000;
>>>>>>>        long startTime = System.currentTimeMillis();
>>>>>>>        while ( weakReference.get != null
>>>>>>>                && System.currentTimeMillis() < startTime +
>>>>>>> maxWaitingTime) {
>>>>>>>            System.gc();
>>>>>>>            Thread.sleep(100);
>>>>>>>            System.gc();
>>>>>>>        }
>>>>>>>
>>>>>>>       if (weakReference.get != null) {
>>>>>>>          // failed
>>>>>>>       }
>>>>>>>               
>>>>>>>               
>>>>>> Still you need an arbitrary timeout which might be reached under
>>>>>> extreme
>>>>>> conditions making this test to fail intermittently. But I'd say that's
>>>>>> the nature of tests for memory leak fixes, due to the unpredictable
>>>>>> nature of the GC runs. Unless you take a heap dump and do a
>>>>>> reachability
>>>>>> analysis you can not be sure whether a reference is dangling
>>>>>> somwehwere
>>>>>> or it just hasn't been collected yet :/
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>  
>>>>>>        
>>>>>>             
>>>>>>> Shanliang
>>>>>>>             
>>>>>>>               
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>  
>>>>>>>>                 
>>>>>>>>                 
>>>>>>>>> I think we have 3 things to do here:
>>>>>>>>> 1) modify the test to not use removeNotificationListener for
>>>>>>>>> testing
>>>>>>>>> listener leak
>>>>>>>>> 2) create a new bug about a client does not throw an exception
>>>>>>>>> after an
>>>>>>>>> mbean is unregistered
>>>>>>>>> 3) create a bug about a client does not throw a same exception
>>>>>>>>> as at
>>>>>>>>> server side.
>>>>>>>>>
>>>>>>>>> I will do 2) and 3), if you like you can continue 1), it might
>>>>>>>>> need to
>>>>>>>>> do fix also in the JMX implementation.
>>>>>>>>>
>>>>>>>>> Shanliang
>>>>>>>>>                        
>>>>>>>>>                   
>>>>>>>>>>  
>>>>>>>>>>                              
>>>>>>>>>>                     
>>>>>>>>>>> The test "DeadListenerTest" got passed in some machines because
>>>>>>>>>>> of the
>>>>>>>>>>> timeout for waiting a notification. I think its failure just
>>>>>>>>>>> tells
>>>>>>>>>>> a new
>>>>>>>>>>> bug.
>>>>>>>>>>>
>>>>>>>>>>> To set a longer timeout just hides the real bug, and the test
>>>>>>>>>>> might
>>>>>>>>>>> fail
>>>>>>>>>>> again one day if running condition is changed and you might need
>>>>>>>>>>> longer
>>>>>>>>>>> timeout again.
>>>>>>>>>>>                                           
>>>>>>>>>>>                       
>>>>>>>>>> Yes, I agree with you that extending the timeout just lessens the
>>>>>>>>>> likelihood of the race condition and does not prevent it.
>>>>>>>>>>
>>>>>>>>>>  
>>>>>>>>>>                              
>>>>>>>>>>                     
>>>>>>>>>>> Shanliang
>>>>>>>>>>>
>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>                                       
>>>>>>>>>>>                       
>>>>>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>>>>>>  
>>>>>>>>>>>>                                               
>>>>>>>>>>>>                         
>>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>>                                                          
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>>                        
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>>> I still have no idea why the test failed, but I do not see
>>>>>>>>>>>>>>> why a
>>>>>>>>>>>>>>> longer
>>>>>>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>>>>>>>> tested
>>>>>>>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>>>>>>>> -Xcomp
>>>>>>>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and
>>>>>>>>>>>>>> removes the
>>>>>>>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>>>>>>>> calling
>>>>>>>>>>>>>> "removeNotificationListener" on the main thread is prone to
>>>>>>>>>>>>>> racing.
>>>>>>>>>>>>>>                                                  
>>>>>>>>>>>>>>                         
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>> It is true that ClientNotifForwarder scans the notifications
>>>>>>>>>>>>> for
>>>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes
>>>>>>>>>>>>> the
>>>>>>>>>>>>> appropriate notification listeners in a separate thread.
>>>>>>>>>>>>> This is
>>>>>>>>>>>>> for a
>>>>>>>>>>>>> client connection to do clean if a user never calls
>>>>>>>>>>>>> removeNotificationListener.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But calling directly removeNotificationListener from a client
>>>>>>>>>>>>> should
>>>>>>>>>>>>> still get exception if the clean has not been done. As I
>>>>>>>>>>>>> said, if
>>>>>>>>>>>>> the
>>>>>>>>>>>>> client checked and found the listener was still there, then the
>>>>>>>>>>>>> client
>>>>>>>>>>>>> sent a request to its server to remove the listener at server
>>>>>>>>>>>>> side,
>>>>>>>>>>>>> the
>>>>>>>>>>>>> server should find that the MBean in question was not
>>>>>>>>>>>>> registered,
>>>>>>>>>>>>> so the
>>>>>>>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>>>>>>>                                                               
>>>>>>>>>>>>>                           
>>>>>>>>>>>> This won't work. The server side listeners are removed upon
>>>>>>>>>>>> receiving
>>>>>>>>>>>> the "unregistered" notification which is delivered from the
>>>>>>>>>>>> ClientNotificationForwarder and it may have not run yet
>>>>>>>>>>>> (since it
>>>>>>>>>>>> runs
>>>>>>>>>>>> in a separate executor thread). The result is that the
>>>>>>>>>>>> attempt to
>>>>>>>>>>>> remove
>>>>>>>>>>>> the notification listener on the server will succeed as well
>>>>>>>>>>>> failing
>>>>>>>>>>>> the
>>>>>>>>>>>> test subsequently.
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>>>
>>>>>>>>>>>>  
>>>>>>>>>>>>                                               
>>>>>>>>>>>>                         
>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>                                                          
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>>                        
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>>> The timeout you made longer was used to wait a notification
>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>> never arrive.
>>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>> Well, it can be used to allow more time to process the
>>>>>>>>>>>>>> "unregister"
>>>>>>>>>>>>>> notification, too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>>>>>>>> condition.
>>>>>>>>>>>>>> An updated webrev will follow.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>>                        
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>>>>>>>> 1) at client side, check whether it was added in the
>>>>>>>>>>>>>>> client side
>>>>>>>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>>>>>>>> registered in
>>>>>>>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So 2) tells that we did not rely on a "unregister"
>>>>>>>>>>>>>>> notification.
>>>>>>>>>>>>>>> Anyway,
>>>>>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>>>>>>>> unregister an
>>>>>>>>>>>>>>> mbean, then any following call (without any time break) to
>>>>>>>>>>>>>>> use the
>>>>>>>>>>>>>>> mbean
>>>>>>>>>>>>>>> should fail, like "removeNotificationListener",
>>>>>>>>>>>>>>> "isRegistered"
>>>>>>>>>>>>>>> etc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I do see a bug here, if we remove a listener from a
>>>>>>>>>>>>>>> non-existing
>>>>>>>>>>>>>>> MBeam,
>>>>>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>>>>>>>> MBeanServerConnection.
>>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>>                        
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>>                                                       
>>>>>>>>>>>>>>>                          
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Webrev at
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>>>>>>>> fires the "unregister" notification which is sent to the
>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>>>>>>>> notification
>>>>>>>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>>>>>>>> hasn't
>>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is no synchronization between the client and the
>>>>>>>>>>>>>>>> server and
>>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>>> race condition can occur occasionally. Normally, the
>>>>>>>>>>>>>>>> execution is
>>>>>>>>>>>>>>>> fast
>>>>>>>>>>>>>>>> enough to behave like the "unregister" notification is
>>>>>>>>>>>>>>>> processed
>>>>>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it
>>>>>>>>>>>>>>>> seems
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option
>>>>>>>>>>>>>>>> strains
>>>>>>>>>>>>>>>> the CPU
>>>>>>>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is no proper fix for this - the only thing that
>>>>>>>>>>>>>>>> work is
>>>>>>>>>>>>>>>> waiting a
>>>>>>>>>>>>>>>> bit longer in the main thread to give the notification
>>>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>>>                                                 
>>>>>>>>>>>>>>>>                                                   
>>>>>>>>>>>>>>>>                                 
>>>>>>>>>>>>>>                                                  
>>>>>>>>>>>>>>                         
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>                                                               
>>>>>>>>>>>>>                           
>>>>>>>>>>>>                                                     
>>>>>>>>>>>>                         
>>>>>>>>>>>                                           
>>>>>>>>>>>                       
>>>>>>>>>>                                   
>>>>>>>>>>                     
>>>>>>>>>                           
>>>>>>>>>                   
>>>>>>>>                     
>>>>>>>>                 
>>>>>>>               
>>>>>>>               
>>>>>>           
>>>>>>             
>>>>>       
>>>>>           
>>>   
>>>       
>>     
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/jmx-dev/attachments/20130110/9248c139/attachment-0001.html 

From jaroslav.bachorik at oracle.com  Thu Jan 10 07:20:03 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Jan 2013 16:20:03 +0100
Subject: jmx-dev [PATCH] JDK-8005472:
 com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh
 failed on windows
In-Reply-To: <50EE813A.1020501@oracle.com>
References: <50E16BA8.40203@oracle.com>
	<682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com>
	<50EAB014.30805@oracle.com> <50EE813A.1020501@oracle.com>
Message-ID: <50EEDC23.5080005@oracle.com>

Update: http://cr.openjdk.java.net/~jbachorik/8005472/webrev.04

On 01/10/2013 09:52 AM, Stuart Marks wrote:
> On 1/7/13 3:23 AM, Jaroslav Bachorik wrote:
>> On 01/04/2013 11:37 PM, Kelly O'Hair wrote:
>>> I suspect it is not hanging because it does not exist, but that some
>>> other windows process has it's hands on it.
>>> This is the stdout file from the server being started up right?
>>> Could the server from a previous test run be still running?
>>
>> Exactly. Amy confirmed this and provided a patch which resolves the
>> hanging problem.
>>
>> The update patch is at
>> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.01
> 
> Hi Jaroslav,
> 
> The change to remove the parentheses from around the server program
> looks right. It avoids forking an extra process (at least in some
> shells) and lets $! refer to the actual JVM, not an intermediate shell
> process. The rm -f from Kelly's suggestion is good too.
> 
> But there are other things wrong with the script. I don't think they
> could cause hanging, but they could cause the script to fail in
> unforeseen ways, or even to report success incorrectly.
> 
> One problem is introduced by the change, where the Server's stderr is
> also redirected into $URL_PATH along with stdout. This means that if the
> Server program reports any errors, they'll get mixed into the URL_PATH
> file instead of appearing in the test log. The URL_PATH file's contents
> is never reported, so these error messages will be invisible.

Fixed, only the stdout is redirected to $URL_PATH.

> 
> The exit status of some of the critical commands (such as the
> compilations) isn't checked, so if javac fails for some reason, the test
> might not report failure. Instead, some weird error might or might not
> be reported later (though one will still see the javac errors in the log).

Fixed, introduced the check. The "set -e" was hanging the script so I
have to check for the exit status manually.

> 
> I don't think the sleep at line 80 is necessary, since the client runs
> synchronously and should have exited by this point.

And it's gone.

> 
> The wait loop checking for the existence of the URL_PATH file doesn't
> actually guarantee that the server is running or has initialized yet.
> The file is actually created by the shell before the Server JVM starts
> up. Thus, runClient might try to read from it before the server has
> written anything to it. Or, as mentioned above, the server might have
> written some error messages into the URL_PATH file instead of the
> expected contents. Thus, the contents of the JMXURL variable can quite
> possibly be incorrect.

The err is not redirected to the file. A separate file is used to signal
the availability of the server and that file is created from the java
code after the server has been started. Also, the err and out  streams
are flushed to make sure the JMX URL makes it into the file.

> 
> If this occurs, what will happen when the client runs? It may emit some
> error message, and this will be filtered out by the grep pipeline. Thus,
> HAS_ERRORS might end up empty, and the test will report passing, even
> though everything has failed!

Shouldn't happen with only the controlled stdout redirected to the file.

> 
> For this changeset I'd recommend at a minimum removing the redirection
> of stderr to URL_PATH. If the server fails we'll at least see errors in
> the test log.
> 
> For checking the notification message, is there a way to modify the
> client to report an exit status or throw an exception? Throwing an
> exception from main() will exit the JVM with a nonzero status, so this
> can be checked more easily from the script. I think this is less
> error-prone than grepping the output for a specific error message. The
> test should fail if there is *any* error; it should not succeed if an
> expected error is absent.

This is unfortunately not possible. The notification processing needs to
be robust enough to prevent exiting JVM in cases like this. Therefore it
only reports the problem, dumps the notification and carries on. The
only place one can find something went wrong is the err stream.

> 
> You might consider having jtreg build the client and server classes.
> This might simplify some of the setup. Also, jtreg is meticulous about
> aborting the test if any compilations fail, so it takes care of that for
> you.

I need same name classes with incompatible code compiled to two
different locations - client and server. I was not able to figure out
how to use jtreg to accomplish that.

-JB-

> 
> It would be nice if there were a better way to have the client
> rendezvous with the server. I hate to suggest it, but sleeping
> unconditionally after starting the server is probably necessary.
> Anything more robust probably requires rearchitecting the test, though.
> 
> Sorry to dump all this on you. But one of the shell-based RMI tests
> suffers from *exactly* the same pathologies. (I have yet to fix it.)
> Unfortunately, I believe that there are a lot of other shell-based tests
> in the test suite that have similar problems. The lesson here is that
> writing reliable shell tests is a lot harder than it seems.
> 
> Thanks,
> 
> s'marks


From stuart.marks at oracle.com  Thu Jan 10 13:44:02 2013
From: stuart.marks at oracle.com (Stuart Marks)
Date: Thu, 10 Jan 2013 13:44:02 -0800
Subject: jmx-dev [PATCH] JDK-8005472:
 com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh
 failed on windows
In-Reply-To: <50EEDC23.5080005@oracle.com>
References: <50E16BA8.40203@oracle.com>
	<682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com>
	<50EAB014.30805@oracle.com> <50EE813A.1020501@oracle.com>
	<50EEDC23.5080005@oracle.com>
Message-ID: <50EF3622.9050500@oracle.com>

On 1/10/13 7:20 AM, Jaroslav Bachorik wrote:
> Update: http://cr.openjdk.java.net/~jbachorik/8005472/webrev.04

Thanks for the update.

Note, argv[0] is used before argv.length is checked, so if no args are passed 
this gives index out of bounds instead of the usage message.

I see you take pains to write and flush the URL to stdout before writing the 
signaling file. Good. The obvious alternative (which I started writing but then 
erased) is just to put the URL into the signaling file. But this has a race 
between creation of the file and the writing of its contents. So, what you have 
works. (This kind of rendezvous problem occurs a lot; it seems like there ought 
to be a simpler way.)

I suspect the -e option caused hangs because if something failed, it would 
leave the server running, spoiling the next test run. The usual way to deal 
with this is to use the shell 'trap' statement, to kill subprocesses and remove 
temp files before exiting the shell. Probably a good practice in general, but 
perhaps too much shell hackery for this change. (Up to you if you want to 
tackle it.)

Regarding how the test is detecting success/failure, the concern is that if the 
client fails for some reason other than the failure being checked for, the test 
will still report passing. Since the error message is coming out of the client 
JVM, in principle it ought to be possible to redirect it somehow in order to do 
the assertion checking in Java. With the current shell scheme, not only are 
other failures reported as the test passing, these other failures are erased in 
the grep pipeline, so they're not even visible in the test log.

This last issue is rather far afield from this webrev, and fixing it will 
probably require some rearchitecting of the test. So maybe it should be 
considered independently. I just happened to notice this going on, and I 
noticed the similarity to what's going on in the RMI tests.

s'marks


> On 01/10/2013 09:52 AM, Stuart Marks wrote:
>> On 1/7/13 3:23 AM, Jaroslav Bachorik wrote:
>>> On 01/04/2013 11:37 PM, Kelly O'Hair wrote:
>>>> I suspect it is not hanging because it does not exist, but that some
>>>> other windows process has it's hands on it.
>>>> This is the stdout file from the server being started up right?
>>>> Could the server from a previous test run be still running?
>>>
>>> Exactly. Amy confirmed this and provided a patch which resolves the
>>> hanging problem.
>>>
>>> The update patch is at
>>> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.01
>>
>> Hi Jaroslav,
>>
>> The change to remove the parentheses from around the server program
>> looks right. It avoids forking an extra process (at least in some
>> shells) and lets $! refer to the actual JVM, not an intermediate shell
>> process. The rm -f from Kelly's suggestion is good too.
>>
>> But there are other things wrong with the script. I don't think they
>> could cause hanging, but they could cause the script to fail in
>> unforeseen ways, or even to report success incorrectly.
>>
>> One problem is introduced by the change, where the Server's stderr is
>> also redirected into $URL_PATH along with stdout. This means that if the
>> Server program reports any errors, they'll get mixed into the URL_PATH
>> file instead of appearing in the test log. The URL_PATH file's contents
>> is never reported, so these error messages will be invisible.
>
> Fixed, only the stdout is redirected to $URL_PATH.
>
>>
>> The exit status of some of the critical commands (such as the
>> compilations) isn't checked, so if javac fails for some reason, the test
>> might not report failure. Instead, some weird error might or might not
>> be reported later (though one will still see the javac errors in the log).
>
> Fixed, introduced the check. The "set -e" was hanging the script so I
> have to check for the exit status manually.
>
>>
>> I don't think the sleep at line 80 is necessary, since the client runs
>> synchronously and should have exited by this point.
>
> And it's gone.
>
>>
>> The wait loop checking for the existence of the URL_PATH file doesn't
>> actually guarantee that the server is running or has initialized yet.
>> The file is actually created by the shell before the Server JVM starts
>> up. Thus, runClient might try to read from it before the server has
>> written anything to it. Or, as mentioned above, the server might have
>> written some error messages into the URL_PATH file instead of the
>> expected contents. Thus, the contents of the JMXURL variable can quite
>> possibly be incorrect.
>
> The err is not redirected to the file. A separate file is used to signal
> the availability of the server and that file is created from the java
> code after the server has been started. Also, the err and out  streams
> are flushed to make sure the JMX URL makes it into the file.
>
>>
>> If this occurs, what will happen when the client runs? It may emit some
>> error message, and this will be filtered out by the grep pipeline. Thus,
>> HAS_ERRORS might end up empty, and the test will report passing, even
>> though everything has failed!
>
> Shouldn't happen with only the controlled stdout redirected to the file.
>
>>
>> For this changeset I'd recommend at a minimum removing the redirection
>> of stderr to URL_PATH. If the server fails we'll at least see errors in
>> the test log.
>>
>> For checking the notification message, is there a way to modify the
>> client to report an exit status or throw an exception? Throwing an
>> exception from main() will exit the JVM with a nonzero status, so this
>> can be checked more easily from the script. I think this is less
>> error-prone than grepping the output for a specific error message. The
>> test should fail if there is *any* error; it should not succeed if an
>> expected error is absent.
>
> This is unfortunately not possible. The notification processing needs to
> be robust enough to prevent exiting JVM in cases like this. Therefore it
> only reports the problem, dumps the notification and carries on. The
> only place one can find something went wrong is the err stream.
>
>>
>> You might consider having jtreg build the client and server classes.
>> This might simplify some of the setup. Also, jtreg is meticulous about
>> aborting the test if any compilations fail, so it takes care of that for
>> you.
>
> I need same name classes with incompatible code compiled to two
> different locations - client and server. I was not able to figure out
> how to use jtreg to accomplish that.
>
> -JB-
>
>>
>> It would be nice if there were a better way to have the client
>> rendezvous with the server. I hate to suggest it, but sleeping
>> unconditionally after starting the server is probably necessary.
>> Anything more robust probably requires rearchitecting the test, though.
>>
>> Sorry to dump all this on you. But one of the shell-based RMI tests
>> suffers from *exactly* the same pathologies. (I have yet to fix it.)
>> Unfortunately, I believe that there are a lot of other shell-based tests
>> in the test suite that have similar problems. The lesson here is that
>> writing reliable shell tests is a lot harder than it seems.
>>
>> Thanks,
>>
>> s'marks
>

From Alan.Bateman at oracle.com  Tue Jan 15 06:34:30 2013
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Tue, 15 Jan 2013 14:34:30 +0000
Subject: jmx-dev Update MXBeans to allow for the possibility that
 ConstructorProperties is ignored?
Message-ID: <50F568F6.8020708@oracle.com>


With the Compact Profiles proposal [1], there will be a subset Profile 
of Java SE that has JMX but not java.beans. This creates a challenge for 
the MXBean spec where a constructor to reconstitute a type may be used 
if it has the java.beans.ConstructorProperties annotation.

For code that is compiled against a compact profile ("javac -profile 
compact3" for example) then it's not an issue because using using this 
annotation will not compile. However if there is code using this 
annotation that is compiled against the full platform but run on a 
runtime that implements compact3 then the annotation will be ignored.

I'm wondering whether to add a clarification to the MXBean on this. It 
would essentially amount updating the rules under "Reconstructing an 
instance of Java type J from a CompositeData" so that it's clear that 
rule 2 does apply when running on a subset Profile of Java SE.

I'm looking for opinions on whether this is necessary or not.

-Alan

[1] http://openjdk.java.net/jeps/161

From Alan.Bateman at oracle.com  Wed Jan 16 08:25:11 2013
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Wed, 16 Jan 2013 16:25:11 +0000
Subject: jmx-dev Update MXBeans to allow for the possibility that
 ConstructorProperties is ignored?
In-Reply-To: <50F568F6.8020708@oracle.com>
References: <50F568F6.8020708@oracle.com>
Message-ID: <50F6D467.3080308@oracle.com>

On 15/01/2013 14:34, Alan Bateman wrote:
> :
>
>
> I'm wondering whether to add a clarification to the MXBean on this. It 
> would essentially amount updating the rules under "Reconstructing an 
> instance of Java type J from a CompositeData" so that it's clear that 
> rule 2 does apply when running on a subset Profile of Java SE.
Thinking more about it, I think it would be safer and clearer to add a 
clarification. Here is what I propose:

"Rule 2 is not applicable to subset Profiles of Java SE that do not 
include the {@code java.beans} package. In that case it may not be 
possible to reconstruct an instance of <em>J</em>, or it may be 
reconstructed by the means defined by subsequent rules."

Does that seem reasonable?

-Alan.

From Alan.Bateman at oracle.com  Mon Jan 21 08:08:03 2013
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 21 Jan 2013 16:08:03 +0000
Subject: jmx-dev Update MXBeans to allow for the possibility that
 ConstructorProperties is ignored?
In-Reply-To: <50F568F6.8020708@oracle.com>
References: <50F568F6.8020708@oracle.com>
Message-ID: <50FD67E3.8040609@oracle.com>


I've put a webrev here with the proposed changes here:

http://cr.openjdk.java.net/~alanb/8006524/webrev/

In summary, it makes it clear that @ConstructorProperties is not 
applicable when the runtime does not have this annotation. In the future 
then it might may be desirable to consider adding 
javax.management.ConstructorProperties and supporting both annotations. 
I don't propose to do this now because it would further consideration, 
including perhaps supporting both annotations in the java.beans 
persistence support.

Thanks,

Alan.

From mandy.chung at oracle.com  Wed Jan 23 23:12:07 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Wed, 23 Jan 2013 23:12:07 -0800
Subject: jmx-dev Update MXBeans to allow for the possibility that
 ConstructorProperties is ignored?
In-Reply-To: <50FD67E3.8040609@oracle.com>
References: <50F568F6.8020708@oracle.com> <50FD67E3.8040609@oracle.com>
Message-ID: <5100DEC7.9050704@oracle.com>

On 1/21/2013 8:08 AM, Alan Bateman wrote:
>
> I've put a webrev here with the proposed changes here:
>
> http://cr.openjdk.java.net/~alanb/8006524/webrev/
>

This looks reasonable to me.

> In summary, it makes it clear that @ConstructorProperties is not 
> applicable when the runtime does not have this annotation. In the 
> future then it might may be desirable to consider adding 
> javax.management.ConstructorProperties and supporting both 
> annotations. I don't propose to do this now because it would further 
> consideration, including perhaps supporting both annotations in the 
> java.beans persistence support.
>

I'm fine with the proposed spec change and look into the addition of 
javax.management.ConstructorProperties later.   For now, to register 
such a MXBean on a runtime of compact3 profile (without java.beans), it 
will fail with NotCompliantMBeanException that helps diagnosing the 
problem (unless a type can be reconstructed via other rules).

Mandy


From Alan.Bateman at oracle.com  Thu Jan 24 04:25:19 2013
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Thu, 24 Jan 2013 12:25:19 +0000
Subject: jmx-dev Update MXBeans to allow for the possibility that
 ConstructorProperties is ignored?
In-Reply-To: <5100DEC7.9050704@oracle.com>
References: <50F568F6.8020708@oracle.com> <50FD67E3.8040609@oracle.com>
	<5100DEC7.9050704@oracle.com>
Message-ID: <5101282F.6050201@oracle.com>

On 24/01/2013 07:12, Mandy Chung wrote:
>
> I'm fine with the proposed spec change and look into the addition of 
> javax.management.ConstructorProperties later.   For now, to register 
> such a MXBean on a runtime of compact3 profile (without java.beans), 
> it will fail with NotCompliantMBeanException that helps diagnosing the 
> problem (unless a type can be reconstructed via other rules).
Thanks for the review. This is really just a mismatch between the 
compile-time and runtime environments, it would be caught by 
compile-time if compiled with "javac -profile compact3". I guess the 
only genuine scenario where it might be an issue is where someone runs a 
static analyzer over some code and it doesn't see the dependency because 
it's an annotation. In that case, it would fail when attempting to 
register the object and I hope is wouldn't be too difficult to diagnose 
(way back, in preparation for this, I tweaked the "applicable" method so 
that the exception is clearer when the annotation is not available).

I've pushed this change to get it out of the way. In the future then it 
does need exploring the implications of adding 
javax.management.ConstructorProperties. I think we would have an 
inconsistency if this were added without corresponding support in 
JavaBeans persistence.

-Alan