From kelly.ohair at oracle.com Fri Jan 4 14:37:46 2013
From: kelly.ohair at oracle.com (Kelly O'Hair)
Date: Fri, 4 Jan 2013 14:37:46 -0800
Subject: jmx-dev [PATCH] JDK-8005472: com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh failed on windows
In-Reply-To: <50E16BA8.40203@oracle.com>
References: <50E16BA8.40203@oracle.com>
Message-ID: <682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com>

On Dec 31, 2012, at 2:40 AM, Jaroslav Bachorik wrote:

> Looking for a review and a sponsor.
>
> Webrev at:
> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.00/test/com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh.sdiff.html
>
> JPRT run on windows targets:
> http://sthjprt.se.oracle.com/archives/2012/12/2012-12-28-123054.jbachorik.openjdk8-tl//JobStatus.txt
>
> The issue is about a new test failing when run on windows machines. It
> seems that the cygwin really does not like removing a non-existent file
> - to the extent of hanging the script indefinitely.

I suspect it is not hanging because the file does not exist, but because some other Windows process has its hands on it.
This is the stdout file from the server being started up, right?
Could the server from a previous test run still be running?

Maybe a better answer would be to make the filename a bit more unique, like maybe foobar.$$ ???

>
> The patch adds a pre-check for the existence of the file to be removed.
> It does not change the test in any other way.

This test doesn't make much sense to me. rm should never hang on a non-existent file.

And by the way, it might be a good idea for scripts to always use 'rm -f', which is the default for Makefiles with $(RM).

-kto

>
>
> Thanks,
>
> -JB-

From jaroslav.bachorik at oracle.com Mon Jan 7 03:23:00 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Jan 2013 12:23:00 +0100
Subject: jmx-dev [PATCH] JDK-8005472: com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh failed on windows
In-Reply-To: <682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com>
References: <50E16BA8.40203@oracle.com> <682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com>
Message-ID: <50EAB014.30805@oracle.com>

On 01/04/2013 11:37 PM, Kelly O'Hair wrote:
>
> On Dec 31, 2012, at 2:40 AM, Jaroslav Bachorik wrote:
>
>> Looking for a review and a sponsor.
>>
>> Webrev at:
>> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.00/test/com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh.sdiff.html
>>
>> JPRT run on windows targets:
>> http://sthjprt.se.oracle.com/archives/2012/12/2012-12-28-123054.jbachorik.openjdk8-tl//JobStatus.txt
>>
>> The issue is about a new test failing when run on windows machines. It
>> seems that the cygwin really does not like removing a non-existent file
>> - to the extent of hanging the script indefinitely.
>
> I suspect it is not hanging because it does not exist, but that some other windows process has its hands on it.
> This is the stdout file from the server being started up right?
> Could the server from a previous test run be still running?

Exactly. Amy confirmed this and provided a patch which resolves the hanging problem.

The updated patch is at http://cr.openjdk.java.net/~jbachorik/8005472/webrev.01

-JB-

>
> Maybe a better answer might be to make the filename a bit more unique, like maybe foobar.$$ ???
>
>>
>> The patch adds a pre-check for the existence of the file to be removed.
>> It does not change the test in any other way.
>
> This test doesn't make much sense to me.
> rm should never hang on a non-existent file.
>
> And by the way, it might be a good idea for scripts to always use 'rm -f', which is what the default is for Makefiles with $(RM)
>
>
> -kto
>
>>
>>
>> Thanks,
>>
>> -JB-
>

From jaroslav.bachorik at oracle.com Mon Jan 7 05:44:03 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Jan 2013 14:44:03 +0100
Subject: jmx-dev [PATCH] JDK-8005791: Remove java.beans.* imports from com.sun.jmx.mbeanserver.Introspector
Message-ID: <50EAD123.7040202@oracle.com>

Looking for reviewers and a sponsor.

This is a simple patch to remove unused java.beans.* imports from com.sun.jmx.mbeanserver.Introspector. The actual usage of java.beans.* classes was removed from the Introspector; only the imports are left dangling.

The webrev is at http://cr.openjdk.java.net/~jbachorik/8005791/webrev.00/

Thanks,

-JB-

From jaroslav.bachorik at oracle.com Tue Jan 8 07:16:32 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 08 Jan 2013 16:16:32 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
Message-ID: <50EC3850.7080508@oracle.com>

Looking for review and a sponsor.

Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00

In this issue the timing is the problem. MBeanServer.unregisterMBean() fires the "unregister" notification which is sent to the server asynchronously. Thus it may happen that the "unregister" notification has not yet been processed at the time of invoking removeNotificationListener() and the notification listeners haven't been cleaned up, leading to the test failure.

There is no synchronization between the client and the server and such a race condition can occur occasionally. Normally, the execution is fast enough to behave as if the "unregister" notification is processed synchronously with the unregisterMBean() operation, but it seems that using fastdebug Server VM bits with the -Xcomp option strains the CPU enough to make this problem appear.

There is no proper fix for this - the only thing that works is waiting a bit longer in the main thread to give the notification processing thread some time to clean up the listeners.

Regards,

-JB-

From shanliang.jiang at oracle.com Wed Jan 9 00:40:42 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 09 Jan 2013 09:40:42 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
In-Reply-To: <50EC3850.7080508@oracle.com>
References: <50EC3850.7080508@oracle.com>
Message-ID: <50ED2D0A.5000509@oracle.com>

I still have no idea why the test failed, but I do not see why a longer timeout can fix the test. Have you reproduced the problem and tested your fix? If yes, then possibly the long timeout hid a real problem.

The timeout you made longer was used to wait for a notification which should never arrive.

To remove a listener from the client side, we do:
1) at client side, check whether it was added in the client side
2) at server side, check whether the MBean in question was registered in the MBeanServer (!!!)
3) at server side, check whether the listener was added.

So 2) tells that we did not rely on an "unregister" notification. Anyway, if you use the SAME thread to call the "unregister" operation to unregister an MBean, then any following call (without any time break) to use the MBean should fail, like "removeNotificationListener", "isRegistered" etc.

I do see a bug here: if we remove a listener from a non-existing MBean, we get "ListenerNotFoundException" at the client side, but get "InstanceNotFoundException" at the server side. I think we should create a bug, because both implement the same interface MBeanServerConnection.

Shanliang

Jaroslav Bachorik wrote:
> Looking for review and a sponsor.
>
> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>
> In this issue the timing is the problem. MBeanServer.unregisterMBean()
> fires the "unregister" notification which is sent to the server
> asynchronously.
> Thus it may happen that the "unregister" notification
> has not yet been processed at the time of invoking
> removeNotificationListener() and the notification listeners haven't been
> cleaned up leading to the test failure.
>
> There is no synchronization between the client and the server and such
> race condition can occur occasionally. Normally, the execution is fast
> enough to behave like the "unregister" notification is processed
> synchronously with the unregisterMBean() operation but it seems that
> using fastdebug Server VM bits with the -Xcomp option strains the CPU
> enough to make this problem appear.
>
> There is no proper fix for this - the only thing that works is waiting a
> bit longer in the main thread to give the notification processing thread
> some time to clean up the listeners.
>
> Regards,
>
> -JB-
>

From jaroslav.bachorik at oracle.com Wed Jan 9 01:45:51 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 09 Jan 2013 10:45:51 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
In-Reply-To: <50ED2D0A.5000509@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
Message-ID: <50ED3C4F.1070001@oracle.com>

On 01/09/2013 09:40 AM, shanliang wrote:
> I still have no idea why the test failed, but I do not see why a longer
> timeout can fix the test. Have you reproduced the problem and tested
> your fix? if yes then possibly the long timeout hid a real problem.

Yes, I can reproduce the problem (using the fastdebug bits and -Xcomp switch) and verify that the fix makes the test pass.

The ClientNotifForwarder scans the notifications for MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the appropriate notification listeners in a separate thread. Thus, calling "removeNotificationListener" on the main thread is prone to racing.

>
> The timeout you made longer was used to wait a notification which should
> never arrive.
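
For illustration only, here is a minimal sketch (not taken from the webrev) of waiting for the unregistration notification explicitly instead of sleeping for a fixed time. The class name UnregisterLatchSketch and the helper unregisterAndAwait are hypothetical, and a javax.management.timer.Timer merely stands in for the MBean under test. Note the assumption: seeing the notification in an application listener does not by itself prove that connector-internal clean-up has finished, so this only shows the event-based waiting pattern.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerDelegate;
import javax.management.MBeanServerFactory;
import javax.management.MBeanServerNotification;
import javax.management.NotificationListener;
import javax.management.ObjectName;
import javax.management.timer.Timer;

// Hypothetical helper class, for illustration only.
class UnregisterLatchSketch {

    // Unregisters 'name' on 'server' and waits until a listener attached
    // through 'conn' has seen the matching unregistration notification.
    // Returns false if the notification did not arrive within the timeout.
    static boolean unregisterAndAwait(MBeanServerConnection conn,
                                      MBeanServer server,
                                      ObjectName name,
                                      long timeoutMs) throws Exception {
        CountDownLatch seen = new CountDownLatch(1);
        NotificationListener l = (n, handback) -> {
            if (n instanceof MBeanServerNotification
                    && MBeanServerNotification.UNREGISTRATION_NOTIFICATION.equals(n.getType())
                    && name.equals(((MBeanServerNotification) n).getMBeanName())) {
                seen.countDown();
            }
        };
        // Registration events are emitted by the MBeanServerDelegate.
        conn.addNotificationListener(MBeanServerDelegate.DELEGATE_NAME, l, null, null);
        try {
            server.unregisterMBean(name);
            return seen.await(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            conn.removeNotificationListener(MBeanServerDelegate.DELEGATE_NAME, l);
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer mbs = MBeanServerFactory.newMBeanServer();
        ObjectName name = new ObjectName("sketch:type=Timer");
        mbs.registerMBean(new Timer(), name);
        // In-process demo: the same MBeanServer plays both roles.
        System.out.println("notified = " + unregisterAndAwait(mbs, mbs, name, 5000));
    }
}
```

With a remote connection, 'conn' would be the MBeanServerConnection obtained from the JMXConnector, so the wait is bounded by the timeout but normally returns as soon as the notification has been delivered.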

Well, it can be used to allow more time to process the "unregister" notification, too.

The more I think about this, the more I am inclined to fix the race condition. An updated webrev will follow.

>
> To remove a listener from a client side, we did:
> 1) at client side, check whether it was added in the client side
> 2) at server side, check whether the MBean in question was registered in
> the MBeanServer (!!!)
> 3) at server side, check whether the listener was added.
>
> So 2) tells that we did not rely on a "unregister" notification. Anyway,
> if you use a SAME thread to call "unregister" operation to unregister an
> mbean, then any following call (without any time break) to use the mbean
> should fail, like "removeNotificationListener", "isRegistered" etc.
>
> I do see a bug here, if we remove a listener from a non-existing MBean,
> we get "ListenerNotFoundException" at a client side, but get
> "InstanceNotFoundException" at server side, I think we should create a
> bug, because both implemented the same interface MBeanServerConnection.

Yes, it is rather inconsistent.

-JB-

>
> Shanliang
>
> Jaroslav Bachorik wrote:
>> Looking for review and a sponsor.
>>
>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>
>> In this issue the timing is the problem. MBeanServer.unregisterMBean()
>> fires the "unregister" notification which is sent to the server
>> asynchronously. Thus it may happen that the "unregister" notification
>> has not yet been processed at the time of invoking
>> removeNotificationListener() and the notification listeners haven't been
>> cleaned up leading to the test failure.
>>
>> There is no synchronization between the client and the server and such
>> race condition can occur occasionally.
>> Normally, the execution is fast
>> enough to behave like the "unregister" notification is processed
>> synchronously with the unregisterMBean() operation but it seems that
>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>> enough to make this problem appear.
>>
>> There is no proper fix for this - the only thing that works is waiting a
>> bit longer in the main thread to give the notification processing thread
>> some time to clean up the listeners.
>>
>> Regards,
>>
>> -JB-
>>
>

From shanliang.jiang at oracle.com Wed Jan 9 02:08:44 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 09 Jan 2013 11:08:44 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
In-Reply-To: <50ED3C4F.1070001@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com>
Message-ID: <50ED41AC.4010007@oracle.com>

Jaroslav Bachorik wrote:
> On 01/09/2013 09:40 AM, shanliang wrote:
>
>> I still have no idea why the test failed, but I do not see why a longer
>> timeout can fix the test. Have you reproduced the problem and tested
>> your fix? if yes then possibly the long timeout hid a real problem.
>>
>
> Yes, I can reproduce the problem (using the fastdebug bits and -Xcomp
> switch) and verify that the fix makes the test pass.
>
> The ClientNotifForwarder scans the notifications for
> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
> appropriate notification listeners in a separate thread. Thus, calling
> "removeNotificationListener" on the main thread is prone to racing.
>

It is true that ClientNotifForwarder scans the notifications for MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the appropriate notification listeners in a separate thread. This is for a client connection to clean up if a user never calls removeNotificationListener.

But calling removeNotificationListener directly from a client should still get an exception if the clean-up has not been done. As I said, if the client checks and finds the listener is still there, then the client sends a request to its server to remove the listener at the server side; the server should find that the MBean in question is not registered, so the server should throw an exception. The bug might be here.

Shanliang

>
>> The timeout you made longer was used to wait a notification which should
>> never arrive.
>>
>
> Well, it can be used to allow more time to process the "unregister"
> notification, too.
>
> When I think more of this I am more inclined to fix the race condition.
> An updated webrev will follow.
>
>
>> To remove a listener from a client side, we did:
>> 1) at client side, check whether it was added in the client side
>> 2) at server side, check whether the MBean in question was registered in
>> the MBeanServer (!!!)
>> 3) at server side, check whether the listener was added.
>>
>> So 2) tells that we did not rely on a "unregister" notification. Anyway,
>> if you use a SAME thread to call "unregister" operation to unregister an
>> mbean, then any following call (without any time break) to use the mbean
>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>
>> I do see a bug here, if we remove a listener from a non-existing MBean,
>> we get "ListenerNotFoundException" at a client side, but get
>> "InstanceNotFoundException" at server side, I think we should create a
>> bug, because both implemented the same interface MBeanServerConnection.
>>
>
> Yes, it is rather inconsistent.
>
> -JB-
>
>
>> Shanliang
>>
>> Jaroslav Bachorik wrote:
>>
>>> Looking for review and a sponsor.
>>>
>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>
>>> In this issue the timing is the problem. MBeanServer.unregisterMBean()
>>> fires the "unregister" notification which is sent to the server
>>> asynchronously.
>>> Thus it may happen that the "unregister" notification
>>> has not yet been processed at the time of invoking
>>> removeNotificationListener() and the notification listeners haven't been
>>> cleaned up leading to the test failure.
>>>
>>> There is no synchronization between the client and the server and such
>>> race condition can occur occasionally. Normally, the execution is fast
>>> enough to behave like the "unregister" notification is processed
>>> synchronously with the unregisterMBean() operation but it seems that
>>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>>> enough to make this problem appear.
>>>
>>> There is no proper fix for this - the only thing that works is waiting a
>>> bit longer in the main thread to give the notification processing thread
>>> some time to clean up the listeners.
>>>
>>> Regards,
>>>
>>> -JB-
>>>
>>>
>
>

From jaroslav.bachorik at oracle.com Wed Jan 9 05:15:58 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 09 Jan 2013 14:15:58 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
In-Reply-To: <50ED41AC.4010007@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
Message-ID: <50ED6D8E.6070404@oracle.com>

On 01/09/2013 11:08 AM, shanliang wrote:
> Jaroslav Bachorik wrote:
>> On 01/09/2013 09:40 AM, shanliang wrote:
>>
>>> I still have no idea why the test failed, but I do not see why a longer
>>> timeout can fix the test. Have you reproduced the problem and tested
>>> your fix? if yes then possibly the long timeout hid a real problem.
>>>
>>
>> Yes, I can reproduce the problem (using the fastdebug bits and -Xcomp
>> switch) and verify that the fix makes the test pass.
>>
>> The ClientNotifForwarder scans the notifications for
>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>> appropriate notification listeners in a separate thread. Thus, calling
>> "removeNotificationListener" on the main thread is prone to racing.
>>
> It is true that ClientNotifForwarder scans the notifications for
> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
> appropriate notification listeners in a separate thread. This is for a
> client connection to do clean if a user never calls
> removeNotificationListener.
>
> But calling directly removeNotificationListener from a client should
> still get exception if the clean has not been done. As I said, if the
> client checked and found the listener was still there, then the client
> sent a request to its server to remove the listener at server side, the
> server should find that the MBean in question was not registered, so the
> server should throw an exception. The bug might be here.

This won't work. The server-side listeners are removed upon receiving the "unregistered" notification, which is delivered from the ClientNotifForwarder, and it may not have run yet (since it runs in a separate executor thread). The result is that the attempt to remove the notification listener on the server will succeed as well, subsequently failing the test.

-JB-

>
> Shanliang
>>
>>> The timeout you made longer was used to wait a notification which should
>>> never arrive.
>>>
>>
>> Well, it can be used to allow more time to process the "unregister"
>> notification, too.
>>
>> When I think more of this I am more inclined to fix the race condition.
>> An updated webrev will follow.
>>
>>
>>> To remove a listener from a client side, we did:
>>> 1) at client side, check whether it was added in the client side
>>> 2) at server side, check whether the MBean in question was registered in
>>> the MBeanServer (!!!)
>>> 3) at server side, check whether the listener was added.
>>>
>>> So 2) tells that we did not rely on a "unregister" notification. Anyway,
>>> if you use a SAME thread to call "unregister" operation to unregister an
>>> mbean, then any following call (without any time break) to use the mbean
>>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>>
>>> I do see a bug here, if we remove a listener from a non-existing MBean,
>>> we get "ListenerNotFoundException" at a client side, but get
>>> "InstanceNotFoundException" at server side, I think we should create a
>>> bug, because both implemented the same interface MBeanServerConnection.
>>>
>>
>> Yes, it is rather inconsistent.
>>
>> -JB-
>>
>>
>>> Shanliang
>>>
>>> Jaroslav Bachorik wrote:
>>>
>>>> Looking for review and a sponsor.
>>>>
>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>
>>>> In this issue the timing is the problem. MBeanServer.unregisterMBean()
>>>> fires the "unregister" notification which is sent to the server
>>>> asynchronously. Thus it may happen that the "unregister" notification
>>>> has not yet been processed at the time of invoking
>>>> removeNotificationListener() and the notification listeners haven't been
>>>> cleaned up leading to the test failure.
>>>>
>>>> There is no synchronization between the client and the server and such
>>>> race condition can occur occasionally. Normally, the execution is fast
>>>> enough to behave like the "unregister" notification is processed
>>>> synchronously with the unregisterMBean() operation but it seems that
>>>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>>>> enough to make this problem appear.
>>>>
>>>> There is no proper fix for this - the only thing that works is waiting a
>>>> bit longer in the main thread to give the notification processing
>>>> thread
>>>> some time to clean up the listeners.
>>>>
>>>> Regards,
>>>>
>>>> -JB-
>>>>
>>
>>
>
>

From shanliang.jiang at oracle.com Wed Jan 9 05:44:22 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 09 Jan 2013 14:44:22 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
In-Reply-To: <50ED6D8E.6070404@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com>
Message-ID: <50ED7436.1020205@oracle.com>

Let's forget the JMX implementation at first. If an MBean is unregistered and a user at the client side calls "removeNotificationListener" on the MBean, what should happen? If the user calls "isRegistered" on the MBean, what should happen?

I have done 2 tests, using only one thread:

1)
......
localServer.unregisterMBean(myMBean);
boolean isRegistered = remoteClientServer.isRegistered(myMBean);

I got isRegistered = false;

2)
......
localServer.unregisterMBean(myMBean);
System.out.println("isRegistered = "+remoteClientServer.isRegistered(myMBean));
remoteClientServer.removeNotificationListener(myMBean, listener);

I did not get an exception.

Test 1) shows that the client can know the MBean was unregistered, so the client should throw an exception for the call of "removeNotificationListener" in 2).

The test "DeadListenerTest" passed on some machines because of the timeout for waiting for a notification. I think its failure just reveals a new bug.

Setting a longer timeout just hides the real bug, and the test might fail again one day if the running conditions change, and you might need a longer timeout again.

Shanliang

Jaroslav Bachorik wrote:
> On 01/09/2013 11:08 AM, shanliang wrote:
>
>> Jaroslav Bachorik wrote:
>>
>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>
>>>
>>>> I still have no idea why the test failed, but I do not see why a longer
>>>> timeout can fix the test. Have you reproduced the problem and tested
>>>> your fix?
>>>> if yes then possibly the long timeout hid a real problem.
>>>>
>>>>
>>> Yes, I can reproduce the problem (using the fastdebug bits and -Xcomp
>>> switch) and verify that the fix makes the test pass.
>>>
>>> The ClientNotifForwarder scans the notifications for
>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>> appropriate notification listeners in a separate thread. Thus, calling
>>> "removeNotificationListener" on the main thread is prone to racing.
>>>
>>>
>> It is true that ClientNotifForwarder scans the notifications for
>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>> appropriate notification listeners in a separate thread. This is for a
>> client connection to do clean if a user never calls
>> removeNotificationListener.
>>
>> But calling directly removeNotificationListener from a client should
>> still get exception if the clean has not been done. As I said, if the
>> client checked and found the listener was still there, then the client
>> sent a request to its server to remove the listener at server side, the
>> server should find that the MBean in question was not registered, so the
>> server should throw an exception. The bug might be here.
>>
>
> This won't work. The server side listeners are removed upon receiving
> the "unregistered" notification which is delivered from the
> ClientNotifForwarder and it may have not run yet (since it runs
> in a separate executor thread). The result is that the attempt to remove
> the notification listener on the server will succeed as well failing the
> test subsequently.
>
> -JB-
>
>
>> Shanliang
>>
>>>
>>>
>>>> The timeout you made longer was used to wait a notification which should
>>>> never arrive.
>>>>
>>>>
>>> Well, it can be used to allow more time to process the "unregister"
>>> notification, too.
>>>
>>> When I think more of this I am more inclined to fix the race condition.
>>> An updated webrev will follow.
>>>
>>>
>>>
>>>> To remove a listener from a client side, we did:
>>>> 1) at client side, check whether it was added in the client side
>>>> 2) at server side, check whether the MBean in question was registered in
>>>> the MBeanServer (!!!)
>>>> 3) at server side, check whether the listener was added.
>>>>
>>>> So 2) tells that we did not rely on a "unregister" notification. Anyway,
>>>> if you use a SAME thread to call "unregister" operation to unregister an
>>>> mbean, then any following call (without any time break) to use the mbean
>>>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>>>
>>>> I do see a bug here, if we remove a listener from a non-existing MBean,
>>>> we get "ListenerNotFoundException" at a client side, but get
>>>> "InstanceNotFoundException" at server side, I think we should create a
>>>> bug, because both implemented the same interface MBeanServerConnection.
>>>>
>>>>
>>> Yes, it is rather inconsistent.
>>>
>>> -JB-
>>>
>>>
>>>
>>>> Shanliang
>>>>
>>>> Jaroslav Bachorik wrote:
>>>>
>>>>
>>>>> Looking for review and a sponsor.
>>>>>
>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>
>>>>> In this issue the timing is the problem. MBeanServer.unregisterMBean()
>>>>> fires the "unregister" notification which is sent to the server
>>>>> asynchronously. Thus it may happen that the "unregister" notification
>>>>> has not yet been processed at the time of invoking
>>>>> removeNotificationListener() and the notification listeners haven't been
>>>>> cleaned up leading to the test failure.
>>>>>
>>>>> There is no synchronization between the client and the server and such
>>>>> race condition can occur occasionally. Normally, the execution is fast
>>>>> enough to behave like the "unregister" notification is processed
>>>>> synchronously with the unregisterMBean() operation but it seems that
>>>>> using fastdebug Server VM bits with the -Xcomp option strains the CPU
>>>>> enough to make this problem appear.
>>>>>
>>>>> There is no proper fix for this - the only thing that works is waiting a
>>>>> bit longer in the main thread to give the notification processing
>>>>> thread
>>>>> some time to clean up the listeners.
>>>>>
>>>>> Regards,
>>>>>
>>>>> -JB-
>>>>>
>>>>>
>>>
>>>
>>
>
>

From jaroslav.bachorik at oracle.com Wed Jan 9 06:00:33 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 09 Jan 2013 15:00:33 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
In-Reply-To: <50ED7436.1020205@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
Message-ID: <50ED7801.8080704@oracle.com>

On 01/09/2013 02:44 PM, shanliang wrote:
> Let's forget the JMX implementation at first. If an MBean is
> unregistered, a user at client side calls "removeNotificationListener"
> on the MBean, what should happen? if the user calls "isRegistered" on
> the MBean, what should happen?
>
> I have done 2 tests, I used only one thread:
>
> 1)
> ......
> localServer.unregisterMBean(myMBean);
> boolean isRegistered = remoteClientServer.isRegistered(myMBean);
>
> I got isRegistered = false;
>
> 2)
> ......
> localServer.unregisterMBean(myMBean);
> System.out.println("isRegistered =
> "+remoteClientServer.isRegistered(myMBean));
> remoteClientServer.removeNotificationListener(myMBean, listener);
>
> I did not get an exception.
>
> The 1) told that the client could know the MBean was unregistered, then
> the client should throw an exception for the call of
> "removeNotificationListener" in 2).
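
The single-threaded experiment quoted above can be sketched roughly as follows. This is an illustrative reconstruction, not the code that was actually run: the class name UnregisterProbeSketch and the method probe are hypothetical, a javax.management.timer.Timer stands in for the MBean under test, and the in-process RMI connector at "service:jmx:rmi://" is an assumption about the setup. Whether removeNotificationListener throws is exactly the timing-dependent part under discussion, so the sketch only records it and does not predict it.

```java
import javax.management.InstanceNotFoundException;
import javax.management.ListenerNotFoundException;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerFactory;
import javax.management.NotificationListener;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;
import javax.management.timer.Timer;

// Hypothetical probe class, for illustration only.
class UnregisterProbeSketch {

    // Returns { isRegistered seen by the remote client,
    //           whether removeNotificationListener threw }.
    static boolean[] probe() throws Exception {
        MBeanServer localServer = MBeanServerFactory.newMBeanServer();
        ObjectName name = new ObjectName("sketch:type=Timer");
        localServer.registerMBean(new Timer(), name);

        // Assumed setup: an RMI connector server without an external registry.
        JMXConnectorServer cs = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi://"), null, localServer);
        cs.start();
        try (JMXConnector cc = JMXConnectorFactory.connect(cs.getAddress())) {
            MBeanServerConnection remote = cc.getMBeanServerConnection();
            NotificationListener listener = (n, handback) -> { };
            remote.addNotificationListener(name, listener, null, null);

            // Same thread, no pause between the calls.
            localServer.unregisterMBean(name);
            boolean registered = remote.isRegistered(name);

            boolean threw;
            try {
                remote.removeNotificationListener(name, listener);
                threw = false;
            } catch (InstanceNotFoundException | ListenerNotFoundException e) {
                threw = true;
            }
            return new boolean[] { registered, threw };
        } finally {
            cs.stop();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = probe();
        System.out.println("isRegistered = " + r[0]);
        System.out.println("removeNotificationListener threw = " + r[1]);
    }
}
```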

Yes, but then it would not test the listener leakage, as it was supposed to, but rather the fact that the client throws the appropriate exception. The fact that the MBean was unregistered does not necessarily mean that the listeners were released. The main problem remains: the listeners are cleaned up asynchronously, and the clean-up process might race against other uses of the JMX API.

>
> The test "DeadListenerTest" got passed in some machines because of the
> timeout for waiting a notification. I think its failure just tells a new
> bug.
>
> To set a longer timeout just hides the real bug, and the test might fail
> again one day if running condition is changed and you might need longer
> timeout again.

Yes, I agree with you that extending the timeout just lessens the likelihood of the race condition and does not prevent it.

>
> Shanliang
>
> Jaroslav Bachorik wrote:
>> On 01/09/2013 11:08 AM, shanliang wrote:
>>
>>> Jaroslav Bachorik wrote:
>>>
>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>
>>>>
>>>>> I still have no idea why the test failed, but I do not see why a
>>>>> longer
>>>>> timeout can fix the test. Have you reproduced the problem and tested
>>>>> your fix? if yes then possibly the long timeout hid a real problem.
>>>>>
>>>> Yes, I can reproduce the problem (using the fastdebug bits and -Xcomp
>>>> switch) and verify that the fix makes the test pass.
>>>>
>>>> The ClientNotifForwarder scans the notifications for
>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>> appropriate notification listeners in a separate thread. Thus, calling
>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>
>>> It is true that ClientNotifForwarder scans the notifications for
>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>> appropriate notification listeners in a separate thread. This is for a
>>> client connection to do clean if a user never calls
>>> removeNotificationListener.
>>>
>>> But calling directly removeNotificationListener from a client should
>>> still get exception if the clean has not been done. As I said, if the
>>> client checked and found the listener was still there, then the client
>>> sent a request to its server to remove the listener at server side, the
>>> server should find that the MBean in question was not registered, so the
>>> server should throw an exception. The bug might be here.
>>>
>>
>> This won't work. The server side listeners are removed upon receiving
>> the "unregistered" notification which is delivered from the
>> ClientNotifForwarder and it may have not run yet (since it runs
>> in a separate executor thread). The result is that the attempt to remove
>> the notification listener on the server will succeed as well failing the
>> test subsequently.
>>
>> -JB-
>>
>>
>>> Shanliang
>>>
>>>>
>>>>
>>>>> The timeout you made longer was used to wait a notification which
>>>>> should
>>>>> never arrive.
>>>>>
>>>> Well, it can be used to allow more time to process the "unregister"
>>>> notification, too.
>>>>
>>>> When I think more of this I am more inclined to fix the race condition.
>>>> An updated webrev will follow.
>>>>
>>>>
>>>>
>>>>> To remove a listener from a client side, we did:
>>>>> 1) at client side, check whether it was added in the client side
>>>>> 2) at server side, check whether the MBean in question was
>>>>> registered in
>>>>> the MBeanServer (!!!)
>>>>> 3) at server side, check whether the listener was added.
>>>>>
>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>> Anyway,
>>>>> if you use a SAME thread to call "unregister" operation to
>>>>> unregister an
>>>>> mbean, then any following call (without any time break) to use the
>>>>> mbean
>>>>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>>>> >>>>> I do see a bug here, if we remove a listener from a non-existing >>>>> MBeam, >>>>> we get "ListenerNotFoundException" at a client side, but get >>>>> "InstanceNotFoundException" at server side, I think we should create a >>>>> bug, because both implemented the same interface >>>>> MBeanServerConnection. >>>>> >>>> Yes, it is rather inconsistent. >>>> >>>> -JB- >>>> >>>> >>>> >>>>> Shanliang >>>>> >>>>> Jaroslav Bachorik wrote: >>>>> >>>>>> Looking for review and a sponsor. >>>>>> >>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>> >>>>>> In this issue the timing is the problem. >>>>>> MBeanServer.unregisterMBean() >>>>>> fires the "unregister" notification which is sent to the server >>>>>> asynchronously. Thus it may happen that the "unregister" notification >>>>>> has not been yet processed at the time of invoking >>>>>> removeNotificationListener() and the notification listeners hasn't >>>>>> been >>>>>> cleaned up leading to the test failure. >>>>>> >>>>>> There is no synchronization between the client and the server and >>>>>> such >>>>>> race condition can occur occasionally. Normally, the execution is >>>>>> fast >>>>>> enough to behave like the "unregister" notification is processed >>>>>> synchronously with the unregisterMBean() operation but it seems that >>>>>> using fastdebug Server VM bits with the -Xcomp option strains the CPU >>>>>> enough to make this problem appear. >>>>>> >>>>>> There is no proper fix for this - the only thing that work is >>>>>> waiting a >>>>>> bit longer in the main thread to give the notification processing >>>>>> thread >>>>>> some time to clean up the listeners. 
>>>>>> >>>>>> Regards, >>>>>> >>>>>> -JB- >>>>>> >>>> >>> >> >> > > From shanliang.jiang at oracle.com Wed Jan 9 06:25:51 2013 From: shanliang.jiang at oracle.com (shanliang) Date: Wed, 09 Jan 2013 15:25:51 +0100 Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure In-Reply-To: <50ED7801.8080704@oracle.com> References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com> <50ED7801.8080704@oracle.com> Message-ID: <50ED7DEF.9020108@oracle.com> Jaroslav Bachorik wrote: > On 01/09/2013 02:44 PM, shanliang wrote: > >> Let's forget the JMX implementation at first. If an MBean is >> unregistered, a user at client side calls "removeNotificationListener" >> on the MBean, what should happen? if the user calls "isRegistered" on >> the MBean, what should happen? >> >> I have done 2 tests, I used only one thread: >> >> 1) >> ...... >> localServer.unregisterMBean(myMBean); >> boolean isRegistered = remoteClientServer.isRegistered(myMBean)); >> >> I got isRegistered = false; >> >> 2) >> ...... >> localServer.unregisterMBean(myMBean); >> System.out.println("isRegistered = >> "+remoteClientServer.sRegistered(myMBean)); >> remoteClientServer.removeNotificationListener(myMBean, listener); >> >> I did not get an exception. >> >> The 1) told that the client could know the MBean was unregistered, then >> the client should throw an exception for the call of >> "removeNotificationListener" in 2). >> > > Yes, but then it would not test the listener leakage as it was supposed > to test but rather the fact that the client throws the appropriate > exception. The fact that the mbean was unregistered does not necessarily > mean that the listeners were released. The main problem remains - the > listeners are being cleaned-up asynchronously and the clean-up process > might race against the other uses of the JMX API. 
> client.removeNotificationListener is not the right way to test for a listener leak here. We could use other approaches: for example, keep the listener in a weak reference; after the MBean is unregistered, the weak reference should be cleared after some time. Another way is what DeadListenerTest does to check whether the cleanup has been done at the server side: use reflection to get the "listenerMap" at the server side and make sure it is empty, but this would require adding a private method to the class ClientNotifForwarder. I think we have 3 things to do here: 1) modify the test so that it does not use removeNotificationListener to test for a listener leak 2) create a new bug: a client does not throw an exception after an MBean is unregistered 3) create a bug: a client does not throw the same exception as the server side. I will do 2) and 3). If you like, you can continue with 1); it might also need a fix in the JMX implementation. Shanliang > >> The test "DeadListenerTest" got passed in some machines because of the >> timeout for waiting a notification. I think its failure just tells a new >> bug. >> >> To set a longer timeout just hides the real bug, and the test might fail >> again one day if running condition is changed and you might need longer >> timeout again. >> > > Yes, I agree with you that extending the timeout just lessens the > likelihood of the race condition and does not prevent it. > > >> Shanliang >> >> Jaroslav Bachorik wrote: >> >>> On 01/09/2013 11:08 AM, shanliang wrote: >>> >>> >>>> Jaroslav Bachorik wrote: >>>> >>>> >>>>> On 01/09/2013 09:40 AM, shanliang wrote: >>>>> >>>>> >>>>> >>>>>> I still have no idea why the test failed, but I do not see why a >>>>>> longer >>>>>> timeout can fix the test. Have you reproduced the problem and tested >>>>>> your fix? if yes then possible the long timeout hided a real problem. >>>>>> >>>>>> >>>>> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp >>>>> switch) and verify that the fix makes the test pass.
>>>>> >>>>> The ClientNotifForwarder scans the notifications for >>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>> appropriate notification listeners in a separate thread. Thus, calling >>>>> "removeNotificationListener" on the main thread is prone to racing. >>>>> >>>>> >>>> It is true that ClientNotifForwarder scans the notifications for >>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>> appropriate notification listeners in a separate thread. This is for a >>>> client connection to do clean if a user never calls >>>> removeNotificationListener. >>>> >>>> But calling directly removeNotificationListener from a client should >>>> still get exception if the clean has not been done. As I said, if the >>>> client checked and found the listener was still there, then the client >>>> sent a request to its server to remove the listener at server side, the >>>> server should find that the MBean in question was not registered, so the >>>> server should throw an exception. The bug might be here. >>>> >>>> >>> This won't work. The server side listeners are removed upon receiving >>> the "unregistered" notification which is delivered from the >>> ClientNotificationForwarder and it may have not run yet (since it runs >>> in a separate executor thread). The result is that the attempt to remove >>> the notification listener on the server will succeed as well failing the >>> test subsequently. >>> >>> -JB- >>> >>> >>> >>>> Shanliang >>>> >>>> >>>>> >>>>> >>>>> >>>>>> The timeout you made longer was used to wait a notification which >>>>>> should >>>>>> never arrive. >>>>>> >>>>>> >>>>> Well, it can be used to allow more time to process the "unregister" >>>>> notification, too. >>>>> >>>>> When I think more of this I am more inclined to fix the race condition. >>>>> An updated webrev will follow. 
>>>>> >>>>> >>>>> >>>>> >>>>>> To remove a listener from a client side, we did: >>>>>> 1) at client side, check whether it was added in the client side >>>>>> 2) at server side, check whether the MBean in question was >>>>>> registered in >>>>>> the MBeanServer (!!!) >>>>>> 3) at server side, check whether the listener was added. >>>>>> >>>>>> So 2) tells that we did not rely on a "unregister" notification. >>>>>> Anyway, >>>>>> if you use a SAME thread to call "unregister" operation to >>>>>> unregister an >>>>>> mbean, then any following call (without any time break) to use the >>>>>> mbean >>>>>> should fail, like "removeNotificationListener", "isRegistered" etc. >>>>>> >>>>>> I do see a bug here, if we remove a listener from a non-existing >>>>>> MBeam, >>>>>> we get "ListenerNotFoundException" at a client side, but get >>>>>> "InstanceNotFoundException" at server side, I think we should create a >>>>>> bug, because both implemented the same interface >>>>>> MBeanServerConnection. >>>>>> >>>>>> >>>>> Yes, it is rather inconsistent. >>>>> >>>>> -JB- >>>>> >>>>> >>>>> >>>>> >>>>>> Shanliang >>>>>> >>>>>> Jaroslav Bachorik wrote: >>>>>> >>>>>> >>>>>>> Looking for review and a sponsor. >>>>>>> >>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>>> >>>>>>> In this issue the timing is the problem. >>>>>>> MBeanServer.unregisterMBean() >>>>>>> fires the "unregister" notification which is sent to the server >>>>>>> asynchronously. Thus it may happen that the "unregister" notification >>>>>>> has not been yet processed at the time of invoking >>>>>>> removeNotificationListener() and the notification listeners hasn't >>>>>>> been >>>>>>> cleaned up leading to the test failure. >>>>>>> >>>>>>> There is no synchronization between the client and the server and >>>>>>> such >>>>>>> race condition can occur occasionally. 
Normally, the execution is >>>>>>> fast >>>>>>> enough to behave like the "unregister" notification is processed >>>>>>> synchronously with the unregisterMBean() operation but it seems that >>>>>>> using fastdebug Server VM bits with the -Xcomp option strains the CPU >>>>>>> enough to make this problem appear. >>>>>>> >>>>>>> There is no proper fix for this - the only thing that work is >>>>>>> waiting a >>>>>>> bit longer in the main thread to give the notification processing >>>>>>> thread >>>>>>> some time to clean up the listeners. >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> -JB- >>>>>>> >>>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> > > From shanliang.jiang at oracle.com Wed Jan 9 06:32:12 2013 From: shanliang.jiang at oracle.com (shanliang) Date: Wed, 09 Jan 2013 15:32:12 +0100 Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure In-Reply-To: <50ED7DEF.9020108@oracle.com> References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com> <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com> Message-ID: <50ED7F6C.4010301@oracle.com> shanliang wrote: > I think we have 3 things to do here: > 1) modify the test to not use removeNotificationListener for testing > listener leak > 2) create a new bug about a client does not throw an exception after > an mbean is unregistered > 3) create a bug about a client does not throw a same exception as at > server side. > > I will do 2) and 3), if you like you can continue 1), it might need to > do fix also in the JMX implementation. Oh, 1) does not need a fix in the JMX implementation, just a fix to the test.
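
For reference, the server-side half of the exception mismatch discussed in this thread can be observed in-process, without any connector. What follows is only a minimal sketch, not the test under review: it assumes a plain MBeanServer obtained from MBeanServerFactory, and the class, MBean and ObjectName names are made up for illustration. It does not go through a remote MBeanServerConnection, so it says nothing about what the client side throws.

```java
import javax.management.InstanceNotFoundException;
import javax.management.ListenerNotFoundException;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;
import javax.management.ObjectName;

public class LocalRemoveListenerSketch {
    // Hypothetical standard MBean, used only for this illustration.
    public interface HelloMBean {
        int getValue();
    }

    public static class Hello extends NotificationBroadcasterSupport
            implements HelloMBean {
        public int getValue() {
            return 42;
        }
    }

    // Registers an MBean, adds a listener, unregisters the MBean, and then
    // reports which exception (if any) removeNotificationListener raises.
    static String removeAfterUnregister() throws Exception {
        MBeanServer mbs = MBeanServerFactory.newMBeanServer();
        ObjectName name = new ObjectName("sketch:type=Hello");
        NotificationListener listener = (notification, handback) -> { };

        mbs.registerMBean(new Hello(), name);
        mbs.addNotificationListener(name, listener, null, null);
        mbs.unregisterMBean(name);

        if (mbs.isRegistered(name)) {
            return "still registered";
        }
        try {
            mbs.removeNotificationListener(name, listener);
            return "no exception";
        } catch (InstanceNotFoundException e) {
            return "InstanceNotFoundException";
        } catch (ListenerNotFoundException e) {
            return "ListenerNotFoundException";
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(removeAfterUnregister());
    }
}
```

On a local MBeanServer this should report InstanceNotFoundException, which is the server-side behaviour mentioned earlier in the thread. Running the same sequence through a connector client is where the differing exception was reported.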
From stuart.marks at oracle.com Thu Jan 10 00:52:10 2013 From: stuart.marks at oracle.com (Stuart Marks) Date: Thu, 10 Jan 2013 00:52:10 -0800 Subject: jmx-dev [PATCH] JDK-8005472: com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh failed on windows In-Reply-To: <50EAB014.30805@oracle.com> References: <50E16BA8.40203@oracle.com> <682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com> <50EAB014.30805@oracle.com> Message-ID: <50EE813A.1020501@oracle.com> On 1/7/13 3:23 AM, Jaroslav Bachorik wrote: > On 01/04/2013 11:37 PM, Kelly O'Hair wrote: >> I suspect it is not hanging because it does not exist, but that some other windows process has it's hands on it. >> This is the stdout file from the server being started up right? >> Could the server from a previous test run be still running? > > Exactly. Amy confirmed this and provided a patch which resolves the > hanging problem. > > The update patch is at > http://cr.openjdk.java.net/~jbachorik/8005472/webrev.01 Hi Jaroslav, The change to remove the parentheses from around the server program looks right. It avoids forking an extra process (at least in some shells) and lets $! refer to the actual JVM, not an intermediate shell process. The rm -f from Kelly's suggestion is good too. But there are other things wrong with the script. I don't think they could cause hanging, but they could cause the script to fail in unforeseen ways, or even to report success incorrectly. One problem is introduced by the change, where the Server's stderr is also redirected into $URL_PATH along with stdout. This means that if the Server program reports any errors, they'll get mixed into the URL_PATH file instead of appearing in the test log. The URL_PATH file's contents is never reported, so these error messages will be invisible. The exit status of some of the critical commands (such as the compilations) isn't checked, so if javac fails for some reason, the test might not report failure. 
Instead, some weird error might or might not be reported later (though one will still see the javac errors in the log). I don't think the sleep at line 80 is necessary, since the client runs synchronously and should have exited by this point. The wait loop checking for the existence of the URL_PATH file doesn't actually guarantee that the server is running or has initialized yet. The file is actually created by the shell before the Server JVM starts up. Thus, runClient might try to read from it before the server has written anything to it. Or, as mentioned above, the server might have written some error messages into the URL_PATH file instead of the expected contents. Thus, the contents of the JMXURL variable can quite possibly be incorrect. If this occurs, what will happen when the client runs? It may emit some error message, and this will be filtered out by the grep pipeline. Thus, HAS_ERRORS might end up empty, and the test will report passing, even though everything has failed! For this changeset I'd recommend at a minimum removing the redirection of stderr to URL_PATH. If the server fails we'll at least see errors in the test log. For checking the notification message, is there a way to modify the client to report an exit status or throw an exception? Throwing an exception from main() will exit the JVM with a nonzero status, so this can be checked more easily from the script. I think this is less error-prone than grepping the output for a specific error message. The test should fail if there is *any* error; it should not succeed if an expected error is absent. You might consider having jtreg build the client and server classes. This might simplify some of the setup. Also, jtreg is meticulous about aborting the test if any compilations fail, so it takes care of that for you. It would be nice if there were a better way to have the client rendezvous with the server. 
I hate to suggest it, but sleeping unconditionally after starting the server is probably necessary. Anything more robust probably requires rearchitecting the test, though. Sorry to dump all this on you. But one of the shell-based RMI tests suffers from *exactly* the same pathologies. (I have yet to fix it.) Unfortunately, I believe that there are a lot of other shell-based tests in the test suite that have similar problems. The lesson here is that writing reliable shell tests is a lot harder than it seems. Thanks, s'marks From jaroslav.bachorik at oracle.com Thu Jan 10 00:56:42 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Thu, 10 Jan 2013 09:56:42 +0100 Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure In-Reply-To: <50ED7DEF.9020108@oracle.com> References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com> <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com> Message-ID: <50EE824A.8020106@oracle.com> On 01/09/2013 03:25 PM, shanliang wrote: > Jaroslav Bachorik wrote: >> On 01/09/2013 02:44 PM, shanliang wrote: >> >>> Let's forget the JMX implementation at first. If an MBean is >>> unregistered, a user at client side calls "removeNotificationListener" >>> on the MBean, what should happen? if the user calls "isRegistered" on >>> the MBean, what should happen? >>> >>> I have done 2 tests, I used only one thread: >>> >>> 1) >>> ...... >>> localServer.unregisterMBean(myMBean); >>> boolean isRegistered = remoteClientServer.isRegistered(myMBean)); >>> >>> I got isRegistered = false; >>> >>> 2) >>> ...... >>> localServer.unregisterMBean(myMBean); >>> System.out.println("isRegistered = >>> "+remoteClientServer.sRegistered(myMBean)); >>> remoteClientServer.removeNotificationListener(myMBean, listener); >>> >>> I did not get an exception. 
>>> >>> The 1) told that the client could know the MBean was unregistered, then >>> the client should throw an exception for the call of >>> "removeNotificationListener" in 2). >>> >> >> Yes, but then it would not test the listener leakage as it was supposed >> to test but rather the fact that the client throws the appropriate >> exception. The fact that the mbean was unregistered does not necessarily >> mean that the listeners were released. The main problem remains - the >> listeners are being cleaned-up asynchronously and the clean-up process >> might race against the other uses of the JMX API. >> > client.removeNotificationListener is not a right way here to test > listener leak, we could use some other ways, for example we keep the > listener in a weak reference, then after the mbean is removed, the weak > reference should be empty after some time. Another way is like > DeadListenerTest does to check whether clean has done at server side: > use reflection to get the "listenerMap" at server side and make sure it > is empty, but this need to add a private method to the class > ClientNotifForwarder. There will still be problems with timing. With the weak reference, you need to wait for the GC to kick in and clear it. And the listenerMap will not be purged of the unregistered MBean's listeners until the notification is generated, processed by the ClientNotifForwarder and forwarded to the server. So there goes the timing issue again. The problem is that the "unregisterMBean" operation does not guarantee that the listeners have been unregistered by the time it returns. So, one way or the other, we will need to wait an arbitrary amount of time before checking for the memory leak.
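
One way to shorten the wait (though not to remove the race) is to block on the unregistration notification itself rather than sleeping for a fixed period. The following is a rough sketch under stated assumptions: it uses a local MBeanServer and listens on the MBeanServerDelegate, and all class and ObjectName names are hypothetical. It only shows that the notification has reached this one listener. It does not show that ClientNotifForwarder has finished its own clean-up, so a bounded wait would still be needed after it in the remote case.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServer;
import javax.management.MBeanServerDelegate;
import javax.management.MBeanServerFactory;
import javax.management.MBeanServerNotification;
import javax.management.NotificationListener;
import javax.management.ObjectName;

public class AwaitUnregistrationSketch {
    // Hypothetical empty standard MBean, used only for this illustration.
    public interface ProbeMBean { }

    public static class Probe implements ProbeMBean { }

    // Unregisters 'name' and waits (up to timeoutMillis) until the delegate
    // has delivered the matching unregistration notification.
    static boolean unregisterAndAwait(MBeanServer mbs, ObjectName name,
                                      long timeoutMillis) throws Exception {
        CountDownLatch seen = new CountDownLatch(1);
        NotificationListener listener = (notification, handback) -> {
            if (notification instanceof MBeanServerNotification
                    && MBeanServerNotification.UNREGISTRATION_NOTIFICATION
                            .equals(notification.getType())
                    && name.equals(((MBeanServerNotification) notification)
                            .getMBeanName())) {
                seen.countDown();
            }
        };
        mbs.addNotificationListener(
                MBeanServerDelegate.DELEGATE_NAME, listener, null, null);
        try {
            mbs.unregisterMBean(name);
            return seen.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            mbs.removeNotificationListener(
                    MBeanServerDelegate.DELEGATE_NAME, listener);
        }
    }

    // Small driver: register a probe MBean, then unregister it and wait.
    static boolean demo() throws Exception {
        MBeanServer mbs = MBeanServerFactory.newMBeanServer();
        ObjectName name = new ObjectName("sketch:type=Probe");
        mbs.registerMBean(new Probe(), name);
        return unregisterAndAwait(mbs, name, 2000) && !mbs.isRegistered(name);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("notification seen: " + demo());
    }
}
```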
-JB- > > I think we have 3 things to do here: > 1) modify the test to not use removeNotificationListener for testing > listener leak > 2) create a new bug about a client does not throw an exception after an > mbean is unregistered > 3) create a bug about a client does not throw a same exception as at > server side. > > I will do 2) and 3), if you like you can continue 1), it might need to > do fix also in the JMX implementation. > > Shanliang >> >>> The test "DeadListenerTest" got passed in some machines because of the >>> timeout for waiting a notification. I think its failure just tells a new >>> bug. >>> >>> To set a longer timeout just hides the real bug, and the test might fail >>> again one day if running condition is changed and you might need longer >>> timeout again. >>> >> >> Yes, I agree with you that extending the timeout just lessens the >> likelihood of the race condition and does not prevent it. >> >> >>> Shanliang >>> >>> Jaroslav Bachorik wrote: >>> >>>> On 01/09/2013 11:08 AM, shanliang wrote: >>>> >>>> >>>>> Jaroslav Bachorik wrote: >>>>> >>>>>> On 01/09/2013 09:40 AM, shanliang wrote: >>>>>> >>>>>> >>>>>>> I still have no idea why the test failed, but I do not see why a >>>>>>> longer >>>>>>> timeout can fix the test. Have you reproduced the problem and tested >>>>>>> your fix? if yes then possible the long timeout hided a real >>>>>>> problem. >>>>>>> >>>>>> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp >>>>>> switch) and verify that the fix makes the test pass. >>>>>> >>>>>> The ClientNotifForwarder scans the notifications for >>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>>> appropriate notification listeners in a separate thread. Thus, >>>>>> calling >>>>>> "removeNotificationListener" on the main thread is prone to racing. 
>>>>>> >>>>> It is true that ClientNotifForwarder scans the notifications for >>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>> appropriate notification listeners in a separate thread. This is for a >>>>> client connection to do clean if a user never calls >>>>> removeNotificationListener. >>>>> >>>>> But calling directly removeNotificationListener from a client should >>>>> still get exception if the clean has not been done. As I said, if the >>>>> client checked and found the listener was still there, then the client >>>>> sent a request to its server to remove the listener at server side, >>>>> the >>>>> server should find that the MBean in question was not registered, >>>>> so the >>>>> server should throw an exception. The bug might be here. >>>>> >>>> This won't work. The server side listeners are removed upon receiving >>>> the "unregistered" notification which is delivered from the >>>> ClientNotificationForwarder and it may have not run yet (since it runs >>>> in a separate executor thread). The result is that the attempt to >>>> remove >>>> the notification listener on the server will succeed as well failing >>>> the >>>> test subsequently. >>>> >>>> -JB- >>>> >>>> >>>> >>>>> Shanliang >>>>> >>>>>> >>>>>> >>>>>>> The timeout you made longer was used to wait a notification which >>>>>>> should >>>>>>> never arrive. >>>>>>> >>>>>> Well, it can be used to allow more time to process the "unregister" >>>>>> notification, too. >>>>>> >>>>>> When I think more of this I am more inclined to fix the race >>>>>> condition. >>>>>> An updated webrev will follow. >>>>>> >>>>>> >>>>>> >>>>>>> To remove a listener from a client side, we did: >>>>>>> 1) at client side, check whether it was added in the client side >>>>>>> 2) at server side, check whether the MBean in question was >>>>>>> registered in >>>>>>> the MBeanServer (!!!) >>>>>>> 3) at server side, check whether the listener was added. 
>>>>>>> >>>>>>> So 2) tells that we did not rely on a "unregister" notification. >>>>>>> Anyway, >>>>>>> if you use a SAME thread to call "unregister" operation to >>>>>>> unregister an >>>>>>> mbean, then any following call (without any time break) to use the >>>>>>> mbean >>>>>>> should fail, like "removeNotificationListener", "isRegistered" etc. >>>>>>> >>>>>>> I do see a bug here, if we remove a listener from a non-existing >>>>>>> MBeam, >>>>>>> we get "ListenerNotFoundException" at a client side, but get >>>>>>> "InstanceNotFoundException" at server side, I think we should >>>>>>> create a >>>>>>> bug, because both implemented the same interface >>>>>>> MBeanServerConnection. >>>>>>> >>>>>> Yes, it is rather inconsistent. >>>>>> >>>>>> -JB- >>>>>> >>>>>> >>>>>> >>>>>>> Shanliang >>>>>>> >>>>>>> Jaroslav Bachorik wrote: >>>>>>> >>>>>>>> Looking for review and a sponsor. >>>>>>>> >>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>>>> >>>>>>>> In this issue the timing is the problem. >>>>>>>> MBeanServer.unregisterMBean() >>>>>>>> fires the "unregister" notification which is sent to the server >>>>>>>> asynchronously. Thus it may happen that the "unregister" >>>>>>>> notification >>>>>>>> has not been yet processed at the time of invoking >>>>>>>> removeNotificationListener() and the notification listeners hasn't >>>>>>>> been >>>>>>>> cleaned up leading to the test failure. >>>>>>>> >>>>>>>> There is no synchronization between the client and the server and >>>>>>>> such >>>>>>>> race condition can occur occasionally. Normally, the execution is >>>>>>>> fast >>>>>>>> enough to behave like the "unregister" notification is processed >>>>>>>> synchronously with the unregisterMBean() operation but it seems >>>>>>>> that >>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains >>>>>>>> the CPU >>>>>>>> enough to make this problem appear. 
>>>>>>>> >>>>>>>> There is no proper fix for this - the only thing that work is >>>>>>>> waiting a >>>>>>>> bit longer in the main thread to give the notification processing >>>>>>>> thread >>>>>>>> some time to clean up the listeners. >>>>>>>> >>>>>>>> Regards, >>>>>>>> >>>>>>>> -JB- >>>>>>>> >>>>>> >>>>> >>>> >>> >> >> > > From shanliang.jiang at oracle.com Thu Jan 10 01:05:11 2013 From: shanliang.jiang at oracle.com (shanliang) Date: Thu, 10 Jan 2013 10:05:11 +0100 Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure In-Reply-To: <50EE824A.8020106@oracle.com> References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com> <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com> <50EE824A.8020106@oracle.com> Message-ID: <50EE8447.50901@oracle.com> Jaroslav Bachorik wrote: > On 01/09/2013 03:25 PM, shanliang wrote: > >> Jaroslav Bachorik wrote: >> >>> On 01/09/2013 02:44 PM, shanliang wrote: >>> >>> >>>> Let's forget the JMX implementation at first. If an MBean is >>>> unregistered, a user at client side calls "removeNotificationListener" >>>> on the MBean, what should happen? if the user calls "isRegistered" on >>>> the MBean, what should happen? >>>> >>>> I have done 2 tests, I used only one thread: >>>> >>>> 1) >>>> ...... >>>> localServer.unregisterMBean(myMBean); >>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean)); >>>> >>>> I got isRegistered = false; >>>> >>>> 2) >>>> ...... >>>> localServer.unregisterMBean(myMBean); >>>> System.out.println("isRegistered = >>>> "+remoteClientServer.sRegistered(myMBean)); >>>> remoteClientServer.removeNotificationListener(myMBean, listener); >>>> >>>> I did not get an exception. 
>>>> >>>> The 1) told that the client could know the MBean was unregistered, then >>>> the client should throw an exception for the call of >>>> "removeNotificationListener" in 2). >>>> >>>> >>> Yes, but then it would not test the listener leakage as it was supposed >>> to test but rather the fact that the client throws the appropriate >>> exception. The fact that the mbean was unregistered does not necessarily >>> mean that the listeners were released. The main problem remains - the >>> listeners are being cleaned-up asynchronously and the clean-up process >>> might race against the other uses of the JMX API. >>> >>> >> client.removeNotificationListener is not a right way here to test >> listener leak, we could use some other ways, for example we keep the >> listener in a weak reference, then after the mbean is removed, the weak >> reference should be empty after some time. Another way is like >> DeadListenerTest does to check whether clean has done at server side: >> use reflection to get the "listenerMap" at server side and make sure it >> is empty, but this need to add a private method to the class >> ClientNotifForwarder. >> > > There will still be problems with timing. You need either to wait for > the GC to kick in to clean up the weak ref. And the listenerMap will not > be purged of the unregistered MBean listeners until the notification is > generated, processed on the ClientNotificationForwarder and forwarded to > the server. So there goes the timing issue again. > > The problem is that the "unregisterMBean" operation does not guarantee > that the listeners have been unregistered at the time it returns. So, > one way or the other we will need to wait an arbitrary amount of time > before checking for the memory leak. 
>
Yes we need to wait, but you can use a cycle like:

    long maxWaitingTime = 3000;
    long startTime = System.currentTimeMillis();
    // Poll until the referent is collected or the deadline passes.
    while (weakReference.get() != null
            && System.currentTimeMillis() < startTime + maxWaitingTime) {
        System.gc();
        Thread.sleep(100);
        System.gc();
    }
    if (weakReference.get() != null) {
        // failed
    }

Shanliang
> -JB- > > >> I think we have 3 things to do here: >> 1) modify the test to not use removeNotificationListener for testing >> listener leak >> 2) create a new bug about a client does not throw an exception after an >> mbean is unregistered >> 3) create a bug about a client does not throw a same exception as at >> server side. >> >> I will do 2) and 3), if you like you can continue 1), it might need to >> do fix also in the JMX implementation. >> >> Shanliang >> >>> >>> >>>> The test "DeadListenerTest" got passed in some machines because of the >>>> timeout for waiting a notification. I think its failure just tells a new >>>> bug. >>>> >>>> To set a longer timeout just hides the real bug, and the test might fail >>>> again one day if running condition is changed and you might need longer >>>> timeout again. >>>> >>>> >>> Yes, I agree with you that extending the timeout just lessens the >>> likelihood of the race condition and does not prevent it. >>> >>> >>> >>>> Shanliang >>>> >>>> Jaroslav Bachorik wrote: >>>> >>>> >>>>> On 01/09/2013 11:08 AM, shanliang wrote: >>>>> >>>>> >>>>> >>>>>> Jaroslav Bachorik wrote: >>>>>> >>>>>> >>>>>>> On 01/09/2013 09:40 AM, shanliang wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> I still have no idea why the test failed, but I do not see why a >>>>>>>> longer >>>>>>>> timeout can fix the test. Have you reproduced the problem and tested >>>>>>>> your fix? if yes then possible the long timeout hided a real >>>>>>>> problem. >>>>>>>> >>>>>>>> >>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp >>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>> >>>>>>> The ClientNotifForwarder scans the notifications for >>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>>>> appropriate notification listeners in a separate thread. Thus, >>>>>>> calling >>>>>>> "removeNotificationListener" on the main thread is prone to racing. >>>>>>> >>>>>>> >>>>>> It is true that ClientNotifForwarder scans the notifications for >>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>>> appropriate notification listeners in a separate thread. This is for a >>>>>> client connection to do clean if a user never calls >>>>>> removeNotificationListener. >>>>>> >>>>>> But calling directly removeNotificationListener from a client should >>>>>> still get exception if the clean has not been done. As I said, if the >>>>>> client checked and found the listener was still there, then the client >>>>>> sent a request to its server to remove the listener at server side, >>>>>> the >>>>>> server should find that the MBean in question was not registered, >>>>>> so the >>>>>> server should throw an exception. The bug might be here. >>>>>> >>>>>> >>>>> This won't work. The server side listeners are removed upon receiving >>>>> the "unregistered" notification which is delivered from the >>>>> ClientNotificationForwarder and it may have not run yet (since it runs >>>>> in a separate executor thread). The result is that the attempt to >>>>> remove >>>>> the notification listener on the server will succeed as well failing >>>>> the >>>>> test subsequently. >>>>> >>>>> -JB- >>>>> >>>>> >>>>> >>>>> >>>>>> Shanliang >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> The timeout you made longer was used to wait a notification which >>>>>>>> should >>>>>>>> never arrive. >>>>>>>> >>>>>>>> >>>>>>> Well, it can be used to allow more time to process the "unregister" >>>>>>> notification, too. >>>>>>> >>>>>>> When I think more of this I am more inclined to fix the race >>>>>>> condition. 
>>>>>>> An updated webrev will follow. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> To remove a listener from a client side, we did: >>>>>>>> 1) at client side, check whether it was added in the client side >>>>>>>> 2) at server side, check whether the MBean in question was >>>>>>>> registered in >>>>>>>> the MBeanServer (!!!) >>>>>>>> 3) at server side, check whether the listener was added. >>>>>>>> >>>>>>>> So 2) tells that we did not rely on a "unregister" notification. >>>>>>>> Anyway, >>>>>>>> if you use a SAME thread to call "unregister" operation to >>>>>>>> unregister an >>>>>>>> mbean, then any following call (without any time break) to use the >>>>>>>> mbean >>>>>>>> should fail, like "removeNotificationListener", "isRegistered" etc. >>>>>>>> >>>>>>>> I do see a bug here, if we remove a listener from a non-existing >>>>>>>> MBeam, >>>>>>>> we get "ListenerNotFoundException" at a client side, but get >>>>>>>> "InstanceNotFoundException" at server side, I think we should >>>>>>>> create a >>>>>>>> bug, because both implemented the same interface >>>>>>>> MBeanServerConnection. >>>>>>>> >>>>>>>> >>>>>>> Yes, it is rather inconsistent. >>>>>>> >>>>>>> -JB- >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Shanliang >>>>>>>> >>>>>>>> Jaroslav Bachorik wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Looking for review and a sponsor. >>>>>>>>> >>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>>>>> >>>>>>>>> In this issue the timing is the problem. >>>>>>>>> MBeanServer.unregisterMBean() >>>>>>>>> fires the "unregister" notification which is sent to the server >>>>>>>>> asynchronously. Thus it may happen that the "unregister" >>>>>>>>> notification >>>>>>>>> has not been yet processed at the time of invoking >>>>>>>>> removeNotificationListener() and the notification listeners hasn't >>>>>>>>> been >>>>>>>>> cleaned up leading to the test failure. 
>>>>>>>>> >>>>>>>>> There is no synchronization between the client and the server and >>>>>>>>> such >>>>>>>>> race condition can occur occasionally. Normally, the execution is >>>>>>>>> fast >>>>>>>>> enough to behave like the "unregister" notification is processed >>>>>>>>> synchronously with the unregisterMBean() operation but it seems >>>>>>>>> that >>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains >>>>>>>>> the CPU >>>>>>>>> enough to make this problem appear. >>>>>>>>> >>>>>>>>> There is no proper fix for this - the only thing that work is >>>>>>>>> waiting a >>>>>>>>> bit longer in the main thread to give the notification processing >>>>>>>>> thread >>>>>>>>> some time to clean up the listeners. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/jmx-dev/attachments/20130110/c9203ef0/attachment-0001.html From stuart.marks at oracle.com Thu Jan 10 01:15:42 2013 From: stuart.marks at oracle.com (Stuart Marks) Date: Thu, 10 Jan 2013 01:15:42 -0800 Subject: jmx-dev [PATCH] JDK-8005472: com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh failed on windows In-Reply-To: <50EE813A.1020501@oracle.com> References: <50E16BA8.40203@oracle.com> <682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com> <50EAB014.30805@oracle.com> <50EE813A.1020501@oracle.com> Message-ID: <50EE86BE.5090101@oracle.com> On 1/10/13 12:52 AM, Stuart Marks wrote: > The exit status of some of the critical commands (such as the compilations) > isn't checked, so if javac fails for some reason, the test might not report > failure. Instead, some weird error might or might not be reported later (though > one will still see the javac errors in the log). 
Adding set -e near the top of the script makes the script exit as soon as
any command returns a nonzero exit status. This avoids a lot of tedious
error checking for commands that just "do stuff" (like mkdir, rm, javac).
But beware: some commands return a nonzero exit status somewhat
unexpectedly. grep, for example, does so when it finds no match.

s'marks

From jaroslav.bachorik at oracle.com Thu Jan 10 01:34:36 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Jan 2013 10:34:36 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
In-Reply-To: <50EE8447.50901@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
 <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
 <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
 <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com>
 <50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com>
Message-ID: <50EE8B2C.9030900@oracle.com>

On 01/10/2013 10:05 AM, shanliang wrote:
> Jaroslav Bachorik wrote:
>> On 01/09/2013 03:25 PM, shanliang wrote:
>>
>>> Jaroslav Bachorik wrote:
>>>
>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>
>>>>
>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>> unregistered, a user at client side calls "removeNotificationListener"
>>>>> on the MBean, what should happen? if the user calls "isRegistered" on
>>>>> the MBean, what should happen?
>>>>>
>>>>> I have done 2 tests, I used only one thread:
>>>>>
>>>>> 1)
>>>>> ......
>>>>> localServer.unregisterMBean(myMBean);
>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean);
>>>>>
>>>>> I got isRegistered = false;
>>>>>
>>>>> 2)
>>>>> ......
>>>>> localServer.unregisterMBean(myMBean);
>>>>> System.out.println("isRegistered = "
>>>>>     + remoteClientServer.isRegistered(myMBean));
>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>
>>>>> I did not get an exception.
>>>>> >>>>> The 1) told that the client could know the MBean was unregistered, >>>>> then >>>>> the client should throw an exception for the call of >>>>> "removeNotificationListener" in 2). >>>>> >>>> Yes, but then it would not test the listener leakage as it was supposed >>>> to test but rather the fact that the client throws the appropriate >>>> exception. The fact that the mbean was unregistered does not >>>> necessarily >>>> mean that the listeners were released. The main problem remains - the >>>> listeners are being cleaned-up asynchronously and the clean-up process >>>> might race against the other uses of the JMX API. >>>> >>> client.removeNotificationListener is not a right way here to test >>> listener leak, we could use some other ways, for example we keep the >>> listener in a weak reference, then after the mbean is removed, the weak >>> reference should be empty after some time. Another way is like >>> DeadListenerTest does to check whether clean has done at server side: >>> use reflection to get the "listenerMap" at server side and make sure it >>> is empty, but this need to add a private method to the class >>> ClientNotifForwarder. >>> >> >> There will still be problems with timing. You need either to wait for >> the GC to kick in to clean up the weak ref. And the listenerMap will not >> be purged of the unregistered MBean listeners until the notification is >> generated, processed on the ClientNotificationForwarder and forwarded to >> the server. So there goes the timing issue again. >> >> The problem is that the "unregisterMBean" operation does not guarantee >> that the listeners have been unregistered at the time it returns. So, >> one way or the other we will need to wait an arbitrary amount of time >> before checking for the memory leak. 
>>
> Yes we need to wait, but you can use a cycle like:
>     long maxWaitingTime = 3000;
>     long startTime = System.currentTimeMillis();
>     while (weakReference.get() != null
>             && System.currentTimeMillis() < startTime + maxWaitingTime) {
>         System.gc();
>         Thread.sleep(100);
>         System.gc();
>     }
>
>     if (weakReference.get() != null) {
>         // failed
>     }
>

Sounds reasonable. I'll update the test.

-JB-

> Shanliang
>> -JB-
>>
>>
>>> I think we have 3 things to do here:
>>> 1) modify the test to not use removeNotificationListener for testing
>>> listener leak
>>> 2) create a new bug about a client does not throw an exception after an
>>> mbean is unregistered
>>> 3) create a bug about a client does not throw a same exception as at
>>> server side.
>>>
>>> I will do 2) and 3), if you like you can continue 1), it might need to
>>> do fix also in the JMX implementation.
>>>
>>> Shanliang
>>>
>>>>
>>>>
>>>>> The test "DeadListenerTest" got passed in some machines because of the
>>>>> timeout for waiting a notification. I think its failure just tells
>>>>> a new bug.
>>>>>
>>>>> To set a longer timeout just hides the real bug, and the test might
>>>>> fail again one day if running condition is changed and you might need
>>>>> longer timeout again.
>>>>>
>>>> Yes, I agree with you that extending the timeout just lessens the
>>>> likelihood of the race condition and does not prevent it.
>>>>
>>>>
>>>>
>>>>> Shanliang
>>>>>
>>>>> Jaroslav Bachorik wrote:
>>>>>
>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>
>>>>>>
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>
>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>>> longer timeout can fix the test. Have you reproduced the problem and
>>>>>>>>> tested your fix? if yes then possible the long timeout hided a real
>>>>>>>>> problem.
>>>>>>>>> >>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and >>>>>>>> -Xcomp >>>>>>>> switch) and verify that the fix makes the test pass. >>>>>>>> >>>>>>>> The ClientNotifForwarder scans the notifications for >>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>>>>> appropriate notification listeners in a separate thread. Thus, >>>>>>>> calling >>>>>>>> "removeNotificationListener" on the main thread is prone to racing. >>>>>>>> >>>>>>> It is true that ClientNotifForwarder scans the notifications for >>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>>>> appropriate notification listeners in a separate thread. This is >>>>>>> for a >>>>>>> client connection to do clean if a user never calls >>>>>>> removeNotificationListener. >>>>>>> >>>>>>> But calling directly removeNotificationListener from a client should >>>>>>> still get exception if the clean has not been done. As I said, if >>>>>>> the >>>>>>> client checked and found the listener was still there, then the >>>>>>> client >>>>>>> sent a request to its server to remove the listener at server side, >>>>>>> the >>>>>>> server should find that the MBean in question was not registered, >>>>>>> so the >>>>>>> server should throw an exception. The bug might be here. >>>>>>> >>>>>> This won't work. The server side listeners are removed upon receiving >>>>>> the "unregistered" notification which is delivered from the >>>>>> ClientNotificationForwarder and it may have not run yet (since it >>>>>> runs >>>>>> in a separate executor thread). The result is that the attempt to >>>>>> remove >>>>>> the notification listener on the server will succeed as well failing >>>>>> the >>>>>> test subsequently. >>>>>> >>>>>> -JB- >>>>>> >>>>>> >>>>>> >>>>>>> Shanliang >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> The timeout you made longer was used to wait a notification which >>>>>>>>> should >>>>>>>>> never arrive. 
>>>>>>>>> >>>>>>>> Well, it can be used to allow more time to process the "unregister" >>>>>>>> notification, too. >>>>>>>> >>>>>>>> When I think more of this I am more inclined to fix the race >>>>>>>> condition. >>>>>>>> An updated webrev will follow. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> To remove a listener from a client side, we did: >>>>>>>>> 1) at client side, check whether it was added in the client side >>>>>>>>> 2) at server side, check whether the MBean in question was >>>>>>>>> registered in >>>>>>>>> the MBeanServer (!!!) >>>>>>>>> 3) at server side, check whether the listener was added. >>>>>>>>> >>>>>>>>> So 2) tells that we did not rely on a "unregister" notification. >>>>>>>>> Anyway, >>>>>>>>> if you use a SAME thread to call "unregister" operation to >>>>>>>>> unregister an >>>>>>>>> mbean, then any following call (without any time break) to use the >>>>>>>>> mbean >>>>>>>>> should fail, like "removeNotificationListener", "isRegistered" >>>>>>>>> etc. >>>>>>>>> >>>>>>>>> I do see a bug here, if we remove a listener from a non-existing >>>>>>>>> MBeam, >>>>>>>>> we get "ListenerNotFoundException" at a client side, but get >>>>>>>>> "InstanceNotFoundException" at server side, I think we should >>>>>>>>> create a >>>>>>>>> bug, because both implemented the same interface >>>>>>>>> MBeanServerConnection. >>>>>>>>> >>>>>>>> Yes, it is rather inconsistent. >>>>>>>> >>>>>>>> -JB- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Shanliang >>>>>>>>> >>>>>>>>> Jaroslav Bachorik wrote: >>>>>>>>> >>>>>>>>>> Looking for review and a sponsor. >>>>>>>>>> >>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>>>>>> >>>>>>>>>> In this issue the timing is the problem. >>>>>>>>>> MBeanServer.unregisterMBean() >>>>>>>>>> fires the "unregister" notification which is sent to the server >>>>>>>>>> asynchronously. 
Thus it may happen that the "unregister" >>>>>>>>>> notification >>>>>>>>>> has not been yet processed at the time of invoking >>>>>>>>>> removeNotificationListener() and the notification listeners >>>>>>>>>> hasn't >>>>>>>>>> been >>>>>>>>>> cleaned up leading to the test failure. >>>>>>>>>> >>>>>>>>>> There is no synchronization between the client and the server and >>>>>>>>>> such >>>>>>>>>> race condition can occur occasionally. Normally, the execution is >>>>>>>>>> fast >>>>>>>>>> enough to behave like the "unregister" notification is processed >>>>>>>>>> synchronously with the unregisterMBean() operation but it seems >>>>>>>>>> that >>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains >>>>>>>>>> the CPU >>>>>>>>>> enough to make this problem appear. >>>>>>>>>> >>>>>>>>>> There is no proper fix for this - the only thing that work is >>>>>>>>>> waiting a >>>>>>>>>> bit longer in the main thread to give the notification processing >>>>>>>>>> thread >>>>>>>>>> some time to clean up the listeners. 
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>

From jaroslav.bachorik at oracle.com Thu Jan 10 03:41:44 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Jan 2013 12:41:44 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
In-Reply-To: <50EE8447.50901@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com>
 <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com>
 <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com>
 <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com>
 <50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com>
Message-ID: <50EEA8F8.7090007@oracle.com>

On 01/10/2013 10:05 AM, shanliang wrote:
> Jaroslav Bachorik wrote:
>> On 01/09/2013 03:25 PM, shanliang wrote:
>>
>>> Jaroslav Bachorik wrote:
>>>
>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>
>>>>
>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>> unregistered, a user at client side calls "removeNotificationListener"
>>>>> on the MBean, what should happen? if the user calls "isRegistered" on
>>>>> the MBean, what should happen?
>>>>>
>>>>> I have done 2 tests, I used only one thread:
>>>>>
>>>>> 1)
>>>>> ......
>>>>> localServer.unregisterMBean(myMBean);
>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean);
>>>>>
>>>>> I got isRegistered = false;
>>>>>
>>>>> 2)
>>>>> ......
>>>>> localServer.unregisterMBean(myMBean);
>>>>> System.out.println("isRegistered = "
>>>>>     + remoteClientServer.isRegistered(myMBean));
>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>
>>>>> I did not get an exception.
>>>>>
>>>>> The 1) told that the client could know the MBean was unregistered,
>>>>> then the client should throw an exception for the call of
>>>>> "removeNotificationListener" in 2).
>>>>> >>>> Yes, but then it would not test the listener leakage as it was supposed >>>> to test but rather the fact that the client throws the appropriate >>>> exception. The fact that the mbean was unregistered does not >>>> necessarily >>>> mean that the listeners were released. The main problem remains - the >>>> listeners are being cleaned-up asynchronously and the clean-up process >>>> might race against the other uses of the JMX API. >>>> >>> client.removeNotificationListener is not a right way here to test >>> listener leak, we could use some other ways, for example we keep the >>> listener in a weak reference, then after the mbean is removed, the weak >>> reference should be empty after some time. Another way is like >>> DeadListenerTest does to check whether clean has done at server side: >>> use reflection to get the "listenerMap" at server side and make sure it >>> is empty, but this need to add a private method to the class >>> ClientNotifForwarder. >>> >> >> There will still be problems with timing. You need either to wait for >> the GC to kick in to clean up the weak ref. And the listenerMap will not >> be purged of the unregistered MBean listeners until the notification is >> generated, processed on the ClientNotificationForwarder and forwarded to >> the server. So there goes the timing issue again. >> >> The problem is that the "unregisterMBean" operation does not guarantee >> that the listeners have been unregistered at the time it returns. So, >> one way or the other we will need to wait an arbitrary amount of time >> before checking for the memory leak. 
>> > Yes we need to wait, but you can use a cycle like: > long maxWaitingTime = 3000; > long startTime = System.currentTimeMillis(); > while ( weakReference.get != null > && System.currentTimeMillis() < startTime + > maxWaitingTime) { > System.gc(); > Thread.sleep(100); > System.gc(); > } > > if (weakReference.get != null) { > // failed > } Still you need an arbitrary timeout which might be reached under extreme conditions making this test to fail intermittently. But I'd say that's the nature of tests for memory leak fixes, due to the unpredictable nature of the GC runs. Unless you take a heap dump and do a reachability analysis you can not be sure whether a reference is dangling somwehwere or it just hasn't been collected yet :/ -JB- > > Shanliang >> -JB- >> >> >>> I think we have 3 things to do here: >>> 1) modify the test to not use removeNotificationListener for testing >>> listener leak >>> 2) create a new bug about a client does not throw an exception after an >>> mbean is unregistered >>> 3) create a bug about a client does not throw a same exception as at >>> server side. >>> >>> I will do 2) and 3), if you like you can continue 1), it might need to >>> do fix also in the JMX implementation. >>> >>> Shanliang >>> >>>> >>>> >>>>> The test "DeadListenerTest" got passed in some machines because of the >>>>> timeout for waiting a notification. I think its failure just tells >>>>> a new >>>>> bug. >>>>> >>>>> To set a longer timeout just hides the real bug, and the test might >>>>> fail >>>>> again one day if running condition is changed and you might need >>>>> longer >>>>> timeout again. >>>>> >>>> Yes, I agree with you that extending the timeout just lessens the >>>> likelihood of the race condition and does not prevent it. 
>>>> >>>> >>>> >>>>> Shanliang >>>>> >>>>> Jaroslav Bachorik wrote: >>>>> >>>>>> On 01/09/2013 11:08 AM, shanliang wrote: >>>>>> >>>>>> >>>>>>> Jaroslav Bachorik wrote: >>>>>>> >>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote: >>>>>>>> >>>>>>>> >>>>>>>>> I still have no idea why the test failed, but I do not see why a >>>>>>>>> longer >>>>>>>>> timeout can fix the test. Have you reproduced the problem and >>>>>>>>> tested >>>>>>>>> your fix? if yes then possible the long timeout hided a real >>>>>>>>> problem. >>>>>>>>> >>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and >>>>>>>> -Xcomp >>>>>>>> switch) and verify that the fix makes the test pass. >>>>>>>> >>>>>>>> The ClientNotifForwarder scans the notifications for >>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>>>>> appropriate notification listeners in a separate thread. Thus, >>>>>>>> calling >>>>>>>> "removeNotificationListener" on the main thread is prone to racing. >>>>>>>> >>>>>>> It is true that ClientNotifForwarder scans the notifications for >>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>>>> appropriate notification listeners in a separate thread. This is >>>>>>> for a >>>>>>> client connection to do clean if a user never calls >>>>>>> removeNotificationListener. >>>>>>> >>>>>>> But calling directly removeNotificationListener from a client should >>>>>>> still get exception if the clean has not been done. As I said, if >>>>>>> the >>>>>>> client checked and found the listener was still there, then the >>>>>>> client >>>>>>> sent a request to its server to remove the listener at server side, >>>>>>> the >>>>>>> server should find that the MBean in question was not registered, >>>>>>> so the >>>>>>> server should throw an exception. The bug might be here. >>>>>>> >>>>>> This won't work. 
The server side listeners are removed upon receiving >>>>>> the "unregistered" notification which is delivered from the >>>>>> ClientNotificationForwarder and it may have not run yet (since it >>>>>> runs >>>>>> in a separate executor thread). The result is that the attempt to >>>>>> remove >>>>>> the notification listener on the server will succeed as well failing >>>>>> the >>>>>> test subsequently. >>>>>> >>>>>> -JB- >>>>>> >>>>>> >>>>>> >>>>>>> Shanliang >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> The timeout you made longer was used to wait a notification which >>>>>>>>> should >>>>>>>>> never arrive. >>>>>>>>> >>>>>>>> Well, it can be used to allow more time to process the "unregister" >>>>>>>> notification, too. >>>>>>>> >>>>>>>> When I think more of this I am more inclined to fix the race >>>>>>>> condition. >>>>>>>> An updated webrev will follow. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> To remove a listener from a client side, we did: >>>>>>>>> 1) at client side, check whether it was added in the client side >>>>>>>>> 2) at server side, check whether the MBean in question was >>>>>>>>> registered in >>>>>>>>> the MBeanServer (!!!) >>>>>>>>> 3) at server side, check whether the listener was added. >>>>>>>>> >>>>>>>>> So 2) tells that we did not rely on a "unregister" notification. >>>>>>>>> Anyway, >>>>>>>>> if you use a SAME thread to call "unregister" operation to >>>>>>>>> unregister an >>>>>>>>> mbean, then any following call (without any time break) to use the >>>>>>>>> mbean >>>>>>>>> should fail, like "removeNotificationListener", "isRegistered" >>>>>>>>> etc. >>>>>>>>> >>>>>>>>> I do see a bug here, if we remove a listener from a non-existing >>>>>>>>> MBeam, >>>>>>>>> we get "ListenerNotFoundException" at a client side, but get >>>>>>>>> "InstanceNotFoundException" at server side, I think we should >>>>>>>>> create a >>>>>>>>> bug, because both implemented the same interface >>>>>>>>> MBeanServerConnection. >>>>>>>>> >>>>>>>> Yes, it is rather inconsistent. 
>>>>>>>> >>>>>>>> -JB- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Shanliang >>>>>>>>> >>>>>>>>> Jaroslav Bachorik wrote: >>>>>>>>> >>>>>>>>>> Looking for review and a sponsor. >>>>>>>>>> >>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>>>>>> >>>>>>>>>> In this issue the timing is the problem. >>>>>>>>>> MBeanServer.unregisterMBean() >>>>>>>>>> fires the "unregister" notification which is sent to the server >>>>>>>>>> asynchronously. Thus it may happen that the "unregister" >>>>>>>>>> notification >>>>>>>>>> has not been yet processed at the time of invoking >>>>>>>>>> removeNotificationListener() and the notification listeners >>>>>>>>>> hasn't >>>>>>>>>> been >>>>>>>>>> cleaned up leading to the test failure. >>>>>>>>>> >>>>>>>>>> There is no synchronization between the client and the server and >>>>>>>>>> such >>>>>>>>>> race condition can occur occasionally. Normally, the execution is >>>>>>>>>> fast >>>>>>>>>> enough to behave like the "unregister" notification is processed >>>>>>>>>> synchronously with the unregisterMBean() operation but it seems >>>>>>>>>> that >>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains >>>>>>>>>> the CPU >>>>>>>>>> enough to make this problem appear. >>>>>>>>>> >>>>>>>>>> There is no proper fix for this - the only thing that work is >>>>>>>>>> waiting a >>>>>>>>>> bit longer in the main thread to give the notification processing >>>>>>>>>> thread >>>>>>>>>> some time to clean up the listeners. 
>>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> >>>>>>>>>> -JB- >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >> > > From shanliang.jiang at oracle.com Thu Jan 10 03:53:10 2013 From: shanliang.jiang at oracle.com (shanliang) Date: Thu, 10 Jan 2013 12:53:10 +0100 Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure In-Reply-To: <50EEA8F8.7090007@oracle.com> References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com> <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com> <50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com> <50EEA8F8.7090007@oracle.com> Message-ID: <50EEABA6.6010203@oracle.com> Instead to wait GC, you can also to wait the MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive it, then your listener must be removed too. Of course this solution is implementation dependent, but the test is already implementation dependent. Shanliang Jaroslav Bachorik wrote: > On 01/10/2013 10:05 AM, shanliang wrote: > >> Jaroslav Bachorik wrote: >> >>> On 01/09/2013 03:25 PM, shanliang wrote: >>> >>> >>>> Jaroslav Bachorik wrote: >>>> >>>> >>>>> On 01/09/2013 02:44 PM, shanliang wrote: >>>>> >>>>> >>>>> >>>>>> Let's forget the JMX implementation at first. If an MBean is >>>>>> unregistered, a user at client side calls "removeNotificationListener" >>>>>> on the MBean, what should happen? if the user calls "isRegistered" on >>>>>> the MBean, what should happen? >>>>>> >>>>>> I have done 2 tests, I used only one thread: >>>>>> >>>>>> 1) >>>>>> ...... >>>>>> localServer.unregisterMBean(myMBean); >>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean)); >>>>>> >>>>>> I got isRegistered = false; >>>>>> >>>>>> 2) >>>>>> ...... 
>>>>>> localServer.unregisterMBean(myMBean); >>>>>> System.out.println("isRegistered = >>>>>> "+remoteClientServer.sRegistered(myMBean)); >>>>>> remoteClientServer.removeNotificationListener(myMBean, listener); >>>>>> >>>>>> I did not get an exception. >>>>>> >>>>>> The 1) told that the client could know the MBean was unregistered, >>>>>> then >>>>>> the client should throw an exception for the call of >>>>>> "removeNotificationListener" in 2). >>>>>> >>>>>> >>>>> Yes, but then it would not test the listener leakage as it was supposed >>>>> to test but rather the fact that the client throws the appropriate >>>>> exception. The fact that the mbean was unregistered does not >>>>> necessarily >>>>> mean that the listeners were released. The main problem remains - the >>>>> listeners are being cleaned-up asynchronously and the clean-up process >>>>> might race against the other uses of the JMX API. >>>>> >>>>> >>>> client.removeNotificationListener is not a right way here to test >>>> listener leak, we could use some other ways, for example we keep the >>>> listener in a weak reference, then after the mbean is removed, the weak >>>> reference should be empty after some time. Another way is like >>>> DeadListenerTest does to check whether clean has done at server side: >>>> use reflection to get the "listenerMap" at server side and make sure it >>>> is empty, but this need to add a private method to the class >>>> ClientNotifForwarder. >>>> >>>> >>> There will still be problems with timing. You need either to wait for >>> the GC to kick in to clean up the weak ref. And the listenerMap will not >>> be purged of the unregistered MBean listeners until the notification is >>> generated, processed on the ClientNotificationForwarder and forwarded to >>> the server. So there goes the timing issue again. >>> >>> The problem is that the "unregisterMBean" operation does not guarantee >>> that the listeners have been unregistered at the time it returns. 
So, >>> one way or the other we will need to wait an arbitrary amount of time >>> before checking for the memory leak. >>> >>> >> Yes we need to wait, but you can use a cycle like: >> long maxWaitingTime = 3000; >> long startTime = System.currentTimeMillis(); >> while ( weakReference.get != null >> && System.currentTimeMillis() < startTime + >> maxWaitingTime) { >> System.gc(); >> Thread.sleep(100); >> System.gc(); >> } >> >> if (weakReference.get != null) { >> // failed >> } >> > > Still you need an arbitrary timeout which might be reached under extreme > conditions making this test to fail intermittently. But I'd say that's > the nature of tests for memory leak fixes, due to the unpredictable > nature of the GC runs. Unless you take a heap dump and do a reachability > analysis you can not be sure whether a reference is dangling somwehwere > or it just hasn't been collected yet :/ > > -JB- > > >> Shanliang >> >>> -JB- >>> >>> >>> >>>> I think we have 3 things to do here: >>>> 1) modify the test to not use removeNotificationListener for testing >>>> listener leak >>>> 2) create a new bug about a client does not throw an exception after an >>>> mbean is unregistered >>>> 3) create a bug about a client does not throw a same exception as at >>>> server side. >>>> >>>> I will do 2) and 3), if you like you can continue 1), it might need to >>>> do fix also in the JMX implementation. >>>> >>>> Shanliang >>>> >>>> >>>>> >>>>> >>>>> >>>>>> The test "DeadListenerTest" got passed in some machines because of the >>>>>> timeout for waiting a notification. I think its failure just tells >>>>>> a new >>>>>> bug. >>>>>> >>>>>> To set a longer timeout just hides the real bug, and the test might >>>>>> fail >>>>>> again one day if running condition is changed and you might need >>>>>> longer >>>>>> timeout again. >>>>>> >>>>>> >>>>> Yes, I agree with you that extending the timeout just lessens the >>>>> likelihood of the race condition and does not prevent it. 
>>>>> >>>>> >>>>> >>>>> >>>>>> Shanliang >>>>>> >>>>>> Jaroslav Bachorik wrote: >>>>>> >>>>>> >>>>>>> On 01/09/2013 11:08 AM, shanliang wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Jaroslav Bachorik wrote: >>>>>>>> >>>>>>>> >>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> I still have no idea why the test failed, but I do not see why a >>>>>>>>>> longer >>>>>>>>>> timeout can fix the test. Have you reproduced the problem and >>>>>>>>>> tested >>>>>>>>>> your fix? if yes then possible the long timeout hided a real >>>>>>>>>> problem. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and >>>>>>>>> -Xcomp >>>>>>>>> switch) and verify that the fix makes the test pass. >>>>>>>>> >>>>>>>>> The ClientNotifForwarder scans the notifications for >>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>>>>>> appropriate notification listeners in a separate thread. Thus, >>>>>>>>> calling >>>>>>>>> "removeNotificationListener" on the main thread is prone to racing. >>>>>>>>> >>>>>>>>> >>>>>>>> It is true that ClientNotifForwarder scans the notifications for >>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the >>>>>>>> appropriate notification listeners in a separate thread. This is >>>>>>>> for a >>>>>>>> client connection to do clean if a user never calls >>>>>>>> removeNotificationListener. >>>>>>>> >>>>>>>> But calling directly removeNotificationListener from a client should >>>>>>>> still get exception if the clean has not been done. As I said, if >>>>>>>> the >>>>>>>> client checked and found the listener was still there, then the >>>>>>>> client >>>>>>>> sent a request to its server to remove the listener at server side, >>>>>>>> the >>>>>>>> server should find that the MBean in question was not registered, >>>>>>>> so the >>>>>>>> server should throw an exception. The bug might be here. >>>>>>>> >>>>>>>> >>>>>>> This won't work. 
The server side listeners are removed upon receiving >>>>>>> the "unregistered" notification which is delivered from the >>>>>>> ClientNotificationForwarder and it may have not run yet (since it >>>>>>> runs >>>>>>> in a separate executor thread). The result is that the attempt to >>>>>>> remove >>>>>>> the notification listener on the server will succeed as well failing >>>>>>> the >>>>>>> test subsequently. >>>>>>> >>>>>>> -JB- >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Shanliang >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> The timeout you made longer was used to wait a notification which >>>>>>>>>> should >>>>>>>>>> never arrive. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Well, it can be used to allow more time to process the "unregister" >>>>>>>>> notification, too. >>>>>>>>> >>>>>>>>> When I think more of this I am more inclined to fix the race >>>>>>>>> condition. >>>>>>>>> An updated webrev will follow. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> To remove a listener from a client side, we did: >>>>>>>>>> 1) at client side, check whether it was added in the client side >>>>>>>>>> 2) at server side, check whether the MBean in question was >>>>>>>>>> registered in >>>>>>>>>> the MBeanServer (!!!) >>>>>>>>>> 3) at server side, check whether the listener was added. >>>>>>>>>> >>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification. >>>>>>>>>> Anyway, >>>>>>>>>> if you use a SAME thread to call "unregister" operation to >>>>>>>>>> unregister an >>>>>>>>>> mbean, then any following call (without any time break) to use the >>>>>>>>>> mbean >>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered" >>>>>>>>>> etc. 
>>>>>>>>>> >>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing >>>>>>>>>> MBeam, >>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get >>>>>>>>>> "InstanceNotFoundException" at server side, I think we should >>>>>>>>>> create a >>>>>>>>>> bug, because both implemented the same interface >>>>>>>>>> MBeanServerConnection. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Yes, it is rather inconsistent. >>>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Shanliang >>>>>>>>>> >>>>>>>>>> Jaroslav Bachorik wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Looking for review and a sponsor. >>>>>>>>>>> >>>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>>>>>>> >>>>>>>>>>> In this issue the timing is the problem. >>>>>>>>>>> MBeanServer.unregisterMBean() >>>>>>>>>>> fires the "unregister" notification which is sent to the server >>>>>>>>>>> asynchronously. Thus it may happen that the "unregister" >>>>>>>>>>> notification >>>>>>>>>>> has not been yet processed at the time of invoking >>>>>>>>>>> removeNotificationListener() and the notification listeners >>>>>>>>>>> hasn't >>>>>>>>>>> been >>>>>>>>>>> cleaned up leading to the test failure. >>>>>>>>>>> >>>>>>>>>>> There is no synchronization between the client and the server and >>>>>>>>>>> such >>>>>>>>>>> race condition can occur occasionally. Normally, the execution is >>>>>>>>>>> fast >>>>>>>>>>> enough to behave like the "unregister" notification is processed >>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems >>>>>>>>>>> that >>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains >>>>>>>>>>> the CPU >>>>>>>>>>> enough to make this problem appear. >>>>>>>>>>> >>>>>>>>>>> There is no proper fix for this - the only thing that work is >>>>>>>>>>> waiting a >>>>>>>>>>> bit longer in the main thread to give the notification processing >>>>>>>>>>> thread >>>>>>>>>>> some time to clean up the listeners. 
>>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> >>>>>>>>>>> -JB- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> > > From jaroslav.bachorik at oracle.com Thu Jan 10 04:09:04 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Thu, 10 Jan 2013 13:09:04 +0100 Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure In-Reply-To: <50EEABA6.6010203@oracle.com> References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com> <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com> <50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com> <50EEA8F8.7090007@oracle.com> <50EEABA6.6010203@oracle.com> Message-ID: <50EEAF60.7040801@oracle.com> On 01/10/2013 12:53 PM, shanliang wrote: > Instead to wait GC, you can also to wait the > MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive > it, then your listener must be removed too. Of course this solution is The problem is that the *NotificationForwarder implementations swallow this kind of notification and just perform the cleanup. No other listener will ever receive this notification. The "unregisterMBean" operation's semantics is not clearly defined. Intuitively, when unregistering an MBean all the associated listeners should be gone before the method returns. But this is not the case - currently the listeners are sanitized some time after the "unregisterMBean" operation started, eventually. There is no easy way to notify the API user that the listeners were removed.
I am afraid that in order to resolve these problems new APIs would need to be introduced and the whole mechanism of delivering notification should be revisited (as it was planned for JMX 2.0, anyway). As for fixing the test - checking the weak references works fine as well as increasing the timeout. They both can fail when the system is extremely busy but the GC based solution will be in general faster than the one with increased timeout. -JB- > implementation dependent, but the test is already implementation dependent. > > Shanliang > > > Jaroslav Bachorik wrote: >> On 01/10/2013 10:05 AM, shanliang wrote: >> >>> Jaroslav Bachorik wrote: >>> >>>> On 01/09/2013 03:25 PM, shanliang wrote: >>>> >>>> >>>>> Jaroslav Bachorik wrote: >>>>> >>>>>> On 01/09/2013 02:44 PM, shanliang wrote: >>>>>> >>>>>> >>>>>>> Let's forget the JMX implementation at first. If an MBean is >>>>>>> unregistered, a user at client side calls >>>>>>> "removeNotificationListener" >>>>>>> on the MBean, what should happen? if the user calls >>>>>>> "isRegistered" on >>>>>>> the MBean, what should happen? >>>>>>> >>>>>>> I have done 2 tests, I used only one thread: >>>>>>> >>>>>>> 1) >>>>>>> ...... >>>>>>> localServer.unregisterMBean(myMBean); >>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean)); >>>>>>> >>>>>>> I got isRegistered = false; >>>>>>> >>>>>>> 2) >>>>>>> ...... >>>>>>> localServer.unregisterMBean(myMBean); >>>>>>> System.out.println("isRegistered = >>>>>>> "+remoteClientServer.sRegistered(myMBean)); >>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener); >>>>>>> >>>>>>> I did not get an exception. >>>>>>> >>>>>>> The 1) told that the client could know the MBean was unregistered, >>>>>>> then >>>>>>> the client should throw an exception for the call of >>>>>>> "removeNotificationListener" in 2). 
>>>>>>> >>>>>> Yes, but then it would not test the listener leakage as it was >>>>>> supposed >>>>>> to test but rather the fact that the client throws the appropriate >>>>>> exception. The fact that the mbean was unregistered does not >>>>>> necessarily >>>>>> mean that the listeners were released. The main problem remains - the >>>>>> listeners are being cleaned-up asynchronously and the clean-up >>>>>> process >>>>>> might race against the other uses of the JMX API. >>>>>> >>>>> client.removeNotificationListener is not a right way here to test >>>>> listener leak, we could use some other ways, for example we keep the >>>>> listener in a weak reference, then after the mbean is removed, the >>>>> weak >>>>> reference should be empty after some time. Another way is like >>>>> DeadListenerTest does to check whether clean has done at server side: >>>>> use reflection to get the "listenerMap" at server side and make >>>>> sure it >>>>> is empty, but this need to add a private method to the class >>>>> ClientNotifForwarder. >>>>> >>>> There will still be problems with timing. You need either to wait for >>>> the GC to kick in to clean up the weak ref. And the listenerMap will >>>> not >>>> be purged of the unregistered MBean listeners until the notification is >>>> generated, processed on the ClientNotificationForwarder and >>>> forwarded to >>>> the server. So there goes the timing issue again. >>>> >>>> The problem is that the "unregisterMBean" operation does not guarantee >>>> that the listeners have been unregistered at the time it returns. So, >>>> one way or the other we will need to wait an arbitrary amount of time >>>> before checking for the memory leak. 
>>>> >>> Yes we need to wait, but you can use a cycle like: >>> long maxWaitingTime = 3000; >>> long startTime = System.currentTimeMillis(); >>> while ( weakReference.get != null >>> && System.currentTimeMillis() < startTime + >>> maxWaitingTime) { >>> System.gc(); >>> Thread.sleep(100); >>> System.gc(); >>> } >>> >>> if (weakReference.get != null) { >>> // failed >>> } >>> >> >> Still you need an arbitrary timeout which might be reached under extreme >> conditions making this test to fail intermittently. But I'd say that's >> the nature of tests for memory leak fixes, due to the unpredictable >> nature of the GC runs. Unless you take a heap dump and do a reachability >> analysis you can not be sure whether a reference is dangling somwehwere >> or it just hasn't been collected yet :/ >> >> -JB- >> >> >>> Shanliang >>> >>>> -JB- >>>> >>>> >>>> >>>>> I think we have 3 things to do here: >>>>> 1) modify the test to not use removeNotificationListener for testing >>>>> listener leak >>>>> 2) create a new bug about a client does not throw an exception >>>>> after an >>>>> mbean is unregistered >>>>> 3) create a bug about a client does not throw a same exception as at >>>>> server side. >>>>> >>>>> I will do 2) and 3), if you like you can continue 1), it might need to >>>>> do fix also in the JMX implementation. >>>>> >>>>> Shanliang >>>>> >>>>>> >>>>>> >>>>>>> The test "DeadListenerTest" got passed in some machines because >>>>>>> of the >>>>>>> timeout for waiting a notification. I think its failure just tells >>>>>>> a new >>>>>>> bug. >>>>>>> >>>>>>> To set a longer timeout just hides the real bug, and the test might >>>>>>> fail >>>>>>> again one day if running condition is changed and you might need >>>>>>> longer >>>>>>> timeout again. >>>>>>> >>>>>> Yes, I agree with you that extending the timeout just lessens the >>>>>> likelihood of the race condition and does not prevent it. 
>>>>>> >>>>>> >>>>>> >>>>>>> Shanliang >>>>>>> >>>>>>> Jaroslav Bachorik wrote: >>>>>>> >>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Jaroslav Bachorik wrote: >>>>>>>>> >>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> I still have no idea why the test failed, but I do not see why a >>>>>>>>>>> longer >>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and >>>>>>>>>>> tested >>>>>>>>>>> your fix? if yes then possible the long timeout hided a real >>>>>>>>>>> problem. >>>>>>>>>>> >>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and >>>>>>>>>> -Xcomp >>>>>>>>>> switch) and verify that the fix makes the test pass. >>>>>>>>>> >>>>>>>>>> The ClientNotifForwarder scans the notifications for >>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and >>>>>>>>>> removes the >>>>>>>>>> appropriate notification listeners in a separate thread. Thus, >>>>>>>>>> calling >>>>>>>>>> "removeNotificationListener" on the main thread is prone to >>>>>>>>>> racing. >>>>>>>>>> >>>>>>>>> It is true that ClientNotifForwarder scans the notifications for >>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes >>>>>>>>> the >>>>>>>>> appropriate notification listeners in a separate thread. This is >>>>>>>>> for a >>>>>>>>> client connection to do clean if a user never calls >>>>>>>>> removeNotificationListener. >>>>>>>>> >>>>>>>>> But calling directly removeNotificationListener from a client >>>>>>>>> should >>>>>>>>> still get exception if the clean has not been done. As I said, if >>>>>>>>> the >>>>>>>>> client checked and found the listener was still there, then the >>>>>>>>> client >>>>>>>>> sent a request to its server to remove the listener at server >>>>>>>>> side, >>>>>>>>> the >>>>>>>>> server should find that the MBean in question was not registered, >>>>>>>>> so the >>>>>>>>> server should throw an exception. The bug might be here. 
>>>>>>>>> >>>>>>>> This won't work. The server side listeners are removed upon >>>>>>>> receiving >>>>>>>> the "unregistered" notification which is delivered from the >>>>>>>> ClientNotificationForwarder and it may have not run yet (since it >>>>>>>> runs >>>>>>>> in a separate executor thread). The result is that the attempt to >>>>>>>> remove >>>>>>>> the notification listener on the server will succeed as well >>>>>>>> failing >>>>>>>> the >>>>>>>> test subsequently. >>>>>>>> >>>>>>>> -JB- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Shanliang >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> The timeout you made longer was used to wait a notification >>>>>>>>>>> which >>>>>>>>>>> should >>>>>>>>>>> never arrive. >>>>>>>>>>> >>>>>>>>>> Well, it can be used to allow more time to process the >>>>>>>>>> "unregister" >>>>>>>>>> notification, too. >>>>>>>>>> >>>>>>>>>> When I think more of this I am more inclined to fix the race >>>>>>>>>> condition. >>>>>>>>>> An updated webrev will follow. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> To remove a listener from a client side, we did: >>>>>>>>>>> 1) at client side, check whether it was added in the client side >>>>>>>>>>> 2) at server side, check whether the MBean in question was >>>>>>>>>>> registered in >>>>>>>>>>> the MBeanServer (!!!) >>>>>>>>>>> 3) at server side, check whether the listener was added. >>>>>>>>>>> >>>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification. >>>>>>>>>>> Anyway, >>>>>>>>>>> if you use a SAME thread to call "unregister" operation to >>>>>>>>>>> unregister an >>>>>>>>>>> mbean, then any following call (without any time break) to >>>>>>>>>>> use the >>>>>>>>>>> mbean >>>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered" >>>>>>>>>>> etc. 
>>>>>>>>>>> >>>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing >>>>>>>>>>> MBeam, >>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get >>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should >>>>>>>>>>> create a >>>>>>>>>>> bug, because both implemented the same interface >>>>>>>>>>> MBeanServerConnection. >>>>>>>>>>> >>>>>>>>>> Yes, it is rather inconsistent. >>>>>>>>>> >>>>>>>>>> -JB- >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Shanliang >>>>>>>>>>> >>>>>>>>>>> Jaroslav Bachorik wrote: >>>>>>>>>>> >>>>>>>>>>>> Looking for review and a sponsor. >>>>>>>>>>>> >>>>>>>>>>>> Webrev at >>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>>>>>>>> >>>>>>>>>>>> In this issue the timing is the problem. >>>>>>>>>>>> MBeanServer.unregisterMBean() >>>>>>>>>>>> fires the "unregister" notification which is sent to the server >>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister" >>>>>>>>>>>> notification >>>>>>>>>>>> has not been yet processed at the time of invoking >>>>>>>>>>>> removeNotificationListener() and the notification listeners >>>>>>>>>>>> hasn't >>>>>>>>>>>> been >>>>>>>>>>>> cleaned up leading to the test failure. >>>>>>>>>>>> >>>>>>>>>>>> There is no synchronization between the client and the >>>>>>>>>>>> server and >>>>>>>>>>>> such >>>>>>>>>>>> race condition can occur occasionally. Normally, the >>>>>>>>>>>> execution is >>>>>>>>>>>> fast >>>>>>>>>>>> enough to behave like the "unregister" notification is >>>>>>>>>>>> processed >>>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems >>>>>>>>>>>> that >>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains >>>>>>>>>>>> the CPU >>>>>>>>>>>> enough to make this problem appear. 
>>>>>>>>>>>> >>>>>>>>>>>> There is no proper fix for this - the only thing that work is >>>>>>>>>>>> waiting a >>>>>>>>>>>> bit longer in the main thread to give the notification >>>>>>>>>>>> processing >>>>>>>>>>>> thread >>>>>>>>>>>> some time to clean up the listeners. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> >>>>>>>>>>>> -JB- >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >> > > From jaroslav.bachorik at oracle.com Thu Jan 10 04:31:45 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Thu, 10 Jan 2013 13:31:45 +0100 Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure In-Reply-To: <50EEAF60.7040801@oracle.com> References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com> <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com> <50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com> <50EEA8F8.7090007@oracle.com> <50EEABA6.6010203@oracle.com> <50EEAF60.7040801@oracle.com> Message-ID: <50EEB4B1.8070101@oracle.com> On 01/10/2013 01:09 PM, Jaroslav Bachorik wrote: > On 01/10/2013 12:53 PM, shanliang wrote: >> Instead to wait GC, you can also to wait the >> MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive >> it, then your listener must be removed too. Of course this solution is > > The problem is that the *NotificationForwarder implementations swallow > this kind of notification and just perform the cleanup. No other > listener will ever receive this notification. > > The "unregisterMBean" operation's semantics is not clearly defined. > Intuitively, when unregistering an MBean all the associated listeners > should be gone before the method returns. But this is not the case - > currently the listeners are sanitized some time after the > "unregisterMBean" operation started, eventually. 
There is no easy way to > notify the API user that the listeners were removed. I am afraid that in > order to resolve these problems new APIs would need to be introduced and > the whole mechanism of delivering notification should be revisited (as > it was planned for JMX 2.0, anyway). > > As for fixing the test - checking the weak references works fine as well > as increasing the timeout. They both can fail when the system is > extremely busy but the GC based solution will be in general faster than > the one with increased timeout. Updated webrev: http://cr.openjdk.java.net/~jbachorik/7170447/webrev.01 > > -JB- > >> implementation dependent, but the test is already implementation dependent. >> >> Shanliang >> >> >> Jaroslav Bachorik wrote: >>> On 01/10/2013 10:05 AM, shanliang wrote: >>> >>>> Jaroslav Bachorik wrote: >>>> >>>>> On 01/09/2013 03:25 PM, shanliang wrote: >>>>> >>>>> >>>>>> Jaroslav Bachorik wrote: >>>>>> >>>>>>> On 01/09/2013 02:44 PM, shanliang wrote: >>>>>>> >>>>>>> >>>>>>>> Let's forget the JMX implementation at first. If an MBean is >>>>>>>> unregistered, a user at client side calls >>>>>>>> "removeNotificationListener" >>>>>>>> on the MBean, what should happen? if the user calls >>>>>>>> "isRegistered" on >>>>>>>> the MBean, what should happen? >>>>>>>> >>>>>>>> I have done 2 tests, I used only one thread: >>>>>>>> >>>>>>>> 1) >>>>>>>> ...... >>>>>>>> localServer.unregisterMBean(myMBean); >>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean)); >>>>>>>> >>>>>>>> I got isRegistered = false; >>>>>>>> >>>>>>>> 2) >>>>>>>> ...... >>>>>>>> localServer.unregisterMBean(myMBean); >>>>>>>> System.out.println("isRegistered = >>>>>>>> "+remoteClientServer.sRegistered(myMBean)); >>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener); >>>>>>>> >>>>>>>> I did not get an exception. 
>>>>>>>> >>>>>>>> The 1) told that the client could know the MBean was unregistered, >>>>>>>> then >>>>>>>> the client should throw an exception for the call of >>>>>>>> "removeNotificationListener" in 2). >>>>>>>> >>>>>>> Yes, but then it would not test the listener leakage as it was >>>>>>> supposed >>>>>>> to test but rather the fact that the client throws the appropriate >>>>>>> exception. The fact that the mbean was unregistered does not >>>>>>> necessarily >>>>>>> mean that the listeners were released. The main problem remains - the >>>>>>> listeners are being cleaned-up asynchronously and the clean-up >>>>>>> process >>>>>>> might race against the other uses of the JMX API. >>>>>>> >>>>>> client.removeNotificationListener is not a right way here to test >>>>>> listener leak, we could use some other ways, for example we keep the >>>>>> listener in a weak reference, then after the mbean is removed, the >>>>>> weak >>>>>> reference should be empty after some time. Another way is like >>>>>> DeadListenerTest does to check whether clean has done at server side: >>>>>> use reflection to get the "listenerMap" at server side and make >>>>>> sure it >>>>>> is empty, but this need to add a private method to the class >>>>>> ClientNotifForwarder. >>>>>> >>>>> There will still be problems with timing. You need either to wait for >>>>> the GC to kick in to clean up the weak ref. And the listenerMap will >>>>> not >>>>> be purged of the unregistered MBean listeners until the notification is >>>>> generated, processed on the ClientNotificationForwarder and >>>>> forwarded to >>>>> the server. So there goes the timing issue again. >>>>> >>>>> The problem is that the "unregisterMBean" operation does not guarantee >>>>> that the listeners have been unregistered at the time it returns. So, >>>>> one way or the other we will need to wait an arbitrary amount of time >>>>> before checking for the memory leak. 
>>>>> >>>> Yes we need to wait, but you can use a cycle like: >>>> long maxWaitingTime = 3000; >>>> long startTime = System.currentTimeMillis(); >>>> while ( weakReference.get != null >>>> && System.currentTimeMillis() < startTime + >>>> maxWaitingTime) { >>>> System.gc(); >>>> Thread.sleep(100); >>>> System.gc(); >>>> } >>>> >>>> if (weakReference.get != null) { >>>> // failed >>>> } >>>> >>> >>> Still you need an arbitrary timeout which might be reached under extreme >>> conditions making this test to fail intermittently. But I'd say that's >>> the nature of tests for memory leak fixes, due to the unpredictable >>> nature of the GC runs. Unless you take a heap dump and do a reachability >>> analysis you can not be sure whether a reference is dangling somwehwere >>> or it just hasn't been collected yet :/ >>> >>> -JB- >>> >>> >>>> Shanliang >>>> >>>>> -JB- >>>>> >>>>> >>>>> >>>>>> I think we have 3 things to do here: >>>>>> 1) modify the test to not use removeNotificationListener for testing >>>>>> listener leak >>>>>> 2) create a new bug about a client does not throw an exception >>>>>> after an >>>>>> mbean is unregistered >>>>>> 3) create a bug about a client does not throw a same exception as at >>>>>> server side. >>>>>> >>>>>> I will do 2) and 3), if you like you can continue 1), it might need to >>>>>> do fix also in the JMX implementation. >>>>>> >>>>>> Shanliang >>>>>> >>>>>>> >>>>>>> >>>>>>>> The test "DeadListenerTest" got passed in some machines because >>>>>>>> of the >>>>>>>> timeout for waiting a notification. I think its failure just tells >>>>>>>> a new >>>>>>>> bug. >>>>>>>> >>>>>>>> To set a longer timeout just hides the real bug, and the test might >>>>>>>> fail >>>>>>>> again one day if running condition is changed and you might need >>>>>>>> longer >>>>>>>> timeout again. >>>>>>>> >>>>>>> Yes, I agree with you that extending the timeout just lessens the >>>>>>> likelihood of the race condition and does not prevent it. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>>> Shanliang >>>>>>>> >>>>>>>> Jaroslav Bachorik wrote: >>>>>>>> >>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Jaroslav Bachorik wrote: >>>>>>>>>> >>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> I still have no idea why the test failed, but I do not see why a >>>>>>>>>>>> longer >>>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and >>>>>>>>>>>> tested >>>>>>>>>>>> your fix? if yes then possible the long timeout hided a real >>>>>>>>>>>> problem. >>>>>>>>>>>> >>>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and >>>>>>>>>>> -Xcomp >>>>>>>>>>> switch) and verify that the fix makes the test pass. >>>>>>>>>>> >>>>>>>>>>> The ClientNotifForwarder scans the notifications for >>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and >>>>>>>>>>> removes the >>>>>>>>>>> appropriate notification listeners in a separate thread. Thus, >>>>>>>>>>> calling >>>>>>>>>>> "removeNotificationListener" on the main thread is prone to >>>>>>>>>>> racing. >>>>>>>>>>> >>>>>>>>>> It is true that ClientNotifForwarder scans the notifications for >>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes >>>>>>>>>> the >>>>>>>>>> appropriate notification listeners in a separate thread. This is >>>>>>>>>> for a >>>>>>>>>> client connection to do clean if a user never calls >>>>>>>>>> removeNotificationListener. >>>>>>>>>> >>>>>>>>>> But calling directly removeNotificationListener from a client >>>>>>>>>> should >>>>>>>>>> still get exception if the clean has not been done. 
As I said, if >>>>>>>>>> the >>>>>>>>>> client checked and found the listener was still there, then the >>>>>>>>>> client >>>>>>>>>> sent a request to its server to remove the listener at server >>>>>>>>>> side, >>>>>>>>>> the >>>>>>>>>> server should find that the MBean in question was not registered, >>>>>>>>>> so the >>>>>>>>>> server should throw an exception. The bug might be here. >>>>>>>>>> >>>>>>>>> This won't work. The server side listeners are removed upon >>>>>>>>> receiving >>>>>>>>> the "unregistered" notification which is delivered from the >>>>>>>>> ClientNotificationForwarder and it may have not run yet (since it >>>>>>>>> runs >>>>>>>>> in a separate executor thread). The result is that the attempt to >>>>>>>>> remove >>>>>>>>> the notification listener on the server will succeed as well >>>>>>>>> failing >>>>>>>>> the >>>>>>>>> test subsequently. >>>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Shanliang >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> The timeout you made longer was used to wait a notification >>>>>>>>>>>> which >>>>>>>>>>>> should >>>>>>>>>>>> never arrive. >>>>>>>>>>>> >>>>>>>>>>> Well, it can be used to allow more time to process the >>>>>>>>>>> "unregister" >>>>>>>>>>> notification, too. >>>>>>>>>>> >>>>>>>>>>> When I think more of this I am more inclined to fix the race >>>>>>>>>>> condition. >>>>>>>>>>> An updated webrev will follow. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> To remove a listener from a client side, we did: >>>>>>>>>>>> 1) at client side, check whether it was added in the client side >>>>>>>>>>>> 2) at server side, check whether the MBean in question was >>>>>>>>>>>> registered in >>>>>>>>>>>> the MBeanServer (!!!) >>>>>>>>>>>> 3) at server side, check whether the listener was added. >>>>>>>>>>>> >>>>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification. 
>>>>>>>>>>>> Anyway, >>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to >>>>>>>>>>>> unregister an >>>>>>>>>>>> mbean, then any following call (without any time break) to >>>>>>>>>>>> use the >>>>>>>>>>>> mbean >>>>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered" >>>>>>>>>>>> etc. >>>>>>>>>>>> >>>>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing >>>>>>>>>>>> MBeam, >>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get >>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should >>>>>>>>>>>> create a >>>>>>>>>>>> bug, because both implemented the same interface >>>>>>>>>>>> MBeanServerConnection. >>>>>>>>>>>> >>>>>>>>>>> Yes, it is rather inconsistent. >>>>>>>>>>> >>>>>>>>>>> -JB- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Shanliang >>>>>>>>>>>> >>>>>>>>>>>> Jaroslav Bachorik wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Looking for review and a sponsor. >>>>>>>>>>>>> >>>>>>>>>>>>> Webrev at >>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>>>>>>>>> >>>>>>>>>>>>> In this issue the timing is the problem. >>>>>>>>>>>>> MBeanServer.unregisterMBean() >>>>>>>>>>>>> fires the "unregister" notification which is sent to the server >>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister" >>>>>>>>>>>>> notification >>>>>>>>>>>>> has not been yet processed at the time of invoking >>>>>>>>>>>>> removeNotificationListener() and the notification listeners >>>>>>>>>>>>> hasn't >>>>>>>>>>>>> been >>>>>>>>>>>>> cleaned up leading to the test failure. >>>>>>>>>>>>> >>>>>>>>>>>>> There is no synchronization between the client and the >>>>>>>>>>>>> server and >>>>>>>>>>>>> such >>>>>>>>>>>>> race condition can occur occasionally. 
Normally, the >>>>>>>>>>>>> execution is >>>>>>>>>>>>> fast >>>>>>>>>>>>> enough to behave like the "unregister" notification is >>>>>>>>>>>>> processed >>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems >>>>>>>>>>>>> that >>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains >>>>>>>>>>>>> the CPU >>>>>>>>>>>>> enough to make this problem appear. >>>>>>>>>>>>> >>>>>>>>>>>>> There is no proper fix for this - the only thing that work is >>>>>>>>>>>>> waiting a >>>>>>>>>>>>> bit longer in the main thread to give the notification >>>>>>>>>>>>> processing >>>>>>>>>>>>> thread >>>>>>>>>>>>> some time to clean up the listeners. >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> >>>>>>>>>>>>> -JB- >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >>> >> >> > From shanliang.jiang at oracle.com Thu Jan 10 05:18:32 2013 From: shanliang.jiang at oracle.com (shanliang) Date: Thu, 10 Jan 2013 14:18:32 +0100 Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure In-Reply-To: <50EEB4B1.8070101@oracle.com> References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com> <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com> <50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com> <50EEA8F8.7090007@oracle.com> <50EEABA6.6010203@oracle.com> <50EEAF60.7040801@oracle.com> <50EEB4B1.8070101@oracle.com> Message-ID: <50EEBFA8.5010001@oracle.com> The weakListener is unnecessary; the test already performs the same verification:

    Set setForUnreg = listenerMap.get(name);
    assertTrue("No trace of unregistered MBean: " + setForUnreg, setForUnreg == null);

Everything else is OK for me.
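For readers following the thread, the bounded weak-reference wait proposed earlier can be sketched roughly as below. This is only an illustration and not the code from the webrev: the class name WeakRefWait and the method awaitCleared are hypothetical, the 100 ms polling interval is arbitrary, and System.gc() is merely a hint to the VM, so a false result means "not collected yet or still reachable", not a proven leak. Reference.reachabilityFence requires Java 9 or later.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

// Hypothetical helper, not part of DeadListenerTest or the JMX implementation.
final class WeakRefWait {
    private WeakRefWait() {}

    /**
     * Polls until the weak reference has been cleared or maxWaitMillis has
     * elapsed. Returns true only if the referent is gone when the method exits.
     */
    static boolean awaitCleared(WeakReference<?> ref, long maxWaitMillis) {
        final long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        // Comparing the difference against zero keeps the check overflow-safe.
        while (ref.get() != null && System.nanoTime() - deadline < 0) {
            System.gc(); // a request only; the VM is free to ignore it
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and stop waiting early.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        Object listener = new Object();
        WeakReference<Object> weak = new WeakReference<>(listener);

        // While a strong reference exists, the wait has to time out.
        boolean clearedWhileHeld = awaitCleared(weak, 300);
        Reference.reachabilityFence(listener); // keep it reachable up to here

        // Drop the strong reference; collection is now possible but not guaranteed.
        listener = null;
        boolean clearedAfterRelease = awaitCleared(weak, 3000);

        System.out.println("cleared while held:    " + clearedWhileHeld);
        System.out.println("cleared after release: " + clearedAfterRelease);
    }
}
```

As noted earlier in the thread, the timeout remains arbitrary, so a test built on such a helper can still fail on an extremely busy machine; it only tends to finish sooner than a fixed sleep when the listener is released quickly.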
Shanliang

Jaroslav Bachorik wrote:
> On 01/10/2013 01:09 PM, Jaroslav Bachorik wrote:
>> On 01/10/2013 12:53 PM, shanliang wrote:
>>> Instead of waiting for the GC, you can also wait for the MBeanServerNotification.UNREGISTRATION_NOTIFICATION; once you receive it, your listener must have been removed too. Of course this solution is
>>
>> The problem is that the *NotificationForwarder implementations swallow this kind of notification and just perform the cleanup. No other listener will ever receive this notification.
>>
>> The semantics of the "unregisterMBean" operation are not clearly defined. Intuitively, when unregistering an MBean all the associated listeners should be gone before the method returns. But this is not the case - currently the listeners are cleaned up at some point after the "unregisterMBean" operation has started. There is no easy way to notify the API user that the listeners were removed. I am afraid that in order to resolve these problems new APIs would need to be introduced and the whole mechanism of delivering notifications should be revisited (as it was planned for JMX 2.0, anyway).
>>
>> As for fixing the test - checking the weak references works fine, as does increasing the timeout. Both can fail when the system is extremely busy, but the GC based solution will in general be faster than the one with the increased timeout.
>
> Updated webrev: http://cr.openjdk.java.net/~jbachorik/7170447/webrev.01
>
>> -JB-
>>
>>> implementation dependent, but the test is already implementation dependent.
>>>
>>> Shanliang
>>>
>>> Jaroslav Bachorik wrote:
>>>> On 01/10/2013 10:05 AM, shanliang wrote:
>>>>> Jaroslav Bachorik wrote:
>>>>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>>>>> Let's forget the JMX implementation at first. If an MBean is unregistered and a user at the client side calls "removeNotificationListener" on the MBean, what should happen? If the user calls "isRegistered" on the MBean, what should happen?
>>>>>>>>>
>>>>>>>>> I have done 2 tests, using only one thread:
>>>>>>>>>
>>>>>>>>> 1)
>>>>>>>>> ......
>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean);
>>>>>>>>>
>>>>>>>>> I got isRegistered = false;
>>>>>>>>>
>>>>>>>>> 2)
>>>>>>>>> ......
>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>> System.out.println("isRegistered = " + remoteClientServer.isRegistered(myMBean));
>>>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>>>>
>>>>>>>>> I did not get an exception.
>>>>>>>>>
>>>>>>>>> Test 1) shows that the client can know the MBean was unregistered, so the client should throw an exception for the call of "removeNotificationListener" in 2).
>>>>>>>>
>>>>>>>> Yes, but then it would not test the listener leakage as it was supposed to, but rather the fact that the client throws the appropriate exception. The fact that the MBean was unregistered does not necessarily mean that the listeners were released. The main problem remains - the listeners are being cleaned up asynchronously and the clean-up process might race against the other uses of the JMX API.
>>>>>>>
>>>>>>> client.removeNotificationListener is not the right way to test for a listener leak here; we could use other ways. For example, we keep the listener in a weak reference; then, after the MBean is removed, the weak reference should be cleared after some time. Another way is what DeadListenerTest does to check whether the cleanup has been done at the server side: use reflection to get the "listenerMap" at the server side and make sure it is empty, but this needs a private method added to the class ClientNotifForwarder.
>>>>>>
>>>>>> There will still be problems with timing. You need to wait for the GC to kick in to clear the weak ref. And the listenerMap will not be purged of the unregistered MBean listeners until the notification is generated, processed on the ClientNotificationForwarder and forwarded to the server. So there goes the timing issue again.
>>>>>>
>>>>>> The problem is that the "unregisterMBean" operation does not guarantee that the listeners have been unregistered at the time it returns. So, one way or the other, we will need to wait an arbitrary amount of time before checking for the memory leak.
>>>>>
>>>>> Yes, we need to wait, but you can use a loop like:
>>>>> long maxWaitingTime = 3000;
>>>>> long startTime = System.currentTimeMillis();
>>>>> while (weakReference.get() != null
>>>>>        && System.currentTimeMillis() < startTime + maxWaitingTime) {
>>>>>     System.gc();
>>>>>     Thread.sleep(100);
>>>>>     System.gc();
>>>>> }
>>>>>
>>>>> if (weakReference.get() != null) {
>>>>>     // failed
>>>>> }
>>>>
>>>> You still need an arbitrary timeout, which might be reached under extreme conditions, making this test fail intermittently. But I'd say that's the nature of tests for memory leak fixes, due to the unpredictable nature of the GC runs. Unless you take a heap dump and do a reachability analysis you cannot be sure whether a reference is dangling somewhere or it just hasn't been collected yet :/
>>>>
>>>> -JB-
>>>>
>>>>> Shanliang
>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>> I think we have 3 things to do here:
>>>>>>> 1) modify the test so it does not use removeNotificationListener for testing the listener leak
>>>>>>> 2) create a new bug: a client does not throw an exception after an MBean is unregistered
>>>>>>> 3) create a bug: a client does not throw the same exception as the server side.
>>>>>>>
>>>>>>> I will do 2) and 3); if you like you can continue with 1), which might also need a fix in the JMX implementation.
>>>>>>>
>>>>>>> Shanliang
>>>>>>>
>>>>>>>>> The test "DeadListenerTest" passed on some machines because of the timeout for waiting for a notification. I think its failure just reveals a new bug.
>>>>>>>>>
>>>>>>>>> Setting a longer timeout just hides the real bug, and the test might fail again one day if the running conditions change, and you might need a longer timeout again.
>>>>>>>>
>>>>>>>> Yes, I agree with you that extending the timeout just lessens the likelihood of the race condition and does not prevent it.
>>>>>>>>
>>>>>>>>> Shanliang
>>>>>>>>>
>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>>>>> I still have no idea why the test failed, but I do not see why a longer timeout can fix the test. Have you reproduced the problem and tested your fix? If yes, then possibly the long timeout hid a real problem.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I can reproduce the problem (using the fastdebug bits and the -Xcomp switch) and verify that the fix makes the test pass.
>>>>>>>>>>>>
>>>>>>>>>>>> The ClientNotifForwarder scans the notifications for MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the appropriate notification listeners in a separate thread. Thus, calling "removeNotificationListener" on the main thread is prone to racing.
>>>>>>>>>>>
>>>>>>>>>>> It is true that ClientNotifForwarder scans the notifications for MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the appropriate notification listeners in a separate thread. This is for a client connection to clean up if a user never calls removeNotificationListener.
>>>>>>>>>>>
>>>>>>>>>>> But calling removeNotificationListener directly from a client should still get an exception if the cleanup has not been done. As I said, if the client checked and found the listener was still there, then the client sent a request to its server to remove the listener at the server side; the server should find that the MBean in question was not registered, so the server should throw an exception. The bug might be here.
>>>>>>>>>>
>>>>>>>>>> This won't work. The server side listeners are removed upon receiving the "unregistered" notification, which is delivered from the ClientNotificationForwarder, and it may not have run yet (since it runs in a separate executor thread). The result is that the attempt to remove the notification listener on the server will succeed as well, subsequently failing the test.
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>>>> Shanliang
>>>>>>>>>>>
>>>>>>>>>>>>> The timeout you made longer was used to wait for a notification which should never arrive.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, it can be used to allow more time to process the "unregister" notification, too.
>>>>>>>>>>>>
>>>>>>>>>>>> The more I think about this, the more I am inclined to fix the race condition. An updated webrev will follow.
>>>>>>>>>>>>
>>>>>>>>>>>>> To remove a listener from the client side, we did:
>>>>>>>>>>>>> 1) at the client side, check whether it was added at the client side
>>>>>>>>>>>>> 2) at the server side, check whether the MBean in question was registered in the MBeanServer (!!!)
>>>>>>>>>>>>> 3) at the server side, check whether the listener was added.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So 2) shows that we did not rely on an "unregister" notification. Anyway, if you use the SAME thread to call the "unregister" operation to unregister an MBean, then any following call (without any time break) that uses the MBean should fail, like "removeNotificationListener", "isRegistered" etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do see a bug here: if we remove a listener from a non-existing MBean, we get "ListenerNotFoundException" at the client side, but get "InstanceNotFoundException" at the server side. I think we should create a bug, because both implement the same interface MBeanServerConnection.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>>>
>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In this issue the timing is the problem. MBeanServer.unregisterMBean() fires the "unregister" notification which is sent to the server asynchronously. Thus it may happen that the "unregister" notification has not yet been processed at the time of invoking removeNotificationListener() and the notification listeners haven't been cleaned up, leading to the test failure.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is no synchronization between the client and the server and such a race condition can occur occasionally. Normally, the execution is fast enough to behave as if the "unregister" notification were processed synchronously with the unregisterMBean() operation, but it seems that using fastdebug Server VM bits with the -Xcomp option strains the CPU enough to make this problem appear.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is no proper fix for this - the only thing that works is waiting a bit longer in the main thread to give the notification processing thread some time to clean up the listeners.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -JB-

From jaroslav.bachorik at oracle.com Thu Jan 10 05:49:09 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Jan 2013 14:49:09 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
In-Reply-To: <50EEBFA8.5010001@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com> <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com> <50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com> <50EEA8F8.7090007@oracle.com> <50EEABA6.6010203@oracle.com> <50EEAF60.7040801@oracle.com> <50EEB4B1.8070101@oracle.com> <50EEBFA8.5010001@oracle.com>
Message-ID: <50EEC6D5.2040400@oracle.com>

On 01/10/2013 02:18 PM, shanliang wrote:
> The weakListener is unnecessary, the test already does the same
> verification:
> 171 Set setForUnreg = listenerMap.get(name);
> 172 assertTrue("No trace of unregistered MBean: " + setForUnreg,
>     setForUnreg == null);

Addressed.

> Everything else is OK for me.

So, http://cr.openjdk.java.net/~jbachorik/7170447/webrev.02 could be the final version.

Thanks for the review!
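
For readers following the thread, the server-side behaviour that the remote client is being compared against can be sketched as below. This is only a minimal sketch, assuming a plain in-process MBeanServer with no connector involved; the names UnregisterSemantics, Probe and ProbeMBean are made up for illustration and are not part of DeadListenerTest or the webrev.

```java
import javax.management.InstanceNotFoundException;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;
import javax.management.ObjectName;

class UnregisterSemantics {
    // Hypothetical standard MBean, used only for this illustration.
    public interface ProbeMBean { }

    public static class Probe extends NotificationBroadcasterSupport
            implements ProbeMBean { }

    /** Reports what a local MBeanServer answers right after unregisterMBean(). */
    static String probe() throws Exception {
        MBeanServer mbs = MBeanServerFactory.newMBeanServer();
        ObjectName name = new ObjectName("demo:type=Probe");
        NotificationListener listener = (notification, handback) -> { };

        mbs.registerMBean(new Probe(), name);
        mbs.addNotificationListener(name, listener, null, null);
        mbs.unregisterMBean(name);

        // Same thread, no pause: the MBean is already gone from the server.
        boolean registered = mbs.isRegistered(name);
        String removal;
        try {
            mbs.removeNotificationListener(name, listener);
            removal = "no exception";
        } catch (InstanceNotFoundException e) {
            removal = "InstanceNotFoundException";
        }
        return "isRegistered=" + registered
                + ", removeNotificationListener: " + removal;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probe());
    }
}
```

The sketch deliberately says nothing about the remote client path (ClientNotifForwarder), where the listener bookkeeping is asynchronous and which is what the thread is actually about.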
-JB-

From shanliang.jiang at oracle.com Thu Jan 10 07:14:40 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Thu, 10 Jan 2013 16:14:40 +0100
Subject: jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
In-Reply-To: <50EEC6D5.2040400@oracle.com>
References: <50EC3850.7080508@oracle.com> <50ED2D0A.5000509@oracle.com> <50ED3C4F.1070001@oracle.com> <50ED41AC.4010007@oracle.com> <50ED6D8E.6070404@oracle.com> <50ED7436.1020205@oracle.com> <50ED7801.8080704@oracle.com> <50ED7DEF.9020108@oracle.com> <50EE824A.8020106@oracle.com> <50EE8447.50901@oracle.com> <50EEA8F8.7090007@oracle.com> <50EEABA6.6010203@oracle.com> <50EEAF60.7040801@oracle.com> <50EEB4B1.8070101@oracle.com> <50EEBFA8.5010001@oracle.com> <50EEC6D5.2040400@oracle.com>
Message-ID: <50EEDAE0.7040904@oracle.com>

It is OK for me,
thanks for fixing the bug!

Shanliang
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Well, it can be used to allow more time to process the >>>>>>>>>>>>>> "unregister" >>>>>>>>>>>>>> notification, too. >>>>>>>>>>>>>> >>>>>>>>>>>>>> When I think more of this I am more inclined to fix the race >>>>>>>>>>>>>> condition. >>>>>>>>>>>>>> An updated webrev will follow. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> To remove a listener from a client side, we did: >>>>>>>>>>>>>>> 1) at client side, check whether it was added in the >>>>>>>>>>>>>>> client side >>>>>>>>>>>>>>> 2) at server side, check whether the MBean in question was >>>>>>>>>>>>>>> registered in >>>>>>>>>>>>>>> the MBeanServer (!!!) >>>>>>>>>>>>>>> 3) at server side, check whether the listener was added. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So 2) tells that we did not rely on a "unregister" >>>>>>>>>>>>>>> notification. >>>>>>>>>>>>>>> Anyway, >>>>>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to >>>>>>>>>>>>>>> unregister an >>>>>>>>>>>>>>> mbean, then any following call (without any time break) to >>>>>>>>>>>>>>> use the >>>>>>>>>>>>>>> mbean >>>>>>>>>>>>>>> should fail, like "removeNotificationListener", >>>>>>>>>>>>>>> "isRegistered" >>>>>>>>>>>>>>> etc. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I do see a bug here, if we remove a listener from a >>>>>>>>>>>>>>> non-existing >>>>>>>>>>>>>>> MBeam, >>>>>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get >>>>>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should >>>>>>>>>>>>>>> create a >>>>>>>>>>>>>>> bug, because both implemented the same interface >>>>>>>>>>>>>>> MBeanServerConnection. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, it is rather inconsistent. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> -JB- >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Shanliang >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jaroslav Bachorik wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Looking for review and a sponsor. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Webrev at >>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In this issue the timing is the problem. >>>>>>>>>>>>>>>> MBeanServer.unregisterMBean() >>>>>>>>>>>>>>>> fires the "unregister" notification which is sent to the >>>>>>>>>>>>>>>> server >>>>>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister" >>>>>>>>>>>>>>>> notification >>>>>>>>>>>>>>>> has not been yet processed at the time of invoking >>>>>>>>>>>>>>>> removeNotificationListener() and the notification listeners >>>>>>>>>>>>>>>> hasn't >>>>>>>>>>>>>>>> been >>>>>>>>>>>>>>>> cleaned up leading to the test failure. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> There is no synchronization between the client and the >>>>>>>>>>>>>>>> server and >>>>>>>>>>>>>>>> such >>>>>>>>>>>>>>>> race condition can occur occasionally. Normally, the >>>>>>>>>>>>>>>> execution is >>>>>>>>>>>>>>>> fast >>>>>>>>>>>>>>>> enough to behave like the "unregister" notification is >>>>>>>>>>>>>>>> processed >>>>>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it >>>>>>>>>>>>>>>> seems >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option >>>>>>>>>>>>>>>> strains >>>>>>>>>>>>>>>> the CPU >>>>>>>>>>>>>>>> enough to make this problem appear. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> There is no proper fix for this - the only thing that >>>>>>>>>>>>>>>> work is >>>>>>>>>>>>>>>> waiting a >>>>>>>>>>>>>>>> bit longer in the main thread to give the notification >>>>>>>>>>>>>>>> processing >>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>> some time to clean up the listeners. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -JB- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/jmx-dev/attachments/20130110/9248c139/attachment-0001.html From jaroslav.bachorik at oracle.com Thu Jan 10 07:20:03 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Thu, 10 Jan 2013 16:20:03 +0100 Subject: jmx-dev [PATCH] JDK-8005472: com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh failed on windows In-Reply-To: <50EE813A.1020501@oracle.com> References: <50E16BA8.40203@oracle.com> <682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com> <50EAB014.30805@oracle.com> <50EE813A.1020501@oracle.com> Message-ID: <50EEDC23.5080005@oracle.com> Update: http://cr.openjdk.java.net/~jbachorik/8005472/webrev.04 On 01/10/2013 09:52 AM, Stuart Marks wrote: > On 1/7/13 3:23 AM, Jaroslav Bachorik wrote: >> On 01/04/2013 11:37 PM, Kelly O'Hair wrote: >>> I suspect it is not hanging because it does not exist, but that some >>> other windows process has it's hands on it. >>> This is the stdout file from the server being started up right? >>> Could the server from a previous test run be still running? >> >> Exactly. Amy confirmed this and provided a patch which resolves the >> hanging problem. >> >> The update patch is at >> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.01 > > Hi Jaroslav, > > The change to remove the parentheses from around the server program > looks right. It avoids forking an extra process (at least in some > shells) and lets $! refer to the actual JVM, not an intermediate shell > process. The rm -f from Kelly's suggestion is good too. 
> > But there are other things wrong with the script. I don't think they > could cause hanging, but they could cause the script to fail in > unforeseen ways, or even to report success incorrectly. > > One problem is introduced by the change, where the Server's stderr is > also redirected into $URL_PATH along with stdout. This means that if the > Server program reports any errors, they'll get mixed into the URL_PATH > file instead of appearing in the test log. The URL_PATH file's contents > is never reported, so these error messages will be invisible. Fixed, only the stdout is redirected to $URL_PATH. > > The exit status of some of the critical commands (such as the > compilations) isn't checked, so if javac fails for some reason, the test > might not report failure. Instead, some weird error might or might not > be reported later (though one will still see the javac errors in the log). Fixed, introduced the check. The "set -e" was hanging the script so I have to check for the exit status manually. > > I don't think the sleep at line 80 is necessary, since the client runs > synchronously and should have exited by this point. And it's gone. > > The wait loop checking for the existence of the URL_PATH file doesn't > actually guarantee that the server is running or has initialized yet. > The file is actually created by the shell before the Server JVM starts > up. Thus, runClient might try to read from it before the server has > written anything to it. Or, as mentioned above, the server might have > written some error messages into the URL_PATH file instead of the > expected contents. Thus, the contents of the JMXURL variable can quite > possibly be incorrect. The err is not redirected to the file. A separate file is used to signal the availability of the server and that file is created from the java code after the server has been started. Also, the err and out streams are flushed to make sure the JMX URL makes it into the file. 
> > If this occurs, what will happen when the client runs? It may emit some > error message, and this will be filtered out by the grep pipeline. Thus, > HAS_ERRORS might end up empty, and the test will report passing, even > though everything has failed! Shouldn't happen with only the controlled stdout redirected to the file. > > For this changeset I'd recommend at a minimum removing the redirection > of stderr to URL_PATH. If the server fails we'll at least see errors in > the test log. > > For checking the notification message, is there a way to modify the > client to report an exit status or throw an exception? Throwing an > exception from main() will exit the JVM with a nonzero status, so this > can be checked more easily from the script. I think this is less > error-prone than grepping the output for a specific error message. The > test should fail if there is *any* error; it should not succeed if an > expected error is absent. This is unfortunately not possible. The notification processing needs to be robust enough to prevent exiting JVM in cases like this. Therefore it only reports the problem, dumps the notification and carries on. The only place one can find something went wrong is the err stream. > > You might consider having jtreg build the client and server classes. > This might simplify some of the setup. Also, jtreg is meticulous about > aborting the test if any compilations fail, so it takes care of that for > you. I need same name classes with incompatible code compiled to two different locations - client and server. I was not able to figure out how to use jtreg to accomplish that. -JB- > > It would be nice if there were a better way to have the client > rendezvous with the server. I hate to suggest it, but sleeping > unconditionally after starting the server is probably necessary. > Anything more robust probably requires rearchitecting the test, though. > > Sorry to dump all this on you. 
But one of the shell-based RMI tests > suffers from *exactly* the same pathologies. (I have yet to fix it.) > Unfortunately, I believe that there are a lot of other shell-based tests > in the test suite that have similar problems. The lesson here is that > writing reliable shell tests is a lot harder than it seems. > > Thanks, > > s'marks From stuart.marks at oracle.com Thu Jan 10 13:44:02 2013 From: stuart.marks at oracle.com (Stuart Marks) Date: Thu, 10 Jan 2013 13:44:02 -0800 Subject: jmx-dev [PATCH] JDK-8005472: com/sun/jmx/remote/NotificationMarshalVersions/TestSerializationMismatch.sh failed on windows In-Reply-To: <50EEDC23.5080005@oracle.com> References: <50E16BA8.40203@oracle.com> <682D734D-2021-48DE-844D-C55A52D27EBD@oracle.com> <50EAB014.30805@oracle.com> <50EE813A.1020501@oracle.com> <50EEDC23.5080005@oracle.com> Message-ID: <50EF3622.9050500@oracle.com> On 1/10/13 7:20 AM, Jaroslav Bachorik wrote: > Update: http://cr.openjdk.java.net/~jbachorik/8005472/webrev.04 Thanks for the update. Note, argv[0] is used before argv.length is checked, so if no args are passed this gives index out of bounds instead of the usage message. I see you take pains to write and flush the URL to stdout before writing the signaling file. Good. The obvious alternative (which I started writing but then erased) is just to put the URL into the signaling file. But this has a race between creation of the file and the writing of its contents. So, what you have works. (This kind of rendezvous problem occurs a lot; it seems like there ought to be a simpler way.) I suspect the -e option caused hangs because if something failed, it would leave the server running, spoiling the next test run. The usual way to deal with this is to use the shell 'trap' statement, to kill subprocesses and remove temp files before exiting the shell. Probably a good practice in general, but perhaps too much shell hackery for this change. (Up to you if you want to tackle it.) 
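The ordering discussed above (check the argument count before touching argv[0], write and flush the URL, and only then create the signaling file) can be sketched roughly as follows. This is a minimal illustration, not the code from the webrev: the class name ServerStartSketch, the start() helper, the usage text and the placeholder URL are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical server entry point; names are illustrative only.
public class ServerStartSketch {

    /** Returns 0 on success, 1 when the signal-file argument is missing. */
    public static int start(String[] argv, String jmxUrl,
                            PrintStream out, PrintStream err) throws IOException {
        // Check the argument count first, so a missing argument produces
        // the usage message instead of ArrayIndexOutOfBoundsException.
        if (argv.length < 1) {
            err.println("Usage: ServerStartSketch <signal-file>");
            return 1;
        }
        Path signal = Paths.get(argv[0]);

        // Publish the URL and flush both streams before signaling, so a
        // reader that sees the signal file also sees the complete URL.
        out.println(jmxUrl);
        out.flush();
        err.flush();

        // The signal file is empty: its existence is the whole message, so
        // there is no window where it exists with partial contents.
        // createFile throws FileAlreadyExistsException if a stale file was
        // left behind by an earlier run.
        Files.createFile(signal);
        return 0;
    }

    public static void main(String[] argv) throws IOException {
        // Placeholder URL; a real server would print its connector address.
        int status = start(argv, "service:jmx:rmi://localhost/placeholder",
                           System.out, System.err);
        if (status != 0) {
            System.exit(status);
        }
    }
}
```

The script side would then wait for the signal file to appear and only afterwards read the URL from the redirected stdout file. Failing on a stale signal file is a deliberate choice in this sketch, since it makes a leftover server from a previous run visible rather than silently reusing its file.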
Regarding how the test is detecting success/failure, the concern is that if the client fails for some reason other than the failure being checked for, the test will still report passing. Since the error message is coming out of the client JVM, in principle it ought to be possible to redirect it somehow in order to do the assertion checking in Java. With the current shell scheme, not only are other failures reported as the test passing, these other failures are erased in the grep pipeline, so they're not even visible in the test log. This last issue is rather far afield from this webrev, and fixing it will probably require some rearchitecting of the test. So maybe it should be considered independently. I just happened to notice this going on, and I noticed the similarity to what's going on in the RMI tests. s'marks > On 01/10/2013 09:52 AM, Stuart Marks wrote: >> On 1/7/13 3:23 AM, Jaroslav Bachorik wrote: >>> On 01/04/2013 11:37 PM, Kelly O'Hair wrote: >>>> I suspect it is not hanging because it does not exist, but that some >>>> other windows process has it's hands on it. >>>> This is the stdout file from the server being started up right? >>>> Could the server from a previous test run be still running? >>> >>> Exactly. Amy confirmed this and provided a patch which resolves the >>> hanging problem. >>> >>> The update patch is at >>> http://cr.openjdk.java.net/~jbachorik/8005472/webrev.01 >> >> Hi Jaroslav, >> >> The change to remove the parentheses from around the server program >> looks right. It avoids forking an extra process (at least in some >> shells) and lets $! refer to the actual JVM, not an intermediate shell >> process. The rm -f from Kelly's suggestion is good too. >> >> But there are other things wrong with the script. I don't think they >> could cause hanging, but they could cause the script to fail in >> unforeseen ways, or even to report success incorrectly. 
>> >> One problem is introduced by the change, where the Server's stderr is >> also redirected into $URL_PATH along with stdout. This means that if the >> Server program reports any errors, they'll get mixed into the URL_PATH >> file instead of appearing in the test log. The URL_PATH file's contents >> is never reported, so these error messages will be invisible. > > Fixed, only the stdout is redirected to $URL_PATH. > >> >> The exit status of some of the critical commands (such as the >> compilations) isn't checked, so if javac fails for some reason, the test >> might not report failure. Instead, some weird error might or might not >> be reported later (though one will still see the javac errors in the log). > > Fixed, introduced the check. The "set -e" was hanging the script so I > have to check for the exit status manually. > >> >> I don't think the sleep at line 80 is necessary, since the client runs >> synchronously and should have exited by this point. > > And it's gone. > >> >> The wait loop checking for the existence of the URL_PATH file doesn't >> actually guarantee that the server is running or has initialized yet. >> The file is actually created by the shell before the Server JVM starts >> up. Thus, runClient might try to read from it before the server has >> written anything to it. Or, as mentioned above, the server might have >> written some error messages into the URL_PATH file instead of the >> expected contents. Thus, the contents of the JMXURL variable can quite >> possibly be incorrect. > > The err is not redirected to the file. A separate file is used to signal > the availability of the server and that file is created from the java > code after the server has been started. Also, the err and out streams > are flushed to make sure the JMX URL makes it into the file. > >> >> If this occurs, what will happen when the client runs? It may emit some >> error message, and this will be filtered out by the grep pipeline. 
Thus, >> HAS_ERRORS might end up empty, and the test will report passing, even >> though everything has failed! > > Shouldn't happen with only the controlled stdout redirected to the file. > >> >> For this changeset I'd recommend at a minimum removing the redirection >> of stderr to URL_PATH. If the server fails we'll at least see errors in >> the test log. >> >> For checking the notification message, is there a way to modify the >> client to report an exit status or throw an exception? Throwing an >> exception from main() will exit the JVM with a nonzero status, so this >> can be checked more easily from the script. I think this is less >> error-prone than grepping the output for a specific error message. The >> test should fail if there is *any* error; it should not succeed if an >> expected error is absent. > > This is unfortunately not possible. The notification processing needs to > be robust enough to prevent exiting JVM in cases like this. Therefore it > only reports the problem, dumps the notification and carries on. The > only place one can find something went wrong is the err stream. > >> >> You might consider having jtreg build the client and server classes. >> This might simplify some of the setup. Also, jtreg is meticulous about >> aborting the test if any compilations fail, so it takes care of that for >> you. > > I need same name classes with incompatible code compiled to two > different locations - client and server. I was not able to figure out > how to use jtreg to accomplish that. > > -JB- > >> >> It would be nice if there were a better way to have the client >> rendezvous with the server. I hate to suggest it, but sleeping >> unconditionally after starting the server is probably necessary. >> Anything more robust probably requires rearchitecting the test, though. >> >> Sorry to dump all this on you. But one of the shell-based RMI tests >> suffers from *exactly* the same pathologies. (I have yet to fix it.) 
>> Unfortunately, I believe that there are a lot of other shell-based tests >> in the test suite that have similar problems. The lesson here is that >> writing reliable shell tests is a lot harder than it seems. >> >> Thanks, >> >> s'marks > From Alan.Bateman at oracle.com Tue Jan 15 06:34:30 2013 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Tue, 15 Jan 2013 14:34:30 +0000 Subject: jmx-dev Update MXBeans to allow for the possibility that ConstructorProperties is ignored? Message-ID: <50F568F6.8020708@oracle.com> With the Compact Profiles proposal [1], there will be a subset Profile of Java SE that has JMX but not java.beans. This creates a challenge for the MXBean spec where a constructor to reconstitute a type may be used if it has the java.beans.ConstructorProperties annotation. For code that is compiled against a compact profile ("javac -profile compact3" for example) it's not an issue because code using this annotation will not compile. However if there is code using this annotation that is compiled against the full platform but run on a runtime that implements compact3 then the annotation will be ignored. I'm wondering whether to add a clarification to the MXBean spec on this. It would essentially amount to updating the rules under "Reconstructing an instance of Java type J from a CompositeData" so that it's clear that rule 2 does not apply when running on a subset Profile of Java SE. I'm looking for opinions on whether this is necessary or not. -Alan [1] http://openjdk.java.net/jeps/161 From Alan.Bateman at oracle.com Wed Jan 16 08:25:11 2013 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 16 Jan 2013 16:25:11 +0000 Subject: jmx-dev Update MXBeans to allow for the possibility that ConstructorProperties is ignored?
In-Reply-To: <50F568F6.8020708@oracle.com> References: <50F568F6.8020708@oracle.com> Message-ID: <50F6D467.3080308@oracle.com> On 15/01/2013 14:34, Alan Bateman wrote: > : > > > I'm wondering whether to add a clarification to the MXBean spec on this. It > would essentially amount to updating the rules under "Reconstructing an > instance of Java type J from a CompositeData" so that it's clear that > rule 2 does not apply when running on a subset Profile of Java SE. Thinking more about it, I think it would be safer and clearer to add a clarification. Here is what I propose: "Rule 2 is not applicable to subset Profiles of Java SE that do not include the {@code java.beans} package. In that case it may not be possible to reconstruct an instance of J, or it may be reconstructed by the means defined by subsequent rules." Does that seem reasonable? -Alan. From Alan.Bateman at oracle.com Mon Jan 21 08:08:03 2013 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 21 Jan 2013 16:08:03 +0000 Subject: jmx-dev Update MXBeans to allow for the possibility that ConstructorProperties is ignored? In-Reply-To: <50F568F6.8020708@oracle.com> References: <50F568F6.8020708@oracle.com> Message-ID: <50FD67E3.8040609@oracle.com> I've put a webrev with the proposed changes here: http://cr.openjdk.java.net/~alanb/8006524/webrev/ In summary, it makes it clear that @ConstructorProperties is not applicable when the runtime does not have this annotation. In the future it may be desirable to consider adding javax.management.ConstructorProperties and supporting both annotations. I don't propose to do this now because it would need further consideration, including perhaps supporting both annotations in the java.beans persistence support. Thanks, Alan.
In-Reply-To: <50FD67E3.8040609@oracle.com> References: <50F568F6.8020708@oracle.com> <50FD67E3.8040609@oracle.com> Message-ID: <5100DEC7.9050704@oracle.com> On 1/21/2013 8:08 AM, Alan Bateman wrote: > > I've put a webrev with the proposed changes here: > > http://cr.openjdk.java.net/~alanb/8006524/webrev/ > This looks reasonable to me. > In summary, it makes it clear that @ConstructorProperties is not > applicable when the runtime does not have this annotation. In the > future it may be desirable to consider adding > javax.management.ConstructorProperties and supporting both > annotations. I don't propose to do this now because it would need further > consideration, including perhaps supporting both annotations in the > java.beans persistence support. > I'm fine with the proposed spec change and with looking into the addition of javax.management.ConstructorProperties later. For now, registering such an MXBean on a compact3 runtime (without java.beans) will fail with NotCompliantMBeanException, which helps diagnose the problem (unless the type can be reconstructed via other rules). Mandy From Alan.Bateman at oracle.com Thu Jan 24 04:25:19 2013 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Thu, 24 Jan 2013 12:25:19 +0000 Subject: jmx-dev Update MXBeans to allow for the possibility that ConstructorProperties is ignored? In-Reply-To: <5100DEC7.9050704@oracle.com> References: <50F568F6.8020708@oracle.com> <50FD67E3.8040609@oracle.com> <5100DEC7.9050704@oracle.com> Message-ID: <5101282F.6050201@oracle.com> On 24/01/2013 07:12, Mandy Chung wrote: > > I'm fine with the proposed spec change and with looking into the addition of > javax.management.ConstructorProperties later. For now, registering > such an MXBean on a compact3 runtime (without java.beans) will fail > with NotCompliantMBeanException, which helps diagnose the > problem (unless the type can be reconstructed via other rules). Thanks for the review.
This is really just a mismatch between the compile-time and runtime environments; it would be caught at compile time if compiled with "javac -profile compact3". I guess the only genuine scenario where it might be an issue is where someone runs a static analyzer over some code and it doesn't see the dependency because it's an annotation. In that case, it would fail when attempting to register the object and I hope it wouldn't be too difficult to diagnose (way back, in preparation for this, I tweaked the "applicable" method so that the exception is clearer when the annotation is not available). I've pushed this change to get it out of the way. In the future we do need to explore the implications of adding javax.management.ConstructorProperties. I think we would have an inconsistency if this were added without corresponding support in JavaBeans persistence. -Alan
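To make the reconstruction rules discussed in this thread concrete, here is a minimal sketch of a type that carries the @ConstructorProperties constructor (rule 2) and also a public static from(CompositeData) method, which the MXBean specification lists as the first way to reconstruct an instance. The class PointSketch, its fields and the sample() helper are hypothetical names used only for illustration. The assumption is that, on a runtime without java.beans, the annotation is simply not visible while the from method remains usable.

```java
import java.beans.ConstructorProperties;
import javax.management.openmbean.CompositeData;
import javax.management.openmbean.CompositeDataSupport;
import javax.management.openmbean.CompositeType;
import javax.management.openmbean.OpenDataException;
import javax.management.openmbean.OpenType;
import javax.management.openmbean.SimpleType;

// Hypothetical value type exposed through an MXBean; illustrative only.
public class PointSketch {
    private final int x;
    private final int y;

    // Rule 2: only usable when java.beans.ConstructorProperties is
    // available at run time. If the annotation class is missing, the
    // annotation is ignored and this constructor is not considered.
    @ConstructorProperties({"x", "y"})
    public PointSketch(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }

    public int getY() { return y; }

    // Rule 1: a public static from(CompositeData) method. It does not
    // depend on java.beans, so it still works where rule 2 does not.
    public static PointSketch from(CompositeData cd) {
        return new PointSketch((Integer) cd.get("x"), (Integer) cd.get("y"));
    }

    // Helper for the sketch: builds the CompositeData form by hand.
    public static CompositeData sample(int x, int y) throws OpenDataException {
        String[] names = {"x", "y"};
        CompositeType type = new CompositeType(
                "PointSketch", "illustrative point", names,
                new String[] {"x coordinate", "y coordinate"},
                new OpenType<?>[] {SimpleType.INTEGER, SimpleType.INTEGER});
        return new CompositeDataSupport(type, names, new Object[] {x, y});
    }
}
```

A caller could round-trip a value with PointSketch.from(PointSketch.sample(3, 4)). Whether adding a from method is preferable to relying on the annotation is a design choice for each type; the sketch only shows that the two mechanisms can coexist.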