Review request: 8049303: Transient network problems cause JMX thread to fail silently

Poonam Bajaj poonam.bajaj at oracle.com
Tue Aug 26 10:27:13 UTC 2014


Sending the review request to serviceability-dev list as well...
-----------

Could I have reviews for this change:

Bug: https://bugs.openjdk.java.net/browse/JDK-8049303
Webrev: http://cr.openjdk.java.net/~poonam/8049303/webrev.00/

Problem and fix:
By default the JMX client side notification fetch timeout 
(jmx.remote.x.notification.fetch.timeout) is 1 minute and the default 
server connection timeout (jmx.remote.x.server.connection.timeout) is 2 
minutes.
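
For reference, a minimal sketch of how these two properties could be set
explicitly through connector environment maps. The class and method names
(JmxTimeoutEnv, clientEnv, serverEnv) are made up for illustration; only the
two property keys come from the description above. The values are, to my
understanding, interpreted as milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the JDK or of this change.
class JmxTimeoutEnv {
    static final String FETCH_TIMEOUT_KEY =
            "jmx.remote.x.notification.fetch.timeout";
    static final String SERVER_TIMEOUT_KEY =
            "jmx.remote.x.server.connection.timeout";

    // Environment for the client side connector.
    static Map<String, Object> clientEnv(long fetchTimeoutMillis) {
        Map<String, Object> env = new HashMap<>();
        env.put(FETCH_TIMEOUT_KEY, fetchTimeoutMillis);
        return env;
    }

    // Environment for the connector server.
    static Map<String, Object> serverEnv(long connectionTimeoutMillis) {
        Map<String, Object> env = new HashMap<>();
        env.put(SERVER_TIMEOUT_KEY, connectionTimeoutMillis);
        return env;
    }

    public static void main(String[] args) {
        // The defaults described above: 1 minute and 2 minutes.
        System.out.println(clientEnv(60_000L));
        System.out.println(serverEnv(120_000L));
    }
}
```

The client map would typically be passed to JMXConnectorFactory.connect(url, env)
and the server map to JMXConnectorServerFactory.newJMXConnectorServer(url, env, mbs).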

If the client side connector thread makes a notification fetch request 
to the server, but a transient network problem prevents the server 
response from reaching the client, the client side connector will wait 
for a response until the timeout period (1 minute) has expired before 
throwing an IOException.

The client side RMIConnector implementation handles the IOException by
re-checking the connection status to determine whether the connection is
broken. If the connection is not available at that moment, the connector
fails by re-throwing the initial IOException. The problem is that this
re-check passes, because by default the server side of the connection
does not time out until 2 minutes have passed. As a result, the
NotifFetcher thread dies without posting a failed notification, and the
client application does not get a chance to recover.
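
To illustrate what the client application is missing in this scenario, here
is a minimal sketch of a listener that reacts to a failed connection
notification. The class name ConnectionFailureListener and the hasFailed()
accessor are made up for illustration; the recovery action itself is left as
a comment because it is application specific.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.remote.JMXConnectionNotification;

// Hypothetical client-side listener, not part of this change.
class ConnectionFailureListener implements NotificationListener {
    private final AtomicBoolean failed = new AtomicBoolean(false);

    @Override
    public void handleNotification(Notification n, Object handback) {
        // Only the FAILED notification type marks the connection as broken.
        if (JMXConnectionNotification.FAILED.equals(n.getType())) {
            failed.set(true);
            // Application-specific recovery (e.g. reconnect) would go here.
        }
    }

    boolean hasFailed() {
        return failed.get();
    }
}
```

A client would register it with
connector.addConnectionNotificationListener(new ConnectionFailureListener(), null, null).
If the failed notification is never posted, as described above, this
listener is never invoked.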

The fix is to forward the exception on the JMX client side before 
checking the connection status.
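
The change in ordering can be pictured with a toy model. This is not the
code in the webrev and does not use the real RMIConnector internals; the
names FetchFailureOrdering, ConnectionCheck, checkFirst and forwardFirst are
invented purely to contrast "check, then report" with "report, then check".

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model only; all names here are hypothetical.
class FetchFailureOrdering {
    interface ConnectionCheck {
        void check() throws IOException;
    }

    // Old ordering: the failure is reported only if the re-check fails.
    static List<String> checkFirst(IOException fetchError, ConnectionCheck check) {
        List<String> events = new ArrayList<>();
        try {
            check.check();
            // Re-check passed: nothing is reported, the failure stays silent.
        } catch (IOException e) {
            events.add("FAILED: " + fetchError.getMessage());
        }
        return events;
    }

    // New ordering: the failure is forwarded before the re-check runs.
    static List<String> forwardFirst(IOException fetchError, ConnectionCheck check) {
        List<String> events = new ArrayList<>();
        events.add("FAILED: " + fetchError.getMessage());
        try {
            check.check();
        } catch (IOException e) {
            // The failure was already forwarded; nothing more to report here.
        }
        return events;
    }
}
```

With a re-check that passes (the server has not timed out yet), checkFirst
reports nothing, while forwardFirst still reports the fetch failure.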

Testing:
All the jdk_jmx and jdk_management regression tests passed.

The fix applies cleanly to 8u and 7u repos.


Thanks,
Poonam

