jmx-dev Review request: 8049303: Transient network problems cause JMX thread to fail silently
Jaroslav Bachorik
jaroslav.bachorik at oracle.com
Thu Aug 28 15:57:43 UTC 2014
I have taken over this issue from Poonam since she will be unavailable
for the next month or so.
Could I have reviews for this change:
Bug: https://bugs.openjdk.java.net/browse/JDK-8049303
Webrev: http://cr.openjdk.java.net/~jbachorik/8049303/webrev.00
Problem and fix:
By default the JMX client side notification fetch timeout
(jmx.remote.x.notification.fetch.timeout) is 1 minute and the default
server connection timeout (jmx.remote.x.server.connection.timeout) is 2
minutes.
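
For reference, here is a minimal sketch of how these two properties could
be set explicitly. The class and method names are made up for
illustration, and the values are assumed to be in milliseconds. The
resulting maps would be passed as the environment argument to
JMXConnectorFactory.connect(url, env) on the client and to
JMXConnectorServerFactory.newJMXConnectorServer(url, env, mbs) on the
server.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative helper only; not part of the JDK or of the webrev.
final class JmxTimeoutEnv {
    // Property names as quoted in this mail.
    static final String FETCH_TIMEOUT =
            "jmx.remote.x.notification.fetch.timeout";
    static final String SERVER_TIMEOUT =
            "jmx.remote.x.server.connection.timeout";

    // Environment for the client-side connector (value assumed to be ms).
    static Map<String, Object> clientEnv(long fetchTimeoutMillis) {
        Map<String, Object> env = new HashMap<>();
        env.put(FETCH_TIMEOUT, fetchTimeoutMillis);
        return env;
    }

    // Environment for the connector server (value assumed to be ms).
    static Map<String, Object> serverEnv(long connectionTimeoutMillis) {
        Map<String, Object> env = new HashMap<>();
        env.put(SERVER_TIMEOUT, connectionTimeoutMillis);
        return env;
    }

    public static void main(String[] args) {
        // The defaults described above: 1 minute and 2 minutes.
        System.out.println(clientEnv(60_000L));
        System.out.println(serverEnv(120_000L));
    }
}
```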
If the client side connector thread makes a notification fetch request
to the server, but a transient network problem prevents the server
response from reaching the client, the client side connector will wait
for a response until the timeout period (1 minute) has expired before
throwing an IOException.
The client-side RMIConnector implementation handles the IOException by
re-checking the connection status to determine whether the connection is
broken. If the connection is not available at that moment, the connector
fails by re-throwing the initial IOException. The problem is that this
re-check passes, because the server side of the connection does not time
out until 2 minutes have passed (by default). As a result, the
NotifFetcher thread dies without posting a failed notification, and the
client application does not get a chance to recover.
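
To illustrate what "a chance to recover" means for the application, here
is a minimal sketch of a client-side listener that watches for the failed
notification. The class name ConnectionFailureWatcher is hypothetical; the
notification type and the listener interface are the standard
javax.management ones. It would be registered with
connector.addConnectionNotificationListener(watcher, null, null).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.remote.JMXConnectionNotification;

// Hypothetical application-side listener; not part of the webrev.
final class ConnectionFailureWatcher implements NotificationListener {
    private final AtomicBoolean failed = new AtomicBoolean(false);

    @Override
    public void handleNotification(Notification n, Object handback) {
        // Only the FAILED type signals that the connection is unusable.
        if (JMXConnectionNotification.FAILED.equals(n.getType())) {
            failed.set(true);
            // A real application would reconnect or alert here.
        }
    }

    // True once a failed notification has been delivered.
    boolean hasFailed() {
        return failed.get();
    }
}
```

If the NotifFetcher thread dies without posting this notification, a
listener of this kind is never called, which is the silent failure
described above.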
The fix is to forward non-connection-related exceptions on the JMX
client side instead of checking the connection status. Connection-related
exceptions will close the session, just as an unsuccessful connection
check would have done.
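
The decision described above could be sketched roughly as follows. This
is not the code from the webrev: the class FetchFailurePolicy, the Action
enum, and in particular the set of exception types treated as
connection-related are assumptions made purely for illustration.

```java
import java.io.IOException;

// Illustrative sketch only; names and classification are assumptions.
final class FetchFailurePolicy {
    enum Action { FORWARD_TO_CALLER, CLOSE_SESSION }

    // Assumed classification: walk the cause chain and look for
    // exception types that indicate the connection itself is gone.
    static boolean isConnectionRelated(IOException e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof java.rmi.ConnectException
                    || t instanceof java.rmi.ConnectIOException
                    || t instanceof java.rmi.NoSuchObjectException
                    || t instanceof java.net.ConnectException) {
                return true;
            }
        }
        return false;
    }

    // Connection-related failures close the session; everything else
    // is forwarded to the caller without re-checking the connection.
    static Action onFetchFailure(IOException e) {
        return isConnectionRelated(e)
                ? Action.CLOSE_SESSION
                : Action.FORWARD_TO_CALLER;
    }
}
```

Under this sketch, the IOException raised when the fetch times out is
forwarded rather than being hidden by a connection check that still
succeeds.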
Testing:
All the jdk_jmx and jdk_management regression tests passed.
All the related JCK tests passed.
The fix applies cleanly to 8u and 7u repos.
Thanks,
-JB-