jmx-dev Review request: 8049303: Transient network problems cause JMX thread to fail silenty

Daniel Fuchs daniel.fuchs at oracle.com
Fri Aug 29 09:25:51 UTC 2014


Hi Jaroslav,

I am not sure I understand how this solves the problem.
The old code first checked the connection, and if that failed,
sent the FAILED notification, closed the connector, and rethrew
the exception.

The new code directly throws the exception without
checking the connection, and therefore without closing
the connection and sending the FAILED notification.

So is the fix a change of behavior by which the RMIConnector
will - in some cases - not try to autoclose the connection but
instead simply wait for the caller to explicitly call close()?
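
For context, the FAILED notification matters because it is how a client
learns that it should close and re-create its connector. A minimal sketch
of a listener that watches for it is below. The class names and the
synthetic notification are hypothetical, made up for illustration only;
with a real connector the listener would be registered through
JMXConnector.addConnectionNotificationListener.

```java
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.remote.JMXConnectionNotification;
import java.util.concurrent.atomic.AtomicBoolean;

public class FailedNotificationSketch {

    /** Hypothetical listener that records whether a FAILED notification arrived. */
    static class FailureListener implements NotificationListener {
        final AtomicBoolean failed = new AtomicBoolean(false);

        @Override
        public void handleNotification(Notification n, Object handback) {
            // Only the FAILED type is treated as a reason to recover.
            if (JMXConnectionNotification.FAILED.equals(n.getType())) {
                failed.set(true);
            }
        }
    }

    public static void main(String[] args) {
        FailureListener listener = new FailureListener();

        // With a real connector one would register the listener instead:
        //   connector.addConnectionNotificationListener(listener, null, null);
        // Here a synthetic notification is delivered by hand (assumption:
        // the source and connection id strings are placeholders).
        Notification synthetic = new JMXConnectionNotification(
                JMXConnectionNotification.FAILED,
                "sketch-source",
                "sketch-connection-id",
                1L,
                "simulated failure",
                null);
        listener.handleNotification(synthetic, null);

        System.out.println("failure observed: " + listener.failed.get());
    }
}
```

If the connector never emits FAILED, a listener like this one is never
triggered, and the application has no signal telling it to call close().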

I'd be interested to hear what Shanliang has to say...

best regards,

-- daniel


On 8/28/14 5:57 PM, Jaroslav Bachorik wrote:
> I have taken over this issue from Poonam since she will be unavailable
> for the next month or so.
>
> Could I have reviews for this change:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8049303
> Webrev: http://cr.openjdk.java.net/~jbachorik/8049303/webrev.00
>
> Problem and fix:
> By default the JMX client side notification fetch timeout
> (jmx.remote.x.notification.fetch.timeout) is 1 minute and the default
> server connection timeout (jmx.remote.x.server.connection.timeout) is 2
> minutes.
>
> If the client side connector thread makes a notification fetch request
> to the server, but a transient network problem prevents the server
> response from reaching the client, the client side connector will wait
> for a response until the timeout period (1 minute) has expired before
> throwing an IOException.
>
> The client side RMIConnector implementation handles the IOException by
> re-checking the connection status to understand whether or not it is
> broken. If the connection is not available at that moment, the connector
> fails by re-throwing the initial IOException. The problem is that this
> re-check of the connection passes because the server side of the
> connection doesn't time out until 2 minutes have passed (by default), so
> the NotifFetcher thread dies without posting a failed notification, and
> the client application does not get a chance to recover.
>
> The fix is to forward the non connection-related exceptions on the JMX
> client side instead of checking the connection status. The
> connection-related exceptions will cause the session to be closed, as an
> unsuccessful connection check would have done.
>
> Testing:
> All the jdk_jmx and jdk_management regression tests passed.
> All the related JCK tests passed.
>
> The fix applies cleanly to 8u and 7u repos.
>
>
> Thanks,
> -JB-
>
>
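
The two timeouts named in the quoted description can be supplied through
the environment maps given to the connector and the connector server. The
sketch below only builds those maps. It assumes the values are expressed
in milliseconds, and the class and method names are hypothetical; the
property names are the ones quoted above.

```java
import java.util.HashMap;
import java.util.Map;

public class JmxTimeoutEnv {
    // Property names as quoted in the review request.
    static final String FETCH_TIMEOUT = "jmx.remote.x.notification.fetch.timeout";
    static final String SERVER_TIMEOUT = "jmx.remote.x.server.connection.timeout";

    /** Client-side environment (assumption: the value is in milliseconds). */
    static Map<String, Object> clientEnv(long fetchTimeoutMillis) {
        Map<String, Object> env = new HashMap<>();
        env.put(FETCH_TIMEOUT, fetchTimeoutMillis);
        return env;
    }

    /** Server-side environment (assumption: the value is in milliseconds). */
    static Map<String, Object> serverEnv(long connectionTimeoutMillis) {
        Map<String, Object> env = new HashMap<>();
        env.put(SERVER_TIMEOUT, connectionTimeoutMillis);
        return env;
    }

    public static void main(String[] args) {
        // The defaults described above: 1 minute on the client, 2 on the server.
        Map<String, Object> client = clientEnv(60_000L);
        Map<String, Object> server = serverEnv(120_000L);

        // These maps would then be passed to
        //   JMXConnectorFactory.connect(url, client)
        //   JMXConnectorServerFactory.newJMXConnectorServer(url, server, mbeanServer)
        System.out.println(client);
        System.out.println(server);
    }
}
```

With these defaults the client gives up on a fetch a full minute before
the server would consider the connection idle, which is the window in
which the connection re-check described above still succeeds.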


