Adding SocketChannel toString to connection exception messages

Tue Jan 2 21:25:28 UTC 2018

On Dec 31, 2017, at 7:24 AM, Peter Levart <peter.levart at gmail.com> wrote:
> 
> Hi,
>> 
>> I believe there are concerns with too much information that can be considered "sensitive" (like host names and IP addresses) appearing in error messages due to them ending up in log files and bug reports.
>> 
>> David
> 
> For debugging purposes it might sometimes be enough to get just a hint about the actual address / port but not reveal it entirely. The person doing debugging probably knows more about the environment than an average person so the hint might give him enough information to discern the actual address / port. Exposing just the last octet of an IP address and the last digit of the port might do. For example:
> 
> java.net.ConnectException: Connection to X.X.X.205:XXX8 refused.
> 
> So Steven, I'm curious whether such a hint would help in your case?

This would definitely be better than nothing!  But it's still difficult.  For example, a common allocation pattern for us is to assign one network per availability zone:

10.0.1.0/24
10.0.2.0/24
10.0.3.0/24

then if you pick the same last number for a well known service, you might end up with instances at 10.0.1.2, 10.0.2.2, 10.0.3.2 -- so depending on which octets are obscured you may end up with no useful information.
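
To make the collision concrete, here is a minimal Java sketch of the masking Peter describes (last octet of an IPv4 address, last digit of the port). The class and method names are hypothetical and this is not proposed JDK code; it assumes IPv4 only and just shows what happens to the three example addresses.

```java
import java.net.InetSocketAddress;

class MaskedAddressHint {
    // Hypothetical helper: reveal only the last IPv4 octet and the last port digit.
    // Assumes a resolved IPv4 address; IPv6 is out of scope for this sketch.
    static String mask(InetSocketAddress addr) {
        byte[] raw = addr.getAddress().getAddress();
        int lastOctet = raw[raw.length - 1] & 0xFF;
        String port = Integer.toString(addr.getPort());
        String maskedPort = "X".repeat(port.length() - 1) + port.charAt(port.length() - 1);
        return "X.X.X." + lastOctet + ":" + maskedPort;
    }

    public static void main(String[] args) {
        // One well-known service at the same host number in each availability zone.
        for (String ip : new String[] {"10.0.1.2", "10.0.2.2", "10.0.3.2"}) {
            // IP literals are parsed directly, so no DNS lookup happens here.
            InetSocketAddress addr = new InetSocketAddress(ip, 8080);
            System.out.println(ip + " -> " + mask(addr));
        }
    }
}
```

All three addresses come out as X.X.X.2:XXX0, so the hint cannot tell the availability zones apart.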

The triggering incident for us was that one of our Amazon ELBs started responding incorrectly (blackholing data) -- so when you resolve "my-elb-1.amazonaws.amazon.com" you get three different IP addresses, and if the wrong one is picked for the connect operation, all of your data is blackholed.

The author of the code in question obviously had not considered a failure mode where one of the "It Is In The Cloud So It Always Works" load balancer instances did not behave like the others -- so it did not emit any diagnostic information other than the hostname, and until we got a network operator in with Wireshark it was quite difficult to determine that it was in fact exactly one instance of three failing.  The pain of using Wireshark to diagnose what should have been a quick scan through the logs, coupled with the sheer impossibility of annotating every network call in every application across our organization, pushed me to open this discussion to do it in one central place :)

> An attacker that knows something about the environment could find out the missing pieces without such hints anyway (simply by scanning IPs / ports), so such partial information is not that sensitive nowadays.

I've never really understood the justification behind this rule -- we have a similar policy in our organization, all IP addresses must be stripped.  But knowing the layout of the network, I can tell you that if you are on the outside and have one of our 10.x.x.x IP addresses it gives you almost nothing (good luck routing to it) -- and if you are on the inside, it's not exactly difficult to enumerate these.  DNS contains every name / IP you might find and it'll happily divulge all that information to you, and failing that, nmap can map out the majority of the network fairly reliably in seconds...

I respect that the JVM has to be conservative, but there is a real usability problem here, and the security gain is a little questionable to me!

> Another idea: define a one way function that maps the IP:port pair into a value which is displayed in the exception message. For debugging purposes this might be enough since the one doing debugging might know the set of possible IP:port pairs in advance. He could then apply the function to each of them in turn and find out the matching pair.

This is interesting, but only if you can enumerate the IP:port pairs that might match.  We have an Apache Mesos cluster with O(100) nodes, each of which is assigned a pool of 1000 ports to assign to applications.  Needing to enumerate on the order of 100,000 possible hash matches could be a royal pain, and my understanding is that 100 nodes is not even near the size of the largest production clusters out there.  It's not quite to the level of a non-starter, but as those numbers scale it could easily become useless.
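
For illustration, the lookup the person debugging would have to do could be sketched like this in Java. Everything here is an assumption made for the example: SHA-256 truncated to 8 bytes as the one-way function, the "host:port" string encoding, the 10.0.0.x host list, and the 31000-31999 port pool are all made up, and none of it is an existing JDK facility.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class EndpointDigest {
    // Hypothetical one-way function: first 8 bytes of SHA-256("host:port"), as hex.
    static String digest(String host, int port) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest((host + ":" + port).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", hash[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Made-up node list 10.0.0.1 .. 10.0.0.n standing in for the cluster inventory.
    static List<String> hypotheticalHosts(int n) {
        List<String> hosts = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            hosts.add("10.0.0." + i);
        }
        return hosts;
    }

    // Brute-force search: hash every candidate pair until one matches the token.
    // Returns "host:port", or null if the endpoint is not in the candidate set.
    static String recover(String token, List<String> hosts, int firstPort, int portCount) {
        for (String host : hosts) {
            for (int port = firstPort; port < firstPort + portCount; port++) {
                if (digest(host, port).equals(token)) {
                    return host + ":" + port;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String token = digest("10.0.0.42", 31500); // what the exception message would carry
        // 100 nodes x 1000 ports = 100,000 candidates to hash.
        System.out.println(recover(token, hypotheticalHosts(100), 31000, 1000));
    }
}
```

The search only succeeds when the real endpoint is in the candidate list; anything outside it (an address from a pool you don't control, say) simply comes back as null.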

Some of the exceptions may not have a specific port (hostname resolution), or may have port 0 or some other dynamically assigned port, so you'd have to guess these special cases too.

Additionally, with cloud providers, you may not understand the space of IPs that an Abstract Cloud Device like ELB might come from -- it's coming from an AWS pool that you have no visibility into.

> On Jan 2, 2018, at 8:35 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
> 
> On 29/12/2017 00:33, Steven Schlansker wrote:
>> Thanks for the discussion!
>> 
>> So, it sounds like amending the message by default is going to be a non-starter -- but at least adding the information otherwise might be possible.
>> 
> There are examples in other areas where exceptions include detail information (for diagnostic purposes) when not running with a security manager. This may be something to look at here (and easy to try out too as it wouldn't require touching any native code either).

I like that!

For now, I will proceed with improving my prototype to follow this suggestion -- unless there is a security manager, the exceptions are annotated.
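
As a rough Java-level illustration of that rule (not the prototype itself, which lives in native code), the gating could look like the sketch below. The class and method names are hypothetical, and the security manager check is passed in as a boolean so the sketch stays self-contained; in the JDK of this thread it would be `System.getSecurityManager() != null`, a call that later JDKs deprecate.

```java
import java.net.ConnectException;
import java.net.InetSocketAddress;

class AnnotatedConnectFailure {
    // Hypothetical helper: add the remote endpoint to the message only when
    // no security manager is installed; otherwise return the exception unchanged.
    static ConnectException annotate(ConnectException original,
                                     InetSocketAddress remote,
                                     boolean securityManagerPresent) {
        if (securityManagerPresent) {
            return original; // conservative default: reveal nothing extra
        }
        // getHostString() avoids a reverse DNS lookup for IP literals.
        String endpoint = remote.getHostString() + ":" + remote.getPort();
        ConnectException detailed = new ConnectException(original.getMessage() + ": " + endpoint);
        detailed.initCause(original);
        return detailed;
    }

    public static void main(String[] args) {
        ConnectException plain = new ConnectException("Connection refused");
        InetSocketAddress remote = new InetSocketAddress("10.0.2.2", 8080);
        System.out.println(annotate(plain, remote, false).getMessage());
        System.out.println(annotate(plain, remote, true).getMessage());
    }
}
```

Without a security manager the message becomes "Connection refused: 10.0.2.2:8080"; with one, it stays "Connection refused".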

An alternative way to activate it, by configuring the platform logger (e.g. "java.net.SocketException=TRACE"), might be useful if we want to let end users enable the feature explicitly, independent of the security manager.
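
A minimal sketch of what such a logger-based switch might look like, using the `System.Logger` API that exists since Java 9. The logger name is taken from the example above; the class, the method names, and the idea of checking the level at message-construction time are assumptions for illustration, not a description of how the JDK does or would do it.

```java
class SocketDetailSwitch {
    // Logger name borrowed from the example in the text; purely illustrative.
    static final String LOGGER_NAME = "java.net.SocketException";

    // Detail is considered enabled when TRACE is loggable for that logger.
    static boolean detailEnabled() {
        return System.getLogger(LOGGER_NAME).isLoggable(System.Logger.Level.TRACE);
    }

    // Hypothetical message builder: append the endpoint only when enabled.
    static String message(String base, String host, int port) {
        return detailEnabled() ? base + ": " + host + ":" + port : base;
    }

    public static void main(String[] args) {
        // What this prints depends on how logging is configured for the JVM.
        System.out.println(message("Connection refused", "10.0.2.2", 8080));
    }
}
```

With the default java.util.logging backend, TRACE corresponds to the FINER level, so a line such as `java.net.SocketException.level=FINER` in the logging properties file would be one way to turn the switch on.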

Unfortunately I don't see how I can avoid changing the native code -- the exception message is initialized in native code, and by the time we call back to Java code, the necessary information is not passed in to the SocketException subclass constructor.  So I may be misinterpreting your guidance here, but I'm not seeing it.

I think I will hold off on adding Java level fields to the exception types.  I expect if I do that, I will then have to get a spec change to add appropriate public API to make the data actually usable.  Since the goal here is log diagnostics as opposed to more structured data at the Java level, I'll avoid it.  It also avoids any complications with regard to changing the serial format for such a common type.  But I do think it means the work has to be done in the native code.