Gluing together URL.equals
Peter Levart
peter.levart at gmail.com
Thu Jul 3 16:01:42 UTC 2014
Hi,
We know that URL.equals and hashCode are fundamentally broken. But
URL.equals is even more broken than hashCode. Nevertheless, URL.equals
is used explicitly in the following places in the JDK:
java.security.CodeSource.matchLocation
java.security.CodeSource.equals
java.util.jar.JarVerifier.VerifierCodeSource.equals
javax.sql.rowset.serial.SerialDatalink.equals
java.lang.Package.isSealed
javax.swing.JEditorPane.setPage
javax.swing.text.html.FrameView.changedUpdate
sun.applet.AppletViewer.getApplet
sun.applet.AppletViewer.getApplets
And that is not counting places where it might be used implicitly because
URLs are Objects (as keys in HashMaps, etc.).
I'd like to discuss one of the URL.equals pitfalls that could be fixed,
and whether fixing it is desirable.
The Object.equals javadoc says: "The equals method implements an equivalence relation on
non-null object references:
...
It is consistent: for any non-null reference values x and y, multiple
invocations of x.equals(y) consistently return true or consistently
return false, provided no information used in equals comparisons on the
objects is modified."
    URL url1 = new URL("http://alias1/");
    URL url2 = new URL("http://alias2/");
    boolean answer1 = url1.equals(url2);
    ...
    boolean answer2 = url1.equals(url2);
Can it happen that answer1 != answer2?
Yes! Suppose that alias1 and alias2 are host names that resolve to the
same IP address. Normally, answer1 and answer2 would both be "true", but
only if the name service that resolves the host names is up and running.
If it's not, the answer is "false". Suppose that the DNS was restarting
while answer1 was obtained and was up and running while answer2 was
obtained. Then answer1 would be "false" while answer2 would be "true".
The following URLStreamHandler method, which the equals method calls for
both URLs, is responsible for this unstable behaviour:
    protected synchronized InetAddress getHostAddress(URL u) {
        if (u.hostAddress != null)
            return u.hostAddress;
        String host = u.getHost();
        if (host == null || host.equals("")) {
            return null;
        } else {
            try {
                u.hostAddress = InetAddress.getByName(host);
            } catch (UnknownHostException ex) {
                return null;
            } catch (SecurityException se) {
                return null;
            }
        }
        return u.hostAddress;
    }
As can be seen, the hostAddress is obtained by InetAddress.getByName()
and then cached in the URL.hostAddress field. Leaving aside the fact
that, although this method is synchronized, caching of hostAddress is not
synchronized properly (more on that later), the problem is that a negative
answer (UnknownHostException or SecurityException) is not cached on the
URL. By default, InetAddress.getByName() caches an UnknownHostException
only briefly, if at all, and a SecurityException depends on the caller's
security context. A simple fix for this issue would be to cache the
negative answer in the URL field too. This would make URL.equals
"consistent".
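To make the idea of caching the negative answer concrete, here is a minimal sketch. It is not the JDK code: HostAddressCache and its Resolver interface are hypothetical names, introduced only so that the lookup can be replaced by a fake in a test. The point is just that a "resolved" flag, separate from the address itself, lets a failed lookup be remembered.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical stand-in for the per-URL hostAddress cache; not JDK code. */
final class HostAddressCache {

    /** Hypothetical lookup abstraction so the name service can be faked. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final String host;
    private final Resolver resolver;
    private boolean resolved;      // true once a lookup was attempted
    private InetAddress address;   // null also represents a cached failure

    HostAddressCache(String host, Resolver resolver) {
        this.host = host;
        this.resolver = resolver;
    }

    /** Returns the address, or null; the first outcome is kept forever. */
    synchronized InetAddress get() {
        if (!resolved) {
            if (host != null && !host.isEmpty()) {
                try {
                    address = resolver.resolve(host);
                } catch (UnknownHostException | SecurityException e) {
                    address = null; // remember the negative answer as well
                }
            }
            resolved = true;
        }
        return address;
    }
}
```

With this shape, a lookup that fails while the name service is down keeps returning null afterwards, so repeated comparisons based on it give the same result. The price is that a URL created during an outage never picks up its address later.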
So what's wrong with the synchronization, besides being a bottleneck? The
problem is that the getHostAddress() method uses the URLStreamHandler
instance as a lock. The two URLs compared in the URL.equals method
are passed to the URLStreamHandler.equals(URL u1, URL u2) method of the
1st URL's handler. This handler instance need not be the same as the 2nd
URL's handler, even though both URLs have the same protocol. For example:
    URL url1 = new URL("http://alias1/");
    URL.setURLStreamHandlerFactory(...a custom factory...);
    URL url2 = new URL("http://alias2/");
The "handler" instances of the above two URLs are different, since the
handler of the 1st URL was created with the default URLStreamHandlerFactory
and the handler of the 2nd URL was created with a custom
URLStreamHandlerFactory. Now suppose one thread does:
    url1.equals(url2);
and some other thread does:
    url2.equals(url1);
This translates to, among other things, calling the following
URLStreamHandler instance method:
    protected boolean hostsEqual(URL u1, URL u2) {
        InetAddress a1 = getHostAddress(u1);
        InetAddress a2 = getHostAddress(u2);
        // if we have internet address for both, compare them
        if (a1 != null && a2 != null) {
            return a1.equals(a2);
        // else, if both have host names, compare them
        } else if (u1.getHost() != null && u2.getHost() != null)
            return u1.getHost().equalsIgnoreCase(u2.getHost());
        else
            return u1.getHost() == null && u2.getHost() == null;
    }
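Since getHostAddress() is synchronized on the handler, calls that go through two different handler instances do not exclude each other. A small self-contained sketch of that effect follows. SeparateLocksDemo and its Handler class are hypothetical stand-ins, not JDK classes; Handler only mimics "a synchronized instance method".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class SeparateLocksDemo {

    /** Hypothetical stand-in for a handler with a synchronized method. */
    static final class Handler {
        synchronized void whileLocked(Runnable action) {
            action.run();
        }
    }

    /**
     * Holds the monitor of 'first' and checks whether another thread can
     * enter the monitor of 'second' within 500 ms in the meantime.
     */
    static boolean bothInsideAtOnce(Handler first, Handler second) {
        CountDownLatch secondEntered = new CountDownLatch(1);
        boolean[] overlapped = {false};
        first.whileLocked(() -> {
            Thread other = new Thread(
                    () -> second.whileLocked(secondEntered::countDown));
            other.start();
            try {
                overlapped[0] = secondEntered.await(500, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return overlapped[0];
    }
}
```

Called with two distinct Handler instances, the second thread gets in while the first monitor is still held. Called with the same instance twice, it has to wait. The first case is the situation of url1.equals(url2) and url2.equals(url1) running concurrently with different handlers.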
So the two threads are reading and modifying the URL.hostAddress field of
both URLs, but each of them is holding a separate lock. You may say that
creating URL instances, then changing the URLStreamHandlerFactory,
creating some more URL instances and then comparing them with each
other does not happen a lot, but this could be fixed. Why not
use the URL instance as a lock when reading/writing its field? Would
this be desirable? It would mean a lot less contention (and even less if
caching of URL.hostAddress was implemented in a lock-free way).
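As a rough illustration of what "lock-free" caching of a per-object field could look like, here is a sketch that uses an AtomicReference and compareAndSet. LockFreeCache is a hypothetical class, not a proposed patch, and in this simple form it does not cache failures (a null result is just returned).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Hypothetical holder for a lazily resolved value such as a host address. */
final class LockFreeCache<K, V> {
    private final K key;
    private final AtomicReference<V> value = new AtomicReference<>();

    LockFreeCache(K key) {
        this.key = key;
    }

    /**
     * Returns the cached value, resolving it if necessary. Racing threads
     * may each call the resolver, but only the first published value is
     * kept and returned to everyone.
     */
    V get(Function<K, V> resolver) {
        V current = value.get();
        if (current != null) {
            return current;
        }
        V resolved = resolver.apply(key);
        if (resolved == null) {
            return null; // failures are not cached in this sketch
        }
        // Publish only if nobody beat us to it; otherwise adopt the winner.
        return value.compareAndSet(null, resolved) ? resolved : value.get();
    }
}
```

The state lives in the object that owns the field, so it does not matter which handler instance performs the lookup. The trade-off is that the resolver may run more than once under a race.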
Because I know that URL.equals compatibility is important, I'm asking
here whether a fix for this issue is desirable at all. What about a
synchronization-only fix (keeping the "unstable" equals() behaviour)?
Regards, Peter