Gluing together URL.equals

Thu Jul 3 16:01:42 UTC 2014

Hi,

We know that URL.equals and hashCode are fundamentally broken. But 
URL.equals is even more broken than hashCode. Nevertheless, URL.equals 
is used explicitly in the following places in the JDK:

java.security.CodeSource.matchLocation
java.security.CodeSource.equals
java.util.jar.JarVerifier.VerifierCodeSource.equals
javax.sql.rowset.serial.SerialDatalink.equals
java.lang.Package.isSealed
javax.swing.JEditorPane.setPage
javax.swing.text.html.FrameView.changedUpdate
sun.applet.AppletViewer.getApplet
sun.applet.AppletViewer.getApplets

And I'm not counting places where it might be used implicitly because
URLs are Objects (as keys in HashMaps, etc.).

I'd like to discuss one of the URL.equals pitfalls that could be fixed,
and whether fixing it is desirable.

javadoc: "The equals method implements an equivalence relation on 
non-null object references:
...
It is consistent: for any non-null reference values x and y, multiple 
invocations of x.equals(y) consistently return true or consistently 
return false, provided no information used in equals comparisons on the 
objects is modified."

URL url1 = new URL("http://alias1/");
URL url2 = new URL("http://alias2/");

boolean answer1 = url1.equals(url2);
...
boolean answer2 = url1.equals(url2);

Can it happen that answer1 != answer2?
Yes! Suppose that alias1 and alias2 are host names that resolve to the 
same IP address. Normally, answer1 and answer2 would be "true". But only 
if the name service that resolves the host names is up and running. If 
it's not, then the answer is "false". Suppose that while obtaining 
answer1 the DNS was restarting and while obtaining answer2 it was up and 
running... Then answer1 would be "false" while answer2 would be "true". 
The following URLStreamHandler method, which the equals method calls for
both URLs, is responsible for this unstable behaviour:

     protected synchronized InetAddress getHostAddress(URL u) {
         if (u.hostAddress != null)
             return u.hostAddress;

         String host = u.getHost();
         if (host == null || host.equals("")) {
             return null;
         } else {
             try {
                 u.hostAddress = InetAddress.getByName(host);
             } catch (UnknownHostException ex) {
                 return null;
             } catch (SecurityException se) {
                 return null;
             }
         }
         return u.hostAddress;
     }

As can be seen, the hostAddress is obtained by InetAddress.getByName()
and then cached in the URL.hostAddress field. Leaving aside the fact
that, although this method is synchronized, the caching of hostAddress
is not synchronized properly (more on that later), the problem is that a
negative answer (UnknownHostException or SecurityException) is not
cached. UnknownHostException is not cached by InetAddress.getByName() by
default, and SecurityException depends on the caller's security context.
A simple fix for this issue would be to cache the negative answer in the
URL field too. This would make URL.equals "consistent".
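
To make the idea concrete, here is a minimal sketch of negative caching.
It is not the JDK's code: HostAddressCache and its Resolver hook are
hypothetical names that stand in for the per-URL hostAddress field and
for InetAddress.getByName(), so that the sketch can run without a real
name service.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical stand-in for the hostAddress cache kept on each URL;
// this is an illustration, not the JDK's implementation.
final class HostAddressCache {

    /** Hypothetical lookup hook, replacing InetAddress.getByName(). */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final String host;
    private final Resolver resolver;
    private boolean resolved;     // true once a lookup has been attempted
    private InetAddress address;  // stays null if that lookup failed

    HostAddressCache(String host, Resolver resolver) {
        this.host = host;
        this.resolver = resolver;
    }

    synchronized InetAddress get() {
        if (!resolved) {
            if (host != null && !host.isEmpty()) {
                try {
                    address = resolver.resolve(host);
                } catch (UnknownHostException | SecurityException e) {
                    address = null; // remember the failure, do not retry
                }
            }
            resolved = true;
        }
        return address;
    }
}
```

With a resolver that fails on the first call and would succeed on the
second, get() returns null both times and the resolver is invoked only
once. The price is that a transient lookup failure sticks to that
instance for its whole lifetime.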

So what's wrong with synchronization besides being a bottleneck? The
problem is that the getHostAddress() method uses the URLStreamHandler
instance as a lock. Two URLs that are compared in the URL.equals method
are passed to the URLStreamHandler.equals(URL u1, URL u2) method of the
1st URL's handler. This handler instance need not be the same as the 2nd
URL's handler even though both URLs have the same protocol. For example:

URL url1 = new URL("http://alias1/");
URL.setURLStreamHandlerFactory(...a custom factory...);
URL url2 = new URL("http://alias2/");

The "handler" instances of the two URLs above are different, since the
handler of the 1st URL was created with the default
URLStreamHandlerFactory and the handler of the 2nd URL was created with
a custom URLStreamHandlerFactory. Now suppose one thread does:

url1.equals(url2);

and some other thread does:

url2.equals(url1);

This translates to, among other things, calling the following 
URLStreamHandler instance method:

     protected boolean hostsEqual(URL u1, URL u2) {
         InetAddress a1 = getHostAddress(u1);
         InetAddress a2 = getHostAddress(u2);
         // if we have internet address for both, compare them
         if (a1 != null && a2 != null) {
             return a1.equals(a2);
         // else, if both have host names, compare them
         } else if (u1.getHost() != null && u2.getHost() != null)
             return u1.getHost().equalsIgnoreCase(u2.getHost());
          else
             return u1.getHost() == null && u2.getHost() == null;
     }

So the two threads are reading and modifying the URL.hostAddress field
of both URLs, but each of them is holding a separate lock. You may say
that creating URL instances, then changing the URLStreamHandlerFactory,
creating some more URL instances and then comparing them with each other
does not happen a lot, but this could be fixed. Why not use the URL
instance as a lock when reading/writing its field? Would this be
desirable? It would mean a lot less contention (and even less if
caching of URL.hostAddress was implemented in a lock-free way).
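
A rough sketch of what the lock-free variant could look like follows.
HostHolder is a hypothetical class standing in for a URL with its
hostAddress field, and the lookup argument is a hypothetical stand-in
for the name service call. Only successful lookups are cached here, so
this illustrates the synchronization change alone and keeps the existing
"unstable" behaviour for failed lookups.

```java
import java.net.InetAddress;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder standing in for a URL and its hostAddress field.
final class HostHolder {
    // null means "nothing cached yet"; failed lookups are not remembered.
    private final AtomicReference<InetAddress> cached = new AtomicReference<>();

    InetAddress hostAddress(Supplier<InetAddress> lookup) {
        InetAddress current = cached.get();
        if (current != null) {
            return current;                  // fast path: no lock, no lookup
        }
        InetAddress resolved = lookup.get(); // may return null on failure
        if (resolved == null) {
            return null;
        }
        // The first successful writer wins; a thread that loses the race
        // discards its own result and adopts the published one.
        cached.compareAndSet(null, resolved);
        return cached.get();
    }
}
```

Because the state lives on the object itself, it no longer matters which
handler instance performs the comparison: every thread sees the same
published address once one has been stored.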

Because I know that URL.equals compatibility is important, I'm asking
here if a fix for this issue is desirable at all. What about a
synchronization fix only (keeping the "unstable" equals() behaviour)?

Regards, Peter