RFR: JDK-8051713 - URL constructor/equals/hashCode/sameFile/openConnection synchronization issues

Peter Levart peter.levart at gmail.com
Wed Jul 23 08:29:54 UTC 2014


I created an issue for this:

     https://bugs.openjdk.java.net/browse/JDK-8051713

The proposed patch is still the following:

http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/

Regards, Peter


On 07/11/2014 05:11 PM, Peter Levart wrote:
> Hi,
>
> java.net.URL is supposed to behave as an immutable object, so URL 
> instances can be shared among threads and among parts of code without 
> fear that they will be modified. The URL class has an unusual way of 
> achieving this (or at least it tries to), partly because of a design 
> which uses:
>
> - URL constructor(s) that take a 'spec' String to be parsed into the 
> URL object
> - parsing that is delegated to various URLStreamHandler(s), chosen in 
> the URL constructor depending on the protocol used in the URL string 
> being parsed
>
> An uninitialized URL instance (this) is passed from the constructor to 
> the chosen URLStreamHandler, which has the responsibility to parse the 
> string and set back the fields that hold the various parts of the URL 
> object being constructed. Consequently, these fields cannot be declared 
> final, as definite assignment analysis doesn't cross method borders. 
> It is therefore unsafe to publish URL instances via data races to 
> non-constructing threads, because they can appear not fully 
> initialized. Nevertheless URL, with the help of the various 
> URLStreamHandler implementations, tries hard to appear stable at least 
> where it is required to be stable. For example, URL.hashCode() is 
> (almost) stable even if the URL instance is unsafely published. This is 
> achieved by making hashCode() synchronized and caching the result. At 
> least one way of constructing URLs - the constructors that take a 
> 'spec' String to be parsed - also makes sure that hashCode is computed 
> from fully initialized fields, as parsing is delegated to a 
> URLStreamHandler which uses the package-private URL.set() method to set 
> back the parsed fields, and set() is also synchronized. But making URL 
> appear stable even though it is published unsafely doesn't seem to be 
> the primary concern of URL synchronization. The other public URL 
> constructors, which take individual URL parts and don't delegate 
> parsing to a URLStreamHandler but set the fields directly (not via 
> set()), are not synchronized.
>
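> For illustration, here is a minimal runnable sketch of that 
> delegation. The "demo" protocol and the anonymous handler are made 
> up; the three-argument URL constructor and the inherited default 
> parseURL() are real JDK API:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class ParseDelegation {
    public static void main(String[] args) throws Exception {
        // A do-nothing handler: it inherits the default parseURL(),
        // which parses the generic URL syntax and writes the parts back
        // into the URL instance via the package-private URL.set() method.
        URLStreamHandler handler = new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) throws IOException {
                throw new IOException("not supported in this sketch");
            }
        };

        // The constructor hands the still-uninitialized 'this' to the
        // handler; only after parseURL() returns are the fields set.
        URL url = new URL(null, "demo://example.org:8080/path", handler);
        System.out.println(url.getHost() + ":" + url.getPort());
        // prints example.org:8080
    }
}
```

> The fields of 'url' are thus written outside its constructor, which 
> is exactly why they cannot be final.
>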
> The primary concern of synchronization in URL appears to be the fact 
> that some URL operations like hashCode(), equals(), sameFile() and 
> openConnection() read multiple URL fields, while URL.set() - which can 
> be called from a custom URLStreamHandler at any time, although that is 
> not its purpose (it should only be called back while 
> parsing/constructing the URL) - can set those fields. Those 
> multi-field operations would like to see a consistent "snapshot" of 
> the field values. But the synchronization performed to achieve that is 
> questionable. It might be that in Java 1.0 times the JVM 
> implementation assumptions were different and the synchronization was 
> correct, but nowadays the Java memory model makes those assumptions 
> invalid.
>
> URL.hashCode() appears to be the only properly synchronized method, 
> which makes it almost stable (it doesn't return different results over 
> time), but even hashCode() has a subtle bug or two. The field 
> initializer sets hashCode to -1, which represents the "not yet 
> computed" state. If the URL is published unsafely, the hashCode() 
> method could see the "default" value of 0, which would be returned. A 
> later call to hashCode() would see the value -1, which would trigger 
> computation, and a different value would be returned. The other subtle 
> bug is the relatively improbable event that the hashCode computation 
> itself results in the value -1, which means "not yet computed". This 
> can be seen as a performance glitch (the hashCode will never be cached 
> for such a URL instance) or as an issue that makes hashCode unstable 
> for one of the same reasons equals() is unstable (see below).
>
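> A simplified sketch of the caching pattern in question (class and 
> method names here are illustrative, not the actual JDK source):

```java
public class SentinelHash {
    // -1 is the "not yet computed" sentinel; note this is a field
    // initializer on a non-final field, not a final-field write.
    private int hashCode = -1;

    @Override
    public synchronized int hashCode() {
        if (hashCode != -1) {
            // Bug 1: under unsafe publication a thread may observe the
            // default value 0 here (before the write of -1 becomes
            // visible) and return it, while a later call returns the
            // real computed hash.
            return hashCode;
        }
        // Bug 2: if compute() itself happens to return -1, the result
        // is never cached and is recomputed on every call.
        hashCode = compute();
        return hashCode;
    }

    private int compute() {
        return 42; // stand-in for the real multi-field hash computation
    }

    public static void main(String[] args) {
        SentinelHash h = new SentinelHash();
        System.out.println(h.hashCode()); // prints 42 (then cached)
    }
}
```
>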
> While URL.hashCode() is almost stable (it doesn't return different 
> results over time) and at least one way of URL construction makes sure 
> that hashCode is calculated from fully initialized parts, URL.equals() 
> and the other methods that delegate to URLStreamHandler are a special 
> story. URL.equals() can't be synchronized on the URL instance, because 
> it would have to synchronize on both URL instances being compared, and 
> that is prone to deadlocks. Imagine:
>
> thread1: url1.equals(url2)
> thread2: url2.equals(url1)
>
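> The deadlock risk can be sketched as follows (the class and method 
> are hypothetical; URL.equals() does not actually do this, for exactly 
> this reason):

```java
public class BothLockEquals {
    // Hypothetical equals() that synchronizes on both receiver and
    // argument. If thread1 runs a.equalsLocked(b) while thread2 runs
    // b.equalsLocked(a), each thread can acquire its first lock and
    // then block forever waiting for the other's: a classic deadlock.
    public boolean equalsLocked(BothLockEquals other) {
        synchronized (this) {
            synchronized (other) {
                return this == other; // stand-in for field comparisons
            }
        }
    }

    public static void main(String[] args) {
        BothLockEquals a = new BothLockEquals();
        BothLockEquals b = new BothLockEquals();
        // Single-threaded use is safe; the danger is only concurrent,
        // opposite-order calls.
        System.out.println(a.equalsLocked(b)); // prints false
    }
}
```
>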
> So equals() chooses not to be synchronized and therefore risks not 
> being stable if URL instances are published unsafely. But it 
> nevertheless uses synchronization: equals() delegates its work to the 
> 1st URL's URLStreamHandler, which synchronizes on itself when 
> calculating and caching the InetAddress of each individual URL's host 
> name. The InetAddress (if resolvable) is used in preference to the 
> host name for comparison (and also in hashCode()). URL.equals() risks 
> not being stable for the following reasons:
>
> - URL instances published unsafely can appear not fully initialized to 
> equals() even though they were constructed with constructors that 
> delegate parsing to URLStreamHandler(s) which use synchronized 
> URL.set() to set the fields, because URL.equals() is not synchronized.
>
> - URL.hostAddress is calculated on demand and then cached on the URL 
> instance, which should help make equals() stable in the presence of 
> dynamic changes to the host name -> IP address mapping, but caching is 
> not performed for unsuccessful resolution. A temporary name service 
> outage can therefore make URL.equals() unstable.
>
> - URL.hostAddress caching is using the URLStreamHandler instance of 
> the 1st URL as a lock for synchronizing read/write of hostAddress of 
> both URLs being compared by equals(). But URLStreamHandler(s) of the 
> two URL instances need not be the same instance even though they are 
> for the same protocol. Imagine:
>
> URL url1 = new URL(null, "http://www.google.com/", handler1);
> URL url2 = new URL(null, "http://www.google.com/", handler2);
> ...
> thread1: url1.equals(url2);
> thread2: url2.equals(url1);
>
> Each thread could be using a different URLStreamHandler instance for 
> synchronization, and the threads could overwrite each other's cached 
> hostAddress on the individual URLs. These hostAddress values could 
> differ in the presence of dynamic changes to the host name -> IP 
> address mapping and could therefore make URL.equals() unstable. This 
> is admittedly a very unlikely scenario, but it is theoretically 
> possible.
>
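> The problematic caching can be sketched roughly like this (FakeUrl 
> and FakeHandler are stand-ins for illustration; the real logic lives 
> in URLStreamHandler.getHostAddress()):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Stand-in for java.net.URL; field names are illustrative.
class FakeUrl {
    String host;
    InetAddress hostAddress; // cached on the URL instance
    FakeUrl(String host) { this.host = host; }
}

public class FakeHandler {
    InetAddress getHostAddress(FakeUrl u) {
        // The lock is the HANDLER instance, not the URL: two URLs with
        // different handler instances get two different locks guarding
        // the same per-URL hostAddress field.
        synchronized (this) {
            if (u.hostAddress != null)
                return u.hostAddress;
            try {
                u.hostAddress = InetAddress.getByName(u.host);
            } catch (UnknownHostException e) {
                return null; // resolution failure is NOT cached
            }
            return u.hostAddress;
        }
    }

    public static void main(String[] args) {
        FakeHandler h1 = new FakeHandler();
        FakeHandler h2 = new FakeHandler();
        FakeUrl u = new FakeUrl("localhost");
        // h1 and h2 synchronize on different locks while reading and
        // writing the same u.hostAddress field.
        System.out.println(h1.getHostAddress(u) != null);
        System.out.println(h2.getHostAddress(u) != null);
    }
}
```
>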
> URL.sameFile() has exactly the same issues as URL.equals(), as it only 
> makes one field comparison less.
>
> URL.openConnection() is a question in itself. It is delegated to the 
> URLStreamHandler. Some URLStreamHandlers make it synchronized and 
> others don't. Those that make it synchronized (on the URLStreamHandler 
> instance) do so for no apparent reason. This synchronization can't 
> help make the URL fields stable for the duration of the 
> openConnection() call, since URL.set() uses a different lock (the URL 
> instance itself). It only makes things worse, since opening 
> connections to resources is serialized and this presents a bottleneck.
>
> I tried to fix all these issues and came up with the following patch 
> which I'm proposing:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/ 
>
>
> The new JDK 8 synchronization primitive StampedLock is a perfect tool 
> for solving these issues, as synchronization in URL is only necessary 
> to establish visibility of initialized state which doesn't change 
> afterwards or changes at most once (when computing hashCode). 
> StampedLock's tryOptimisticRead/validate is perfect for such 
> situations as it presents only the negligible overhead of two volatile 
> reads. The presented patch also contains an unrelated change which 
> replaces the usage of Hashtable with ConcurrentHashMap for holding the 
> mapping from protocol to individual URLStreamHandler, which makes for 
> better scalability of the URL constructors. Combined with the 
> synchronization scalability enhancements of InetAddress caching 
> presented in an earlier proposal, the net result is much better 
> scalability of the URL constructor/equals/hashCode (measured by a JMH 
> test on a 4-core i7/Linux box):
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/URL.synchronization_bench_results.pdf 
>
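> The core pattern looks roughly like this (field and method names here 
> are illustrative, not the actual webrev code):

```java
import java.util.concurrent.locks.StampedLock;

public class OptimisticSnapshot {
    private final StampedLock lock = new StampedLock();
    private String host;
    private String file;

    // Writer side: the equivalent of URL.set() takes the write lock.
    void set(String host, String file) {
        long stamp = lock.writeLock();
        try {
            this.host = host;
            this.file = file;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    // Reader side: tryOptimisticRead/validate costs only two volatile
    // reads on the fast path, falling back to a real read lock only if
    // a write intervened between the two.
    String snapshot() {
        long stamp = lock.tryOptimisticRead();
        String h = host, f = file;
        if (!lock.validate(stamp)) {
            stamp = lock.readLock();
            try {
                h = host;
                f = file;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return h + "/" + f; // a consistent pair of fields
    }

    public static void main(String[] args) {
        OptimisticSnapshot s = new OptimisticSnapshot();
        s.set("example.org", "index.html");
        System.out.println(s.snapshot()); // prints example.org/index.html
    }
}
```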
>
> So with this patch java.net.URL could be treated as an 
> unsafe-publication-tolerant class (like String). Well, not entirely, 
> since the getters for individual fields are still just unguarded plain 
> reads. But those reads are usually performed under the guard of a 
> StampedLock when the URL 
> equals/hashCode/sameFile/openConnection/toExternalForm() operations 
> are delegated to the URLStreamHandler, and therefore they don't suffer 
> from (in)visibility of published state. The fields could be made 
> volatile to fix this if desired.
>
> I'm not entirely sure about why openConnection() in file: and mailto: 
> URLStreamHandlers is synchronized. I don't see a reason, so I removed 
> the synchronization. If there is a reason, please bring it forward.
>
> I ran the java/net jtreg tests with a recent unpatched jdk9-dev and 
> with the patched one (combined changes of URL and InetAddress caching) 
> and got the same score. Only the following 3 tests fail on both 
> occasions:
>
> JT Harness : Tests that failed
> java/net/MulticastSocket/Promiscuous.java: Test for interference when 
> two sockets are bound to the same port but joined to different 
> multicast groups
> java/net/MulticastSocket/SetLoopbackMode.java: Test 
> MulticastSocket.setLoopbackMode
> java/net/MulticastSocket/Test.java: IPv4 and IPv6 multicasting broken 
> on Linux
>
> They seem to not like my multicast configuration or something. All 
> other 258 tests pass.
>
> So what do you think?
>
>
> Regards, Peter
>
