RFR: JDK-8051713 - URL constructor/equals/hashCode/sameFile/openConnection synchronization issues
Peter Levart
peter.levart at gmail.com
Wed Jul 23 08:29:54 UTC 2014
I created an issue for this:
https://bugs.openjdk.java.net/browse/JDK-8051713
The proposed patch is still the following:
http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/
Regards, Peter
On 07/11/2014 05:11 PM, Peter Levart wrote:
> Hi,
>
> java.net.URL is supposed to behave as an immutable object, so URL
> instances can be shared among threads and among parts of code without
> fear that they will be modified. The URL class has an unusual way of
> achieving this (or at least it tries to), partly because of a design
> which uses:
>
> - URL constructor(s) that take a 'spec' String to be parsed into a URL
> object
> - parsing is delegated to various URLStreamHandler(s), which are chosen
> in the URL constructor (depending on the protocol used in the URL
> string to be parsed)
>
> An uninitialized URL instance (this) is passed from the constructor to
> the chosen URLStreamHandler, which has the responsibility to parse the
> string and set back the fields that hold the various parts of the URL
> object being constructed. Consequently, these fields cannot be declared
> final, as definite assignment analysis doesn't cross method borders.
> It is therefore illegal to unsafely publish URL instances (via data
> races) to non-constructing threads because they can appear not fully
> initialized. Nevertheless URL, with the help of various
> URLStreamHandler implementations, tries hard to make URL appear stable
> at least where it is required to be stable. For example:
> URL.hashCode() is (almost) stable even if the URL instance is unsafely
> published. This is achieved by making hashCode() synchronized and
> caching the result. At least one way of constructing URLs - the
> constructors that take a 'spec' String to be parsed - also makes sure
> that hashCode is computed from fully initialized fields, as parsing is
> delegated to a URLStreamHandler which uses the package-private URL.set()
> method to set back the parsed fields, and the set() method is also
> synchronized. But making URL appear stable even though it is published
> unsafely doesn't seem to be the primary concern of URL
> synchronization. Other public URL constructors that take individual
> URL parts and don't delegate parsing to URLStreamHandler but set the
> fields directly (not via the set() method) are not synchronized.
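>
> To make that pattern concrete, here is a minimal standalone sketch of
> the construction scheme described above (hypothetical MiniUrl/MiniHandler
> classes, deliberately simplified - not the JDK source):
>
>     // 'this' escapes from the constructor before its fields are assigned;
>     // the handler writes them back via a synchronized set() method.
>     class MiniHandler {
>         void parseURL(MiniUrl u, String spec) {
>             int colon = spec.indexOf(':');
>             u.set(spec.substring(0, colon), spec.substring(colon + 1));
>         }
>     }
>
>     class MiniUrl {
>         private String protocol;   // cannot be final: assigned in set(),
>         private String rest;       // not in the constructor itself
>
>         MiniUrl(String spec, MiniHandler handler) {
>             handler.parseURL(this, spec);   // uninitialized 'this' escapes
>         }
>
>         synchronized void set(String protocol, String rest) {
>             this.protocol = protocol;
>             this.rest = rest;
>         }
>     }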
>
> The primary concern of synchronization in URL appears to be the fact
> that some URL operations like hashCode(), equals(), sameFile() and
> openConnection() read multiple URL fields, while URL.set() - which can
> be called from a custom URLStreamHandler at any time (although this is
> not its purpose; it should only be called back while parsing/constructing
> the URL) - can set those fields. Those multi-field operations would
> like to see a consistent "snapshot" of the field values. But the
> synchronization performed to achieve that is questionable. It might be
> that in Java 1.0 times the JVM implementation assumptions were different
> and the synchronization was correct, but nowadays the Java memory model
> invalidates those assumptions.
>
> URL.hashCode() appears to be the only properly synchronized method,
> which makes it almost stable (it doesn't return different results over
> time), but even hashCode() has a subtle bug or two. The initializer for
> the hashCode field sets it to -1, which represents the "not yet
> computed" state. If the URL is published unsafely, the hashCode() method
> could see the "default" value of 0, which would be returned. A later
> call to hashCode() would see the value -1, which would trigger
> computation, and a different value would be returned. The other subtle
> bug is the relatively improbable event that the hashCode computation
> results in the value -1, which means "not yet computed". This can be
> seen as a performance glitch (as the hashCode will never be cached for
> such a URL instance) or as an issue which makes hashCode unstable for
> one of the reasons why equals() is unstable too (see below).
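>
> For illustration, a hedged sketch of the caching idiom described above
> (a simplified stand-in, not the actual URL code):
>
>     class HashCaching {
>         private int hash = -1;   // -1 means "not yet computed"
>
>         @Override
>         public synchronized int hashCode() {
>             int h = hash;            // an unsafely published instance may read
>             if (h == -1) {           // the default 0 here instead of -1, skip
>                 h = computeHash();   // this branch and return 0 once, then the
>                 hash = h;            // real hash on a later call; a computed
>             }                        // hash of exactly -1 is never cached
>             return h;
>         }
>
>         private int computeHash() {
>             return 42;               // placeholder for the real field-based computation
>         }
>     }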
>
> If the URL.hashCode() method is almost stable (doesn't return different
> results over time) and at least one way of URL construction makes sure
> that hashCode is also calculated from fully initialized parts, then
> URL.equals() and other methods that delegate to URLStreamHandler are a
> special story. URL.equals() can't be synchronized on the URL instance,
> because it would have to be synchronized on both URL instances that
> are being compared, and this is prone to deadlocks. Imagine:
>
> thread1: url1.equals(url2)
> thread2: url2.equals(url1)
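>
> A hypothetical sketch of such a doubly-synchronized equals() (not a
> proposal - just to show why it is ruled out; SyncUrl is an invented
> stand-in class):
>
>     class SyncUrl {
>         private final String spec;
>         SyncUrl(String spec) { this.spec = spec; }
>
>         @Override
>         public boolean equals(Object obj) {
>             if (!(obj instanceof SyncUrl)) return false;
>             synchronized (this) {     // thread1 acquires url1's lock, thread2 url2's
>                 synchronized (obj) {  // ... and each then waits forever for the other
>                     return spec.equals(((SyncUrl) obj).spec);
>                 }
>             }
>         }
>
>         @Override
>         public int hashCode() { return spec.hashCode(); }
>     }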
>
> So equals() chooses not to be synchronized and therefore risks not
> being stable if URL instances are published unsafely. But it
> nevertheless uses synchronization. equals() delegates its work to the
> first URL's URLStreamHandler, which synchronizes on itself when
> calculating and caching the InetAddress of each individual URL's host
> name. The InetAddress (if resolvable) is used in preference to the host
> name for comparison (and also in hashCode()). URL.equals() risks not
> being stable for the following reasons:
>
> - URL instances published unsafely can appear not fully initialized to
> equals() even though they were constructed with constructors that
> delegate parsing to URLStreamHandler(s) which use synchronized
> URL.set() to set the fields, because URL.equals() is not synchronized.
>
> - URL.hostAddress, which is calculated on demand and then cached on the
> URL instance, should help make equals() stable in the presence of
> dynamic changes to the host name -> IP address mapping, but caching is
> not performed when resolution fails. A temporary name service outage
> can therefore make URL.equals() unstable.
>
> - URL.hostAddress caching uses the URLStreamHandler instance of
> the first URL as the lock for synchronizing reads/writes of the
> hostAddress of both URLs being compared by equals(). But the
> URLStreamHandler(s) of the two URL instances need not be the same
> instance even though they are for the same protocol. Imagine:
>
> URL url1 = new URL(null, "http://www.google.com/", handler1);
> URL url2 = new URL(null, "http://www.google.com/", handler2);
> ...
> thread1: url1.equals(url2);
> thread2: url2.equals(url1);
>
> Each thread could be using a different URLStreamHandler instance for
> synchronization, and the two could overwrite each other's cached
> hostAddress on the individual URLs. These hostAddress values could
> differ in the presence of dynamic changes to the host name -> IP address
> mapping and could therefore make URL.equals() unstable. This is
> admittedly a very unlikely scenario, but it is theoretically possible.
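>
> A simplified sketch of the caching pattern in question (hypothetical
> classes mirroring the problematic scheme, not the JDK source): the lock
> is the handler instance, while the cache lives on the URL, so two URLs
> created with different handler instances synchronize on different locks.
>
>     import java.net.InetAddress;
>     import java.net.UnknownHostException;
>
>     class CachingUrl {
>         final String host;
>         InetAddress hostAddress;   // lazily cached by the handler, not volatile
>         CachingUrl(String host) { this.host = host; }
>     }
>
>     class ResolvingHandler {
>         // locks *this handler*; a second handler instance does not exclude it
>         synchronized InetAddress hostAddressOf(CachingUrl u) {
>             if (u.hostAddress == null) {
>                 try {
>                     u.hostAddress = InetAddress.getByName(u.host);
>                 } catch (UnknownHostException e) {
>                     // failed resolution is not cached, so a later retry may
>                     // succeed and change the outcome of a comparison
>                 }
>             }
>             return u.hostAddress;
>         }
>     }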
>
> URL.sameFile() has exactly the same issues as URL.equals(), as it only
> performs one field comparison less.
>
> URL.openConnection() is a question in itself. It is delegated to the
> URLStreamHandler. Some URLStreamHandlers make it synchronized and
> others don't. Those that make it synchronized (on the URLStreamHandler
> instance) do so for no apparent reason. This synchronization can't
> help make the URL fields stable for the duration of the openConnection()
> call, since URL.set() uses a different lock (the URL instance itself).
> It only makes things worse, since opening connections to the resource
> is serialized, which presents a bottleneck.
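>
> A hypothetical handler sketch of the pattern being criticized: the
> synchronized modifier serializes every openConnection() call through
> this handler instance, while doing nothing for visibility of the URL's
> fields, because URL.set() locks the URL itself.
>
>     import java.io.IOException;
>     import java.net.URL;
>     import java.net.URLConnection;
>     import java.net.URLStreamHandler;
>
>     class SerializingHandler extends URLStreamHandler {
>         @Override
>         protected synchronized URLConnection openConnection(URL u) throws IOException {
>             // a real handler would build a protocol-specific URLConnection here
>             throw new UnsupportedOperationException("sketch only");
>         }
>     }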
>
> I tried to fix all these issues and came up with the following patch
> which I'm proposing:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/
>
>
> The new JDK 8 synchronization primitive StampedLock is a perfect tool
> for solving these issues, as synchronization in URL is only necessary to
> establish visibility of initialized state which doesn't change
> afterwards or changes at most once (when computing hashCode).
> StampedLock's tryOptimisticRead/validate is perfect for such
> situations, as it only presents a negligible overhead of two volatile
> reads. The presented patch also contains an unrelated change which
> replaces the usage of Hashtable with ConcurrentHashMap for holding the
> mapping from protocol to individual URLStreamHandler, which makes for
> better scalability of URL constructors. Combined with the synchronization
> scalability enhancements of InetAddress caching presented in an
> earlier proposal, the net result is much better scalability of the URL
> constructor/equals/hashCode (measured by a JMH test on a 4-core
> i7/Linux box):
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/URL.synchronization_bench_results.pdf
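>
> For reference, a minimal sketch of the tryOptimisticRead/validate idiom
> the patch relies on (field and method names are illustrative, not the
> exact webrev code):
>
>     import java.util.concurrent.locks.StampedLock;
>
>     class UrlParts {
>         private final StampedLock lock = new StampedLock();
>         private String protocol, host, file;   // written once, under the write lock
>
>         void set(String protocol, String host, String file) {
>             long stamp = lock.writeLock();
>             try {
>                 this.protocol = protocol;
>                 this.host = host;
>                 this.file = file;
>             } finally {
>                 lock.unlockWrite(stamp);
>             }
>         }
>
>         // reads a consistent snapshot of the fields, usually without blocking
>         String toExternalForm() {
>             long stamp = lock.tryOptimisticRead();   // first volatile read
>             String p = protocol, h = host, f = file;
>             if (!lock.validate(stamp)) {             // second volatile read
>                 stamp = lock.readLock();             // fall back to a pessimistic read
>                 try {
>                     p = protocol; h = host; f = file;
>                 } finally {
>                     lock.unlockRead(stamp);
>                 }
>             }
>             return p + "://" + h + f;
>         }
>     }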
>
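> The Hashtable -> ConcurrentHashMap change mentioned above can be
> sketched like this (illustrative names only; the real lookup also
> consults the URLStreamHandlerFactory and system properties before
> falling back to the built-in handlers):
>
>     import java.net.URL;
>     import java.net.URLConnection;
>     import java.net.URLStreamHandler;
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     class HandlerCache {
>         private final Map<String, URLStreamHandler> handlers = new ConcurrentHashMap<>();
>
>         URLStreamHandler handlerFor(String protocol) {
>             // lock-free for the common hit case, unlike Hashtable's synchronized get()
>             return handlers.computeIfAbsent(protocol, p -> createHandler(p));
>         }
>
>         private URLStreamHandler createHandler(String protocol) {
>             // placeholder: a real implementation would locate the protocol's handler
>             return new URLStreamHandler() {
>                 @Override
>                 protected URLConnection openConnection(URL u) {
>                     throw new UnsupportedOperationException(protocol + ": sketch only");
>                 }
>             };
>         }
>     }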
>
> So with this patch java.net.URL could be treated as an
> unsafe-publication-tolerant class (like String). Well, not entirely,
> since the getters for individual fields are still just unguarded normal
> reads. But those reads are usually performed under the guard of a
> StampedLock when the URL
> equals/hashCode/sameFile/openConnection/toExternalForm() operations
> are delegated to the URLStreamHandler, and therefore don't suffer from
> (in)visibility of published state. The fields could be made volatile to
> fix this if desired.
>
> I'm not entirely sure about why openConnection() in file: and mailto:
> URLStreamHandlers is synchronized. I don't see a reason, so I removed
> the synchronization. If there is a reason, please bring it forward.
>
> I ran the java/net jtreg tests with recent jdk9-dev both unpatched and
> patched (combined changes of URL and InetAddress caching) and got the
> same score. Only the following 3 tests fail in both cases:
>
> JT Harness : Tests that failed
> java/net/MulticastSocket/Promiscuous.java: Test for interference when
> two sockets are bound to the same port but joined to different
> multicast groups
> java/net/MulticastSocket/SetLoopbackMode.java: Test
> MulticastSocket.setLoopbackMode
> java/net/MulticastSocket/Test.java: IPv4 and IPv6 multicasting broken
> on Linux
>
> They seem to not like my multicast configuration or something. All
> other 258 tests pass.
>
> So what do you think?
>
>
> Regards, Peter
>