RFR: JDK-8051713 - URL constructor/equals/hashCode/sameFile/openConnection synchronization issues
Chris Hegarty
chris.hegarty at oracle.com
Fri Jul 25 12:53:13 UTC 2014
Hi Peter,
This is certainly a thorny issue, and I agree with the approach of using
StampedLock.
Some small comments / questions:
1) Why abuse fieldsLock in hostAddress(), rather than grabbing the
writeLock?
2) Does the setAccessible in readObject need to be in a doPriv?
Also should this throw an InternalError/AssertionError if it fails?
I'll pop your patch into our internal build and test system.
-Chris.
On 23/07/14 09:29, Peter Levart wrote:
> I created an issue for this:
>
> https://bugs.openjdk.java.net/browse/JDK-8051713
>
> The proposed patch is still the following:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/
>
> Regards, Peter
>
>
> On 07/11/2014 05:11 PM, Peter Levart wrote:
>> Hi,
>>
>> java.net.URL is supposed to behave as an immutable object, so URL
>> instances can be shared among threads and among parts of code without
>> fear that they will be modified. The URL class has an unusual way of
>> achieving this (or at least it tries to), partly because of a design
>> which uses:
>>
>> - URL constructor(s) that take a 'spec' String to be parsed into URL
>> object
>> - parsing is delegated to various URLStreamHandler(s) which are chosen
>> in the URL constructor (depending on the protocol used in URL string
>> to be parsed)
>>
>> An uninitialized URL instance (this) is passed from the constructor to the
>> chosen URLStreamHandler which has the responsibility to parse the
>> string and set back the fields that hold various parts of URL object
>> being constructed. Consequently, these fields can not be declared as
>> final, as definite assignment analysis doesn't cross method borders.
>> It is therefore illegal to unsafely publish URL instances (via data
>> races) to non-constructing threads because they can appear not fully
>> initialized. Nevertheless URL, with the help of various
>> URLStreamHandler implementations, tries hard to make URL appear stable
>> at least where it is required to be stable. For example:
>> URL.hashCode() is (almost) stable even if URL instance is unsafely
>> published. This is achieved by making hashCode() synchronized and
>> caching the result. At least one way of constructing URLs - constructors
>> that take 'spec' String to be parsed - is also making sure that
>> hashCode is computed from fully initialized fields, as parsing is
>> delegated to URLStreamHandler which uses package-private URL.set()
>> method to set back the parsed fields and the set() method is also
>> synchronized. But making URL appear stable even though it is published
>> unsafely doesn't seem to be the primary concern of URL
>> synchronization. Other public URL constructors that take individual
>> URL parts and don't delegate parsing to URLStreamHandler but set
>> fields directly (not via set() method), are not synchronized.
>>
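The caching scheme described above can be sketched roughly as follows (the class is hypothetical and only mirrors the idea; the field names follow java.net.URL, but this is not the actual JDK source):

```java
import java.util.Objects;

// Hypothetical sketch of the pre-patch scheme: a synchronized, package-private
// set() called back by the URLStreamHandler while parsing, and a synchronized
// hashCode() that caches its result. Not the actual JDK source.
class UrlSketch {
    private String protocol, host, file; // non-final: set after construction
    private int hashCode = -1;           // -1 means "not yet computed"

    // Called back by a URLStreamHandler while parsing a 'spec' string.
    synchronized void set(String protocol, String host, String file) {
        this.protocol = protocol;
        this.host = host;
        this.file = file;
        this.hashCode = -1; // invalidate any cached value
    }

    @Override
    public synchronized int hashCode() {
        if (hashCode != -1) {
            return hashCode; // return the cached value
        }
        hashCode = Objects.hash(protocol, host, file);
        return hashCode;
    }
}
```

Because both set() and hashCode() lock the same instance, a hash computed after a spec-parsing constructor finishes sees fully initialized fields; the constructors that set fields directly give no such guarantee.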
>> The primary concern of synchronization in URL appears to be the fact
>> that some URL operations like hashCode(), equals(), sameFile() and
>> openConnection() read multiple URL fields, while URL.set(), which can
>> be called from a custom URLStreamHandler at any time (although this is
>> not its purpose - it should only be called back while
>> parsing/constructing the URL), can set those fields. And those
>> multi-field operations would
>> like to see a "snapshot" of field values that is consistent. But
>> synchronization that is performed to achieve that is questionable.
>> Might be that in Java 1.0 times the JVM implementation assumptions
>> were different and synchronization was correct, but nowadays Java
>> memory model makes them invalid.
>>
>> URL.hashCode() appears to be the only properly synchronized method,
>> which makes it almost stable (doesn't return different results over
>> time), but even hashCode() has a subtle bug or two. The initializer for
>> hashCode field sets it to value -1 which represents "not yet computed"
>> state. If URL is published unsafely, hashCode() method could see the
>> "default" value of 0, which would be returned. A later call to
>> hashCode() would see value -1 which would trigger computation and a
>> different value would be returned. The other subtle bug is a
>> relatively improbable event that hashCode computation results in value
>> -1, which means "not yet computed". This can be seen as a performance
>> glitch (as hashCode will never be cached for such a URL instance) or as
>> an issue which makes hashCode unstable for one of the reasons why
>> equals() is unstable too (see below).
>>
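The second flaw reduces to this hypothetical caching sketch (the class and its IntSupplier parameter are illustrative, not JDK code):

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of the "-1 means not yet computed" flaw: if the real
// hash value happens to be -1, the cache never sticks and the hash is
// recomputed on every call.
class SentinelCache {
    private int hash = -1;        // sentinel that is also a legal hash value
    private final IntSupplier computer;
    int computations = 0;         // instrumentation for the demo

    SentinelCache(IntSupplier computer) {
        this.computer = computer;
    }

    synchronized int cachedHash() {
        if (hash == -1) {
            computations++;
            hash = computer.getAsInt(); // may itself return -1
        }
        return hash;
    }
}
```

A SentinelCache(() -> -1) instance recomputes on every cachedHash() call, while any other hash value is computed once and then served from the cache.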
>> If the URL.hashCode() method is almost stable (doesn't return different
>> results over time) and at least one way of URL construction makes sure
>> that hashCode is also calculated from fully initialized parts, then
>> URL.equals() and other methods that delegate to URLStreamHandler are a
>> special story. URL.equals() can't be synchronized on the URL instance,
>> because it would have to be synchronized on both URL instances being
>> compared, and this is prone to deadlocks. Imagine:
>>
>> thread1: url1.equals(url2)
>> thread2: url2.equals(url1)
>>
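A hypothetical equals() that locked both instances (nothing like this exists in the JDK; the class below is purely illustrative) would be exactly the classic lock-ordering deadlock:

```java
import java.util.Objects;

// Illustrative only: an equals-like method that locks both the receiver and
// the argument. With the interleaving above, thread1 holds url1's lock and
// waits for url2's, while thread2 holds url2's and waits for url1's:
// a lock-ordering deadlock.
class LockedUrl {
    private final String host;

    LockedUrl(String host) {
        this.host = host;
    }

    boolean equalsBothLocked(LockedUrl other) {
        synchronized (this) {
            synchronized (other) { // no global lock ordering -> deadlock-prone
                return Objects.equals(host, other.host);
            }
        }
    }
}
```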
>> So equals() chooses to not be synchronized and therefore risks not
>> being stable if URL instances are published unsafely. But it
>> nevertheless uses synchronization. equals() delegates its work to the
>> 1st URL's URLStreamHandler which synchronizes on itself when
>> calculating and caching the InetAddress of each individual URL's host
>> name. InetAddress (if resolvable) is used in preference to host name
>> for comparison (and also in hashCode()). URL.equals() risks not being
>> stable for the following reasons:
>>
>> - URL instances published unsafely can appear not fully initialized to
>> equals() even though they were constructed with constructors that
>> delegate parsing to URLStreamHandler(s) which use synchronized
>> URL.set() to set the fields, because URL.equals() is not synchronized.
>>
>> - URL.hostAddress that is calculated on demand and then cached on the
>> URL instance should help make equals() stable in the presence of
>> dynamic changes to host name -> IP address mapping, but caching is not
>> performed for unsuccessful resolution. A temporary name service outage
>> can make URL.equals() unstable.
>>
>> - URL.hostAddress caching is using the URLStreamHandler instance of
>> the 1st URL as a lock for synchronizing read/write of hostAddress of
>> both URLs being compared by equals(). But URLStreamHandler(s) of the
>> two URL instances need not be the same instance even though they are
>> for the same protocol. Imagine:
>>
>> URL url1 = new URL(null, "http://www.google.com/", handler1);
>> URL url2 = new URL(null, "http://www.google.com/", handler2);
>> ...
>> thread1: url1.equals(url2);
>> thread2: url2.equals(url1);
>>
>> Each thread could be using a different URLStreamHandler instance for
>> synchronization, and each could overwrite the other's cached
>> hostAddress on the individual URLs. These values could differ in the
>> presence of dynamic changes to host name -> IP address mapping and
>> could therefore make URL.equals() unstable. This is admittedly a very
>> unlikely scenario, but is theoretically possible.
>>
>> URL.sameFile() has exactly the same issues as URL.equals(), as it only
>> makes one field comparison fewer.
>>
>> URL.openConnection() is a question in itself. It is delegated to
>> URLStreamHandler. Some URLStreamHandlers make it synchronized and
>> others not. Those that make it synchronized (on the URLStreamHandler
>> instance) do this for no apparent reason. This synchronization can't
>> help make URL fields stable for the time of openConnection() call
>> since URL.set() is using a different lock (the URL instance itself).
>> It only makes things worse, since opening connections to resources is
>> serialized, which presents a bottleneck.
>>
>> I tried to fix all these issues and came up with the following patch
>> which I'm proposing:
>>
>> http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/
>>
>>
>> The new JDK 8 synchronization primitive StampedLock is a perfect tool for
>> solving these issues as synchronization in URL is only necessary to
>> establish visibility of initialized state which doesn't change
>> afterwards or changes at most once (when computing hashCode).
>> StampedLock's tryOptimisticRead/validate is perfect for such
>> situations as it only presents a negligible overhead of two volatile
>> reads. The presented patch also contains an unrelated change which
>> replaces usage of Hashtable with ConcurrentHashMap for holding the
>> mapping from protocol to individual URLStreamHandlers, which makes for
>> better scalability of URL constructors. Combined with the synchronization
>> scalability enhancements of InetAddress caching presented in an
>> earlier proposal, the net result is much better scalability of URL
>> constructor/equals/hashCode (executed by a JMH test on a 4-core
>> i7/Linux box):
>>
>> http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/URL.synchronization_bench_results.pdf
>>
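The tryOptimisticRead/validate idiom the patch relies on looks roughly like this (a minimal sketch with hypothetical field and method names, not the actual patched URL code):

```java
import java.util.concurrent.locks.StampedLock;

// Minimal sketch of StampedLock's optimistic-read idiom: readers pay only
// two volatile reads (acquiring and validating the stamp) unless a write
// intervenes, in which case they retry under a pessimistic read lock.
class UrlFields {
    private final StampedLock lock = new StampedLock();
    private String host;
    private String file;

    void set(String host, String file) {
        long stamp = lock.writeLock();
        try {
            this.host = host;
            this.file = file;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    // Returns a (host, file) snapshot that is consistent even if set() races.
    String externalForm() {
        long stamp = lock.tryOptimisticRead();
        String h = host, f = file;
        if (!lock.validate(stamp)) {  // a write intervened (or was in progress)
            stamp = lock.readLock();  // fall back to a pessimistic read
            try {
                h = host;
                f = file;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return "//" + h + f;
    }
}
```

On the happy path no lock state is ever written, which is why the overhead stays negligible for read-mostly data like URL's effectively-immutable fields.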
>>
>> So with this patch java.net.URL could be treated as an
>> unsafe-publication-tolerable class (like String). Well, not entirely,
>> since getters for individual fields are still just unguarded normal
>> reads. But those reads are usually performed under guard of a
>> StampedLock when URL
>> equals/hashCode/sameFile/openConnection/toExternalForm() operations
>> are delegated to URLStreamHandler and therefore don't suffer from
>> (in)visibility of published state. Fields could be made volatile to
>> fix this if desired.
>>
>> I'm not entirely sure about why openConnection() in file: and mailto:
>> URLStreamHandlers is synchronized. I don't see a reason, so I removed
>> the synchronization. If there is a reason, please bring it forward.
>>
>> I ran the java/net jtreg tests with both unpatched recent jdk9-dev and
>> the patched version (combined changes of URL and InetAddress caching)
>> and got the same results. Only the following 3 tests fail in both cases:
>>
>> JT Harness : Tests that failed
>> java/net/MulticastSocket/Promiscuous.java: Test for interference when
>> two sockets are bound to the same port but joined to different
>> multicast groups
>> java/net/MulticastSocket/SetLoopbackMode.java: Test
>> MulticastSocket.setLoopbackMode
>> java/net/MulticastSocket/Test.java: IPv4 and IPv6 multicasting broken
>> on Linux
>>
>> They seem to not like my multicast configuration or something. All
>> other 258 tests pass.
>>
>> So what do you think?
>>
>>
>> Regards, Peter
>>
>