JDK-8019345, RFC3986, RFC2396 and java.net.URI

Peter Firmstone peter.firmstone at zeus.net.au
Sun Nov 17 02:32:25 UTC 2024


When Java 9 was released, our use of RFC3986 meant that modules with 
jrt:/ URLs were supported out of the box, as it was compliant with the 
same configuration files running on Java 8, even though we weren't 
loading these files.  Had we used URL without a jrt provider, it would 
have caused runtime failures.
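
As a rough sketch of the difference: parsing a jrt:/ string as a URI is 
purely syntactic, whereas converting it to a URL requires a protocol 
handler for the scheme.  The class name and the jrt path below are 
illustrative assumptions, and java.net.URI stands in for our own 
RFC3986 Uri class.

```java
import java.net.MalformedURLException;
import java.net.URI;

final class JrtIdentityDemo {

    // Parsing only checks syntax, so no stream handler for the scheme is needed.
    static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    // Converting to a URL looks up a protocol handler and fails if none is installed.
    static boolean hasUrlHandler(String location) {
        try {
            URI.create(location).toURL();
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String location = "jrt:/java.base/java/lang/Object.class"; // illustrative path
        System.out.println("scheme: " + schemeOf(location));
        // The answer depends on the runtime: it is only true where a jrt handler exists.
        System.out.println("URL handler present: " + hasUrlHandler(location));
    }
}
```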

Reading the OpenJDK code, I see that CodeSource has been replaced with 
CodeSourceKey or similar in various places, replacing URLs with a 
semi-normalized or unmodified string form.

When the class below was written, SecureClassLoader still used 
CodeSource as a key; now it uses a string.  However, the URL string has 
been RFC3986 normalized, so String comparison is still valid.

JGDMS/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/net/RFC3986URLClassLoader.java 
at trunk · pfirmstone/JGDMS 
<https://github.com/pfirmstone/JGDMS/blob/trunk/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/net/RFC3986URLClassLoader.java>
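
To illustrate why normalized strings can be compared directly, here is a 
minimal sketch of the case normalization step from RFC 3986 section 
6.2.2.1 (lower-case scheme and host, upper-case percent-escape hex 
digits).  It is not the code from the class linked above: CaseNormalizer 
and normalizeCase are hypothetical names, only this one normalization 
step is applied, and escapes are assumed to be well formed.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaseNormalizer {

    // Component-splitting regex from RFC 3986, Appendix B.
    private static final Pattern PARTS = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Hypothetical helper: case normalization only (RFC 3986 section 6.2.2.1).
    static String normalizeCase(String uri) {
        Matcher m = PARTS.matcher(uri);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a URI reference: " + uri);
        }
        StringBuilder sb = new StringBuilder(uri.length());
        if (m.group(2) != null) {
            sb.append(m.group(2).toLowerCase(Locale.ROOT)).append(':');
        }
        if (m.group(4) != null) {
            String authority = m.group(4);
            int at = authority.lastIndexOf('@');
            // Userinfo is case-sensitive; only the host (and port) is lower-cased.
            sb.append("//")
              .append(authority, 0, at + 1)
              .append(authority.substring(at + 1).toLowerCase(Locale.ROOT));
        }
        sb.append(m.group(5)); // path is left untouched
        if (m.group(7) != null) sb.append('?').append(m.group(7));
        if (m.group(9) != null) sb.append('#').append(m.group(9));
        return upperCaseEscapes(sb);
    }

    // Upper-cases the two characters after each '%'; assumes well-formed escapes.
    private static String upperCaseEscapes(StringBuilder sb) {
        for (int i = 0; i + 2 < sb.length(); i++) {
            if (sb.charAt(i) == '%') {
                sb.setCharAt(i + 1, Character.toUpperCase(sb.charAt(i + 1)));
                sb.setCharAt(i + 2, Character.toUpperCase(sb.charAt(i + 2)));
                i += 2;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = normalizeCase("HTTP://[2001:DB8::1]/lib/a%2fb.jar");
        String b = normalizeCase("http://[2001:db8::1]/lib/a%2Fb.jar");
        System.out.println(a.equals(b)); // both spellings reduce to one string
    }
}
```

Because the host is lower-cased as a whole, the same step also gives 
IPv6 literals a single spelling with respect to letter case.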

JGDMS loads classes from many places, e.g. the local file system, the 
local network and the global IPv6 network.  JGDMS requires end-to-end 
connectivity, so it only supports IPv4 on local networks; however, we 
need a consistent normalized form for IPv6.

Our RFC3986 Uri implementation has a few methods to fix common issues 
with malformed URLs, such as escaped characters that shouldn't be 
escaped, or unescaped characters that need to be.  We use it to clean 
up file paths or URLs prior to opening a file or making a URL 
connection.
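
A minimal sketch of that kind of clean-up, under stated assumptions: it 
decodes percent-escapes of unreserved characters (RFC 3986 section 
6.2.2.2), upper-cases the escapes it keeps, and escapes spaces and stray 
'%' signs.  EscapeFixer and fixEscapes are hypothetical names, not the 
methods of our Uri class, and other illegal characters are not handled.

```java
final class EscapeFixer {

    // Unreserved characters per RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~".
    private static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Returns the value of an ASCII hex digit, or -1 if c is not one.
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Hypothetical helper: repairs the escaping of a path or URL string.
    static String fixEscapes(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                int hi = i + 2 < s.length() ? hex(s.charAt(i + 1)) : -1;
                int lo = i + 2 < s.length() ? hex(s.charAt(i + 2)) : -1;
                if (hi < 0 || lo < 0) {
                    out.append("%25");            // stray '%' that starts no escape
                    continue;
                }
                int value = hi * 16 + lo;
                if (isUnreserved(value)) {
                    out.append((char) value);     // escaped, but should not be
                } else {
                    out.append('%')               // keep the escape, upper-case hex
                       .append(Character.toUpperCase(s.charAt(i + 1)))
                       .append(Character.toUpperCase(s.charAt(i + 2)));
                }
                i += 2;
            } else if (c == ' ') {
                out.append("%20");                // unescaped, but needs to be
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixEscapes("file:/home/my%2Dapp/lib%7e1/a b.jar"));
    }
}
```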

-- 
Regards,
  
Peter

On 13/11/2024 7:13 am, Peter Firmstone wrote:
> They are incompatible.
>
> The existing URI implementation is backward compatible, but its use should be discouraged in new applications, so use diminishes over time.   It's unique to Java.
>
> RFC3986 is good for unique identity and high performance, best for computer processed data, we use it for identity, checking URL strings prior to establishing URL connections, it's also the current standard.
>
> RFC3987 IRI - good for human readability, but not performance, eg manual typing of IRI.
>
> Thinking out loud:
> Would a provider mechanism be appropriate, as the existing api is suitable for all implementations?  Serialized form is a simple string, parsed during deserialization, but how to distinguish, or does the provider order choose?
>
> Regards,
>
> Peter.
>
>> On 12 Nov 2024, at 12:59 AM, Alan Bateman<alan.bateman at oracle.com> wrote:
>> On 10/11/2024 12:04, Peter Firmstone wrote:
>>> :
>>>
>>> Java doesn't implement RFC2396 strictly, as it has an expanded character set that doesn't require escaping and can result in more than one normalized form.   My understanding is that these types of corner cases regarding character escaping are what prevented Java's URI implementation from being upgraded to RFC3986.
>> java.net.URI (as opposed to legacy and JDK 1.0 era java.net.URL) rigorously specifies the deviations from RFC 2396, and the reasons for the deviations.
>>
>> A big part of the difference between RFC 2396 and 3986 is how the authority component is treated. With RFC 2396 it gets parsed as either a registry-based or server-based authority so very different to the newer RFC.  Relative Resolution (in the new RFC) is another significant difference, if URI were upgraded then its resolve method would produce very different results.
>>
>> -Alan.