JDK-8019345, RFC3986, RFC2396 and java.net.URI
Peter Firmstone
peter.firmstone at zeus.net.au
Sun Nov 10 12:04:47 UTC 2024
We've been using an RFC3986 URI implementation for over a decade. There
were formatting issues we had to work around, so we provided static
methods to address them. Strict normalization also yields significant
performance benefits when URIs are used for identity.
Java doesn't implement RFC2396 strictly: it has an expanded character
set that doesn't require escaping, which can result in more than one
normalized form. My understanding is that it's these types of corner
cases regarding character escaping that prevented Java's URI
implementation from being upgraded to RFC3986.
We use RFC3986 for identity and we use URL for connections. Currently
the JDK still depends on URL for identity, which generally incurs the
cost of a network DNS lookup (at least once per URL). Using Uri for
identity allows for server replication, for example, whereas URL
resolves to an IP address, and there may be a number of replicating
servers that resolve from the same URI to an address range. In reality
identity should be determined by authentication; there's a high cost to
using DNS to determine identity when a suitable RFC3986 URI
normalization implementation can infer it without incurring network
calls.
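As a rough illustration of identity by syntax alone, the sketch below
normalizes a URI string without touching the network. It is not the Uri
class quoted in this message: the class name SyntacticIdentity and its
methods are hypothetical, parsing is delegated to java.net.URI (so its
RFC2396-based rules still apply), and only case, percent-encoding and
dot-segment normalization from RFC3986 section 6.2.2 are covered.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical sketch: syntax-based URI normalization, no DNS lookups. */
final class SyntacticIdentity {

    /** Returns the value of an ASCII hex digit, or -1 if c is not one. */
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** RFC3986 unreserved: ALPHA / DIGIT / "-" / "." / "_" / "~". */
    private static boolean isUnreserved(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    /** Decodes escaped unreserved characters and upper-cases other escapes. */
    static String normalizeEscapes(String s) {
        if (s == null) return null;
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = hex(s.charAt(i + 1));
                int lo = hex(s.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    char decoded = (char) (hi * 16 + lo);
                    if (isUnreserved(decoded)) {
                        sb.append(decoded);
                    } else {
                        sb.append('%')
                          .append(Character.toUpperCase(s.charAt(i + 1)))
                          .append(Character.toUpperCase(s.charAt(i + 2)));
                    }
                    i += 3;
                    continue;
                }
            }
            sb.append(c); // ordinary character, or a malformed escape left as-is
            i++;
        }
        return sb.toString();
    }

    /** Lower-cases scheme and host, removes dot segments, normalizes escapes. */
    static String normalize(String uri) {
        URI u = URI.create(uri).normalize(); // removes "." and ".." segments
        StringBuilder sb = new StringBuilder();
        if (u.getScheme() != null) {
            sb.append(u.getScheme().toLowerCase(Locale.ROOT)).append(':');
        }
        if (u.isOpaque()) {
            return sb.append(normalizeEscapes(u.getRawSchemeSpecificPart())).toString();
        }
        if (u.getHost() != null) {
            sb.append("//");
            if (u.getRawUserInfo() != null) {
                sb.append(normalizeEscapes(u.getRawUserInfo())).append('@');
            }
            sb.append(u.getHost().toLowerCase(Locale.ROOT));
            if (u.getPort() != -1) sb.append(':').append(u.getPort());
        } else if (u.getRawAuthority() != null) {
            sb.append("//").append(u.getRawAuthority());
        }
        sb.append(normalizeEscapes(u.getRawPath()));
        if (u.getRawQuery() != null) sb.append('?').append(normalizeEscapes(u.getRawQuery()));
        if (u.getRawFragment() != null) sb.append('#').append(normalizeEscapes(u.getRawFragment()));
        return sb.toString();
    }
}
```

Two strings that normalize to the same result can then be treated as
the same identity with a plain String comparison, for example
SyntacticIdentity.normalize("HTTP://Example.COM/%7Euser/a/./b/../c")
and SyntacticIdentity.normalize("http://example.com/~user/a/c").
Scheme-specific rules such as dropping a default port are deliberately
left out of this sketch.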
Perhaps it might be an option to use a provider mechanism, allowing an
RFC version to be selected?
Our implementation also has a utility method to return a URL instance
for connections.
Javadoc from our RFC3986 Uri implementation (AL2.0):
/**
 * This class represents an immutable instance of a URI as defined by RFC 3986.
 * <p>
 * This class replaces java.net.URI functionality.
 * <p>
 * Unlike java.net.URI this class is not Serializable and hashCode and
 * equality are governed by strict RFC3986 normalisation. In addition "other"
 * characters allowed in java.net.URI as specified by javadoc, not specifically
 * allowed by RFC3986 are illegal and must be escaped. This strict adherence
 * is essential to eliminate false negative or positive matches.
 * <p>
 * In addition to RFC3986 normalisation, on OS platforms with a \ file separator
 * the path is converted to UPPER CASE for comparison for the file: scheme,
 * during equals and hashCode calls.
 * <p>
 * IPv6 and IPvFuture host addresses must be enclosed in square brackets as per
 * RFC3986. A zone delimiter %, if present, must be represented in escaped %25
 * form as per RFC6874.
 * <p>
 * In addition to RFC3986 normalization, IPv6 host addresses will be normalized
 * to comply with RFC 5952 A Recommendation for IPv6 Address Text Representation.
 * This is to ensure consistent equality between identical IPv6 addresses.
 *
 * @since 3.0.0
 */
public final class Uri implements Comparable<Uri> {
<SNIP> Static factory methods for various cases:
    /**
     * Parses the given argument {@code rfc3986compliantURI} and creates an
     * appropriate URI instance.
     *
     * The parameter string is checked for compliance, an
     * IllegalArgumentException is thrown if the string is non compliant.
     *
     * @param rfc3986compliantURI
     *            the string which has to be parsed to create the URI instance.
     * @return the created instance representing the given URI.
     */
    public static Uri create(String rfc3986compliantURI) {
        Uri result = null;
        try {
            result = new Uri(rfc3986compliantURI);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e.getMessage());
        }
        return result;
    }
    /**
     * The parameter string doesn't contain any existing escape sequences, any
     * escape character % found is encoded as %25. Illegal characters are
     * escaped if possible.
     *
     * The Uri is normalised according to RFC3986.
     *
     * @param unescapedString URI in un-escaped string form
     * @return an RFC3986 compliant Uri.
     * @throws java.net.URISyntaxException if string cannot be escaped.
     */
    public static Uri escapeAndCreate(String unescapedString) throws URISyntaxException {
        return new Uri(quoteComponent(unescapedString, allLegalUnescaped));
    }
    /**
     * The parameter string may already contain escaped sequences, any illegal
     * characters are escaped and any that shouldn't be escaped are un-escaped.
     *
     * The escape character % is not re-encoded.
     * @param nonCompliantEscapedString URI in string form.
     * @return an RFC3986 compliant Uri.
     * @throws java.net.URISyntaxException if string cannot be escaped.
     */
    public static Uri parseAndCreate(String nonCompliantEscapedString) throws URISyntaxException {
        return new Uri(quoteComponent(nonCompliantEscapedString, allLegal));
    }
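The difference between escapeAndCreate and parseAndCreate comes down to
how an existing % is treated. The sketch below illustrates only that
distinction. It is an assumption-laden stand-in, not the quoteComponent
method referenced above (whose source is not shown here): the class
EscapeModes, its method names and its single LEGAL character set are all
hypothetical, and it does not un-escape characters that need no escaping.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the two escaping modes; not the quoted quoteComponent. */
final class EscapeModes {

    // RFC3986 unreserved, gen-delims and sub-delims; anything else is escaped.
    private static final String LEGAL =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
            + "-._~" + ":/?#[]@" + "!$&'()*+,;=";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Treats the input as un-escaped: every % becomes %25. */
    static String escapeAll(String s) {
        return quote(s, false);
    }

    /** Treats the input as possibly escaped: valid %XX sequences are kept. */
    static String escapeKeepingExisting(String s) {
        return quote(s, true);
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    private static String quote(String s, boolean keepEscapes) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp == '%' && keepEscapes && i + 2 < s.length()
                    && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
                sb.append(s, i, i + 3); // existing escape, copied unchanged
                i += 3;
                continue;
            }
            if (cp < 128 && LEGAL.indexOf(cp) >= 0) {
                sb.append((char) cp);
            } else {
                // Assumption: illegal characters (and a stray %) are escaped
                // as the percent-encoded bytes of their UTF-8 encoding.
                byte[] bytes = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    sb.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
                }
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

With this sketch, escapeAll("file:/my%20dir") double-encodes the
existing escape, while escapeKeepingExisting("file:/my%20dir/a b.txt")
leaves %20 alone and only escapes the raw space.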
<SNIP>
    /** Fixes Windows file URI string by converting back slashes to forward
     * slashes and inserting a forward slash before the drive letter if it is
     * missing. No normalisation or modification of case is performed.
     * @param uri String representation of URI
     * @return fixed URI String
     */
    public static String fixWindowsURI(String uri) {
        if (uri == null) return null;
        if (File.separatorChar != '\\') return uri;
        if ( uri.startsWith("file:") || uri.startsWith("FILE:")){
            char [] u = uri.toCharArray();
            int l = u.length;
            StringBuilder sb = new StringBuilder(uri.length()+1);
            for (int i=0; i<l; i++){
                // Ensure we use forward slashes
                if (u[i] == File.separatorChar) {
                    sb.append('/');
                    continue;
                }
                if (i == 5 && uri.startsWith(":", 6 )) {
                    // Windows drive letter without leading slashes doesn't
                    // comply with URI spec, fix it here
                    sb.append("/");
                }
                sb.append(u[i]);
            }
            return sb.toString();
        }
        return uri;
    }
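Because fixWindowsURI returns its input unchanged unless
File.separatorChar is a back slash, its effect is hard to see on other
platforms. The following variant is a sketch for experimentation only:
the class WindowsUriFix is hypothetical, and it applies the same two
rewrites unconditionally so the behaviour can be tried on any OS.

```java
/** Hypothetical, platform-independent variant of fixWindowsURI for illustration. */
final class WindowsUriFix {

    /** Applies the Windows file URI fixes regardless of the host platform. */
    static String fix(String uri) {
        if (uri == null) return null;
        if (!(uri.startsWith("file:") || uri.startsWith("FILE:"))) return uri;
        StringBuilder sb = new StringBuilder(uri.length() + 1);
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c == '\\') {
                sb.append('/'); // back slash becomes forward slash
                continue;
            }
            if (i == 5 && uri.startsWith(":", 6)) {
                // Index 5 is a drive letter directly after "file:", so the
                // leading slash is missing and is inserted here.
                sb.append('/');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fix("file:C:\\dir\\a.jar")); // prints file:/C:/dir/a.jar
    }
}
```

A string that is already well formed, such as "file:/C:/dir/a.jar", and
any non-file URI pass through unchanged.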
<SNIP>
    public static Uri filePathToUri(String path) throws URISyntaxException{
        String forwardSlash = "/";
        if (path == null || path.length() == 0) {
            // codebase is "file:"
            path = "*";
        }
        // Ensure compatibility with URLClassLoader, when directory
        // character is dropped by File.
        boolean directory = false;
        if (path.endsWith(forwardSlash)) directory = true;
        path = new File(path).getAbsolutePath();
        if (directory) {
            if (!(path.endsWith(File.separator))){
                path = path + File.separator;
            }
        }
        if (File.separatorChar == '\\') {
            path = path.replace(File.separatorChar, '/');
        }
        path = fixWindowsURI("file:" + path);
        return Uri.escapeAndCreate(path); //$NON-NLS-1$
    }
<SNIP>
    /**
     * Converts this URI instance to a URL.
     *
     * @return the created URL representing the same resource as this URI.
     * @throws MalformedURLException
     *             if an error occurs while creating the URL or no protocol
     *             handler could be found.
     */
    public URL toURL() throws MalformedURLException {
        if (!absolute) {
            throw new IllegalArgumentException(Messages.getString("luni.91") + ": " //$NON-NLS-1$//$NON-NLS-2$
                    + toString());
        }
        if (opaque) return new URL(toString()); // Let the Handler parse it.
        String hst = host;
        StringBuilder sb = new StringBuilder();
        // userinfo will be rare, utilise sb, then clear it.
        if (userinfo != null){
            sb.append(userinfo).append('@').append(hst);
            hst = sb.toString();
            // Clear the whole buffer; delete(0, sb.length()-1) would leave
            // the last character behind and corrupt the file section.
            sb.setLength(0);
        }
        // now lets create the file section of the URL.
        sb.append(path);
        if (query != null) sb.append('?').append(query);
        if (fragment != null) sb.append('#').append(fragment);
        String file = sb.toString(); //for code readability
        // deprecated to provide a warning against misuse, not for removal.
        @SuppressWarnings("deprecation")
        URL url = new URL(scheme, hst, port, file, null);
        return url;
    }
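For readers unfamiliar with the multi-argument URL constructor used by
toURL(), here is a small standalone sketch of the same assembly step:
path, query and fragment are joined into the "file" argument and the URL
is built from components. The class UrlFromComponents and its build
method are hypothetical names, and the checked exception is wrapped only
to keep the example short.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical sketch: building a URL for connections from URI components. */
final class UrlFromComponents {

    @SuppressWarnings("deprecation") // URL constructors are deprecated since JDK 20
    static URL build(String scheme, String host, int port,
                     String path, String query, String fragment) {
        // The "file" argument carries path, query and fragment together.
        StringBuilder file = new StringBuilder(path);
        if (query != null) file.append('?').append(query);
        if (fragment != null) file.append('#').append(fragment);
        try {
            // A null handler lets URL pick the default handler for the scheme.
            return new URL(scheme, host, port, file.toString(), null);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL url = build("http", "example.com", 8080, "/a/b", "x=1", "top");
        System.out.println(url.getPath() + " " + url.getQuery() + " " + url.getRef());
    }
}
```

Constructing the URL this way performs no name resolution; the DNS cost
discussed earlier arises when URL instances are compared or connected.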
--
Regards,
Peter