Difference in encoding semantics of URI returned by File.toURI and Path.toUri representing the same file
Jaikiran Pai
jai.forums2013 at gmail.com
Fri Dec 18 04:50:32 UTC 2020
java.io.File has a toURI() API which, as per its javadoc, returns a
(system-dependent) URI. java.io.File also has a toPath() API, and the
java.nio.file.Path it returns exposes its own toUri() API. The javadoc
of the File class doesn't specify any semantics for the URI returned by
toUri() on that java.nio.file.Path.
More specifically, for the same file, are File.toURI() and Path.toUri()
expected to return URIs which have the same semantics when it comes to
encoded characters in the URI?
Consider the following trivial code for what I mean:
import java.net.*;
import java.nio.file.*;
import java.io.*;

public class PathTest {
    public static void main(final String[] args) throws Exception {
        final String location = args[0];
        final Path a = Paths.get(location).toAbsolutePath();
        System.out.println("URI from Paths.get().toUri() API " + a
                + " ---> " + a.toUri());
        final Path b = new File(location).toPath().toAbsolutePath();
        System.out.println("URI from File.toPath().toUri() API " + b
                + " ---> " + b.toUri());
        final File c = new File(location).getAbsoluteFile();
        System.out.println("URI from File.toURI() API " + c
                + " ---> " + c.toURI());
    }
}
The above program prints the URI of a file path using 3 different APIs:
1. Paths.get().toUri()
2. File.toPath().toUri()
3. File.toURI()
When I run the program and pass it a directory whose name contains a
non-ASCII character (one which belongs to the "other" category as
explained in the URI javadoc[1]), I see that the URI returned by
Path.toUri() differs from the URI returned by File.toURI() in how it
encodes that "other" category character. Here are the commands I use
and the output:
mkdir foobãr
java PathTest foobãr
Output:
URI from Paths.get().toUri() API /private/tmp/delme/foobãr ---> file:///private/tmp/delme/fooba%CC%83r/
URI from File.toPath().toUri() API /private/tmp/delme/foobãr ---> file:///private/tmp/delme/fooba%CC%83r/
URI from File.toURI() API /private/tmp/delme/foobãr ---> file:/private/tmp/delme/foobãr/
Notice that the Path.toUri() version encodes the non-ASCII character
whereas the File.toURI() version doesn't. Is this expected? The javadoc
doesn't go into much detail about this.
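For what it's worth, the File.toURI() result above is consistent with how java.net.URI itself treats the two character categories when a URI is built from components. The following is only a minimal sketch, not the File implementation: it assumes a URI built with the four-argument URI constructor is a fair stand-in for what File.toURI() produces, and the class and method names (UriQuotingSketch, fileUri) are made up for illustration. It uses a precomposed U+00E3 so that it doesn't depend on how the file system stores the name.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriQuotingSketch {

    // Hypothetical helper: builds a file URI from an absolute path using the
    // multi-argument URI constructor, which quotes illegal characters in the
    // path but leaves "other" (non-ASCII) characters unescaped.
    static URI fileUri(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI other = fileUri("/tmp/foob\u00e3r"); // "other" category character
        URI illegal = fileUri("/tmp/foo bar");   // "illegal" character (space)

        System.out.println(other);                 // file:/tmp/foobãr
        System.out.println(other.toASCIIString()); // file:/tmp/foob%C3%A3r
        System.out.println(illegal);               // file:/tmp/foo%20bar
    }
}
```

So URI.toASCIIString() is one way to get an ASCII-only form out of a File.toURI() result. Note, though, that the OpenJDK implementation of toASCIIString() appears to normalize the string to NFC before encoding, so for a decomposed name like the one in the output above (a followed by U+0303) it would not necessarily produce the same %CC%83 octets that Path.toUri() does.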
Now, interestingly, if the same program is passed a file path which
contains an "illegal" character (for example the space character, as
defined in [1]), then both Path.toUri() and File.toURI() return a URI
which has that character encoded. Here's the output when you run:
mkdir "foo bar"
java PathTest "foo bar"
Output:
URI from Paths.get().toUri() API /private/tmp/delme/foo bar ---> file:///private/tmp/delme/foo%20bar
URI from File.toPath().toUri() API /private/tmp/delme/foo bar ---> file:///private/tmp/delme/foo%20bar
URI from File.toURI() API /private/tmp/delme/foo bar ---> file:/private/tmp/delme/foo%20bar
So it's not clear which categories of characters will be
(consistently) encoded in the URIs returned by the Path and File
instances for the same target file.
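Until the expected semantics are clarified, code that needs to decide whether two such URIs point at the same path probably shouldn't compare their string forms. A minimal sketch of one alternative follows, assuming it is enough to compare the scheme and the decoded path returned by URI.getPath(); the class and method names (FileUriCompareSketch, sameDecodedPath) are hypothetical, and the URIs are the ones from the output above.

```java
import java.net.URI;
import java.util.Objects;

class FileUriCompareSketch {

    // Hypothetical helper: treats two URIs as equivalent when their schemes
    // match and their *decoded* paths are equal, regardless of which
    // characters each URI happens to have percent-encoded.
    static boolean sameDecodedPath(String first, String second) {
        URI a = URI.create(first);
        URI b = URI.create(second);
        return Objects.equals(a.getScheme(), b.getScheme())
                && Objects.equals(a.getPath(), b.getPath());
    }

    public static void main(String[] args) {
        // Path.toUri() style: "other" character percent-encoded
        String fromPath = "file:///private/tmp/delme/fooba%CC%83r/";
        // File.toURI() style: same name, left unescaped (a followed by U+0303)
        String fromFile = "file:/private/tmp/delme/fooba\u0303r/";

        System.out.println(fromPath.equals(fromFile));
        System.out.println(sameDecodedPath(fromPath, fromFile));
    }
}
```

This compares the decoded strings only; it does no Unicode normalization and doesn't touch the file system, so it says nothing about whether two differently normalized names refer to the same file on disk.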
[1]
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/net/URI.html
-Jaikiran