Difference in encoding semantics of URI returned by File.toURI and Path.toUri representing the same file

Jaikiran Pai jai.forums2013 at gmail.com
Fri Dec 18 04:50:32 UTC 2020


java.io.File has a toURI() API which, as per its javadoc, returns a 
(system dependent) URI. java.io.File also has a toPath() API, and the 
java.nio.file.Path it returns exposes a toUri() API. The javadoc of the 
File class doesn't specify any semantics for the URI returned by 
Path.toUri().

More specifically, for the same file instance, are File.toURI() and 
Path.toUri() expected to return URIs which have the same semantics when 
it comes to encoded characters in the URI?

The following trivial program shows what I mean:

import java.net.*;
import java.nio.file.*;
import java.io.*;

public class PathTest {

     public static void main(final String[] args) throws Exception {
         final String location = args[0];

         final Path a = Paths.get(location).toAbsolutePath();
         System.out.println("URI from Paths.get().toUri() API " + a
                 + " ---> " + a.toUri());

         final Path b = new File(location).toPath().toAbsolutePath();
         System.out.println("URI from File.toPath().toUri() API " + b
                 + " ---> " + b.toUri());

         final File c = new File(location).getAbsoluteFile();
         System.out.println("URI from File.toURI() API " + c
                 + " ---> " + c.toURI());

     }
}

The above program prints the URI of a file path using 3 different APIs:

1. Paths.get().toUri()
2. File.toPath().toUri()
3. File.toURI()

When I run the program and pass it a directory whose name contains a 
non-ASCII character (one which belongs to the "other" category as 
explained in the URI javadoc[1]), I see that the URI returned by 
Path.toUri() differs from the URI returned by File.toURI() in how the 
"other" category character (i.e. the non-ASCII character) is encoded. 
Here's the command I use and here's the output:

mkdir foobãr
java PathTest foobãr

Output:

URI from Paths.get().toUri() API /private/tmp/delme/foobãr ---> file:///private/tmp/delme/fooba%CC%83r/
URI from File.toPath().toUri() API /private/tmp/delme/foobãr ---> file:///private/tmp/delme/fooba%CC%83r/
URI from File.toURI() API /private/tmp/delme/foobãr ---> file:/private/tmp/delme/foobãr/

Notice that the Path.toUri() version encodes the non-ASCII character 
whereas the File.toURI() version doesn't. Is this expected? The javadoc 
doesn't have much detail on this.
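
To illustrate the "other" category behaviour in isolation, here is a 
minimal sketch that doesn't touch the file system. It assumes that 
File.toURI() builds its result through the multi-argument URI 
constructor (which is what the OpenJDK implementation appears to do); 
the class name OtherCharDemo and the helper fileUri are made up for this 
example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class OtherCharDemo {

    // Hypothetical helper: builds a hierarchical file URI through the
    // multi-argument constructor, which quotes illegal characters but
    // leaves "other" category characters untouched.
    static URI fileUri(final String absolutePath) throws URISyntaxException {
        return new URI("file", null, absolutePath, null);
    }

    public static void main(final String[] args) throws Exception {
        // U+00E3 is the precomposed form of "a with tilde"
        final URI uri = fileUri("/tmp/foob\u00e3r");

        // toString() keeps the "other" character as-is
        System.out.println(uri);

        // toASCIIString() escapes it as UTF-8 octets (%C3%A3 here)
        System.out.println(uri.toASCIIString());
    }
}
```

So an all-ASCII form can be obtained from such a URI with 
toASCIIString(), but the escaped octets need not match what 
Path.toUri() produced above: the %CC%83 in that output is the UTF-8 
encoding of the combining tilde U+0303, i.e. the decomposed form of the 
name.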

Now, interestingly, if the same program is passed a file path which 
contains an "illegal" character (for example the space character, as 
defined in [1]), then both Path.toUri() and File.toURI() return a URI 
which has that character encoded. Here's the output when you run:

mkdir "foo bar"
java PathTest "foo bar"

Output:

URI from Paths.get().toUri() API /private/tmp/delme/foo bar ---> file:///private/tmp/delme/foo%20bar
URI from File.toPath().toUri() API /private/tmp/delme/foo bar ---> file:///private/tmp/delme/foo%20bar
URI from File.toURI() API /private/tmp/delme/foo bar ---> file:/private/tmp/delme/foo%20bar

So it's not clear which categories of characters will be 
(consistently) encoded in the URIs returned by the Path and File 
instances, for the same target file.
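
Until that is clarified, one way to compare two such URIs without 
depending on which characters were escaped is to compare their decoded 
paths, since URI.getPath() decodes escaped octets. The following is only 
a sketch: the class UriPathCompare and the helper samePath are 
hypothetical, the URIs are hand-written rather than obtained from File 
or Path, and it assumes hierarchical URIs (getPath() returns null for 
opaque ones).

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPathCompare {

    // Hypothetical helper: treats two hierarchical file URIs as equivalent
    // when their decoded paths are equal, regardless of escaping.
    static boolean samePath(final URI a, final URI b) {
        return a.getPath().equals(b.getPath());
    }

    public static void main(final String[] args) throws URISyntaxException {
        // Escaped form, similar in style to what Path.toUri() printed
        final URI escaped = new URI("file:///tmp/foob%C3%A3r");
        // Unescaped form, similar in style to what File.toURI() printed
        final URI literal = new URI("file", null, "/tmp/foob\u00e3r", null);

        // URI.equals() compares the raw paths, so these are not equal
        System.out.println(escaped.equals(literal));
        // The decoded paths are the same
        System.out.println(samePath(escaped, literal));

        // An "illegal" character such as space is quoted by the constructor
        final URI space = new URI("file", null, "/tmp/foo bar", null);
        System.out.println(space.getRawPath()); // /tmp/foo%20bar
        System.out.println(space.getPath());    // /tmp/foo bar
    }
}
```

This comparison does not account for Unicode normalization differences 
(precomposed versus decomposed characters) or for a trailing slash on 
directories, both of which show up in the output above.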

[1] https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/net/URI.html


-Jaikiran


