MacOS file system changes between 7u10 and 7u40?
Xueming Shen
xueming.shen at oracle.com
Sun Oct 6 10:58:49 PDT 2013
On 10/6/13 10:27 AM, Philippe Marschall wrote:
> On Fri, Oct 4, 2013 at 5:30 PM, Martin Buchholz <martinrb at google.com> wrote:
>> It is already the case that you cannot access all possible Unix file names
>> from Java because by design, file names are represented by Java strings
>> (UTF-16), but at the OS level filenames are actually arbitrary byte
>> sequences with no concept of encoding.
> While file names are Strings and java.io.File is String based,
> sun.nio.fs.UnixPath is actually byte[] based. This means you can
> access files whose name is not valid in the respective encoding, given
> you can get hold of the path. The "easy way" is through
> DirectoryStream, the other through the package-private constructor.
The byte[] representation is really an internal implementation detail,
there to give much better performance when the path is pushed back and
forth between the Java level and the native level. Any String-level file
name access to the nio Path still involves the charset's encoding and
decoding, which normally do not handle NFC/NFD at all, with the exception
of the UTF-based encodings, which can do a code point to code point
mapping. While we had put lots of thought into the encoding/decoding issue
for the byte[] <=> String conversion, the NFC/NFD issue only kicked in
"recently", after the MacOS filesystem started to get traction, with its
NFD internal representation (and an interesting flip-floppable case
sensitivity).
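A minimal sketch of why the charset step alone does not help here, assuming
UTF-8 as the file name encoding. The class and helper names are made up for
illustration and are not part of the JDK; only java.text.Normalizer and
String.getBytes are real APIs. The same visible name yields two different
byte sequences depending on its normalization form:

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class NfcNfdBytes {
    // Hypothetical helper: plain charset encoding, no normalization applied.
    public static byte[] utf8(String name) {
        return name.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café" with 'é' as a single precomposed code point (NFC).
        String nfc = "caf\u00e9";
        // The same name decomposed into 'e' + U+0301 combining acute (NFD).
        String nfd = Normalizer.normalize(nfc, Normalizer.Form.NFD);

        System.out.println(nfc.equals(nfd));   // false
        System.out.println(utf8(nfc).length);  // 5
        System.out.println(utf8(nfd).length);  // 6
    }
}
```

The encoder maps each code point to its own bytes, so whichever form goes in
is the form that comes out.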
Again, the idea here is to try to keep the file name representation
consistent at the Java level, which I personally feel is more important.
So maybe the question here is: do we want to see the NFD file name at the
Java level? That would mean the developer/end user will probably be forced
to handle NFD/NFC themselves, and file name hashing/equality operations
will be way more "expensive". And then there is the "visual"
representation... do you want to see the file names displayed as NFD or
NFC...
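For a sense of what "handling it themselves" could look like, here is a
rough sketch, not what the JDK does internally. The class and method names
are hypothetical; it simply normalizes both names to NFC with
java.text.Normalizer before every comparison, which is the extra cost
mentioned above:

```java
import java.text.Normalizer;

public class NormalizedName {
    // Hypothetical helper: canonicalize a name to NFC, skipping the work
    // when the name is already normalized.
    public static String toNfc(String name) {
        return Normalizer.isNormalized(name, Normalizer.Form.NFC)
                ? name
                : Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    // Equality that ignores the NFC/NFD difference.
    public static boolean sameName(String a, String b) {
        return toNfc(a).equals(toNfc(b));
    }

    public static void main(String[] args) {
        String composed = "caf\u00e9";     // NFC form
        String decomposed = "cafe\u0301";  // NFD form

        System.out.println(composed.equals(decomposed));    // false
        System.out.println(sameName(composed, decomposed)); // true
    }
}
```

A hash-based lookup would need the same treatment: hash toNfc(name) rather
than the raw String, or the two forms land in different buckets.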
-Sherman
>
> (Yes I did some checking on a Linux box with file names that are invalid UTF-8).
>
> Cheers
> Philippe