MacOS file system changes between 7u10 and 7u40?
Philippe Marschall
philippe.marschall at gmail.com
Sun Oct 13 04:24:07 PDT 2013
On Sun, Oct 6, 2013 at 7:58 PM, Xueming Shen <xueming.shen at oracle.com> wrote:
> On 10/6/13 10:27 AM, Philippe Marschall wrote:
>>
>> On Fri, Oct 4, 2013 at 5:30 PM, Martin Buchholz <martinrb at google.com>
>> wrote:
>>>
>>> It is already the case that you cannot access all possible Unix file names
>>> from Java because by design, file names are represented by Java strings
>>> (UTF-16), but at the OS level filenames are actually arbitrary byte
>>> sequences with no concept of encoding.
>>
>> While file names are Strings and java.io.File is String based,
>> sun.nio.fs.UnixPath is actually byte[] based. This means you can
>> access files whose names are not valid in the respective encoding, provided
>> you can get hold of the path. The "easy way" is through
>> DirectoryStream, the other through the package-private constructor.
>
>
> The byte[] representation is really an internal implementation detail to give
> much better performance when the path is pushed back and forth between the
> Java level and the native level. Any String-level file name access to the nio
> Path still involves the charset's encoding and decoding, which normally do not
> handle NFC/NFD at all, with the exception of the UTF-based encodings, which
> can do a code point to code point mapping. While we had put a lot of thought
> into the encoding/decoding issue for the byte[] <=> String conversion, the
> NFC/NFD issue only kicked in "recently", after the MacOS file system started
> to gain traction, with an NFD internal representation (and an interesting
> flip-floppable case sensitivity). Again, the idea here is to try to keep the
> file name representation consistent at the Java level, which I personally
> feel is more important.
As I said before, I don't see how you can do that when Linux and
Windows allow NFC and NFD names side by side. Additionally, if that
really is a goal, then I think you should commit to it and put it into
the contract. As it is right now, it's just an implementation artifact
that could change at any point; it is therefore useless because you
can't rely on it. Another JVM could do something completely different.
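
To make the "side by side" point concrete, here is a minimal sketch using only java.text.Normalizer. The class name and the sample file name are made up for illustration. It shows that the NFC and NFD spellings of one visible name are different Strings and different UTF-8 byte sequences, which is why a byte-oriented file system can hold both as separate entries:

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical demo class, not part of the JDK.
public final class NormalizationForms {
    // Sample name with a precomposed U+00E9 (NFC form); illustrative only.
    public static final String NFC_NAME = "caf\u00e9.txt";

    // Decomposes the name: U+00E9 becomes 'e' followed by U+0301.
    public static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nfd = toNfd(NFC_NAME);
        System.out.println(NFC_NAME.equals(nfd));                              // false
        System.out.println(NFC_NAME.length() + " vs " + nfd.length());         // 8 vs 9
        System.out.println(utf8(NFC_NAME).length + " vs " + utf8(nfd).length); // 9 vs 10
    }
}
```

Whether a given file system keeps both names, folds them, or rejects one is up to that file system; the sketch only shows that they differ before they ever reach it.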
> So maybe the question here is: do we want to see the NFD file name at the
> Java level, which means the developer/end user will probably be forced to
> handle NFD/NFC themselves, and the file name hashing/equality
Well, there's Files.isSameFile and Path.toRealPath for this. I would
not expect anything else to work, especially because the Path.equals
contract is so loose (which is fine).
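
Roughly what I have in mind, as a sketch rather than a recommendation: fall back to Files.isSameFile when Path.equals says no, and use Path.toRealPath when you need a canonical form. The class and helper names are hypothetical, and the helpers wrap IOException only to keep the example short:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the JDK.
public final class SameFileCheck {
    // True if both paths locate the same file. Path.equals is only a cheap
    // first check; the file system has the final say.
    public static boolean sameFile(Path a, Path b) {
        if (a.equals(b)) {
            return true;
        }
        try {
            return Files.isSameFile(a, b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The path as the file system reports it (symbolic links resolved).
    public static Path real(Path p) {
        try {
            return p.toRealPath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates an empty file in a fresh temporary directory for the demo.
    public static Path newTempFile() {
        try {
            Path dir = Files.createTempDirectory("samefile");
            return Files.createFile(dir.resolve("data.txt"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = newTempFile();
        // Same file, spelled with a redundant "." element.
        Path alias = file.getParent().resolve(".").resolve(file.getFileName());
        System.out.println(sameFile(file, alias));
        System.out.println(real(file).equals(real(alias)));
    }
}
```

Both calls touch the file system, so they only work for files that exist and they cost more than a String comparison.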
> operation will be way more "expensive", and then the "visual"
> representation... do you want to see the file names being displayed as NFD
> or NFC...
For rendering or displaying I don't really care; NFC and NFD should
render and display the same (at least that's my understanding). As for
the Java String level, I don't really know. On one hand, we can't and
shouldn't really make any assumptions about how the underlying Java file
system works, as expressed in various Javadocs. On the other hand, it
would be intuitive (at least to me) if it behaved like the underlying
native file system, so you know what to expect. As I said before, I
don't think you can make it consistent across all platforms.
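
If an application does want String-level names to compare equal regardless of form, it can normalize at its own boundary instead of relying on what the JVM hands back. A minimal sketch, with a hypothetical class and method name, assuming NFC as the common form:

```java
import java.text.Normalizer;

// Hypothetical helper class, not part of the JDK.
public final class NameComparison {
    // Compares two file names after bringing both to NFC, so that the
    // precomposed and decomposed spellings of a name are treated alike.
    public static boolean sameNameIgnoringForm(String a, String b) {
        String left = Normalizer.normalize(a, Normalizer.Form.NFC);
        String right = Normalizer.normalize(b, Normalizer.Form.NFC);
        return left.equals(right);
    }

    public static void main(String[] args) {
        String composed = "\u00e9";    // precomposed e-acute
        String decomposed = "e\u0301"; // 'e' plus combining acute accent
        System.out.println(composed.equals(decomposed));                // false
        System.out.println(sameNameIgnoringForm(composed, decomposed)); // true
    }
}
```

This only helps for comparison and display. To open the file you still need the name exactly as the file system stores it.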
Cheers
Philippe