Codereview request for 7130915: File.equals does not give expected results when path contains Non-English characters on Mac OS X
Hi,

Here is the proposed change to support Unicode nfd/nfc and case-insensitive file paths on the MacOSX file system.

7130915: File.equals does not give expected results when path contains Non-English characters on Mac OS X
7168427: FileInputStream cannot open file where the file path contains asian characters [macosx]

While these two bug reports are only against java.io, we have the same issue in java.nio.file. Here is the webrev:

http://cr.openjdk.java.net/~sherman/7130915_7168427/webrev/

Here is a brief summary of the changes.

java.io.File:

(1) removed the nfc->nfd conversion in io_util.h/WITH_PLATFORM_STRING, which means we are now passing nfc/composite characters directly into the macosx file system APIs without normalizing them to nfd. It appears the macosx fs APIs do accept nfc, though the file system itself uses nfd.

(2) normalize the resulting file name from macosx fs APIs from nfd->nfd before passing back to java.io.File (File.list() and canonicalize()), so we deal with nfdc file name (as "usual") for java.io classes/APIs.

(3) fs.compare()/hashCode() was updated to be case insensitive

(4) hashCode() was updated to use the new String.hash32().

java.nio.file:

(5) added a set of MacOSXFile... classes on top of the existing BsdFile... classes. An alternative is to update those BsdFile... classes directly to address the macosx-specific issues. But given that there might be developers working on the OpenJDK BSD port who depend on these classes, it might be desirable to have a separate layer of MacOSXFile... classes. So now the default FileSystem/Provider is MacOSXFileSystemProvider and MacOSXFileSystem.

(6) the "main" changes are in MacOSXFileSystem, in which the corresponding methods were added to handle case insensitivity and nfd<=>nfc normalization, including the pathmatcher.

(7) compare is now case-insensitive

(8) hashCode is now murmur3_32(); this is true for all of Solaris/Unix/Linux and macosx.

Though lots of files have been touched, the number of changed lines is actually relatively small.

The proposed change only deals with the default case-sensitivity setting, which is case insensitive. On MacOSX, you can actually configure the HFS+ file system or a mounted volume to be case-sensitive. A possible approach is to have some extra FileStore attribute, such as isCaseSensitive, and to use case-sensitive compare/equals on such a file system, but this can be dealt with separately later.

Thanks,
-Sherman
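To make the combined effect of items (2), (3) and (7) concrete, here is a minimal sketch of comparing two file names after nfc normalization and locale-neutral case folding. It is not code from the webrev: the class NameKey and its methods are hypothetical names chosen for illustration, and the hash is the ordinary String.hashCode() rather than String.hash32() or murmur3_32().

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper, not part of the webrev. */
final class NameKey {
    private NameKey() {}

    /** Normalizes to NFC, then folds case without depending on the default locale. */
    static String fold(String name) {
        String nfc = Normalizer.normalize(name, Normalizer.Form.NFC);
        // Upper-case first, then lower-case, similar in spirit to how
        // String.CASE_INSENSITIVE_ORDER folds each character.
        return nfc.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    /** True if both names denote the same entry under NFC + case-insensitive rules. */
    static boolean sameName(String a, String b) {
        return fold(a).equals(fold(b));
    }

    /** Hash that agrees with sameName(): equal names always get equal hashes. */
    static int hash(String name) {
        return fold(name).hashCode();
    }

    public static void main(String[] args) {
        String composed = "caf\u00E9.txt";     // e-acute as a single code point (NFC)
        String decomposed = "cafe\u0301.TXT";  // 'e' + combining acute (NFD), other case
        System.out.println(composed.equals(decomposed));         // false
        System.out.println(sameName(composed, decomposed));      // true
        System.out.println(hash(composed) == hash(decomposed));  // true
    }
}
```

The point of deriving both the comparison and the hash from the same folded string is that the equals/hashCode contract holds by construction.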
On Jun 22 2012, at 10:01, Xueming Shen wrote:

> Hi
>
> Here is the proposed change to support Unicode nfd/nfc and case insensitive file path on MacOSX file system.
>
> 7130915: File.equals does not give expected results when path contains Non-English characters on Mac OS X
> 7168427: FileInputStream cannot open file where the file path contains asian characters [macosx]
>
> While these two bug reports are only against java.io, we have the same issue in java.nio.file. Here is the webrev
>
> (3) fs.compare()/hashCode() was updated to be case insensitive

Won't this cause problems on case sensitive file systems? The MacOSX filesystem is by default case insensitive but case sensitive file systems are not entirely uncommon.

> (4) hashCode() was updated to use the new String.hash32().

It's possible that this interface may not make it into Java 8. Doug Lea has an alternate proposal for hash based maps that would make this interface unnecessary.

> (7) compare is now case-insensitive

Repeated concern about implications for case sensitive file systems.

Mike
On 6/22/12 11:02 AM, Mike Duigou wrote:

>> (3) fs.compare()/hashCode() was updated to be case insensitive
>
> Won't this cause problems on case sensitive file systems? The MacOSX filesystem is by default case insensitive but case sensitive file systems are not entirely uncommon.

Yes, it might/will cause problems on a case-sensitive HFS+ file system, but that usage scenario is not what this patch tries to address.

On MacOSX you can format one of your disks to be case sensitive (create a new disk image and set the format to be case sensitive via the Disk Utility, for example), or you might be able to configure your whole HFS+ file system to be case sensitive. This means that case sensitivity is actually an attribute of the volume (FileStore, in JSR-203's terms), not of the whole file system. But the file system has its default setting regarding path case sensitivity. On HFS+ it's case insensitive.

This is actually not a problem unique to the MacOS file system. You can mount a Windows FAT32 drive on Linux or vice versa; it's a difficult issue. JSR-203's solution is to use Path + FileSystem to "model and be consistent with the platform's virtual file system, not the specific underlying file system". So this means that on Unix/Linux the path matching is case sensitive, on Windows it's case insensitive, and on MacOSX we go with the default, case insensitive.

That said, an alternative is to set the default case-sensitivity behavior based on the setting of the volume that the default file system is mounted on: if the root is on a volume that is case sensitive, then the MacOSXFileSystem is case sensitive. The code to detect the volume's case-sensitivity setting is currently commented out. Alan and I chatted about this and agreed that it is out of the scope of this patch; we can leave it for a future enhancement.

-Sherman
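Since case sensitivity is a property of the volume, one portable way to find out how a given directory behaves is to probe it. The following is only a sketch of that idea and is not the commented-out detection code mentioned above (which is not shown in this thread); the class CaseProbe and its method are hypothetical. It creates a temporary file and checks whether a lower-cased spelling of its name reaches the same file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Hypothetical probe, not part of the webrev. */
final class CaseProbe {
    private CaseProbe() {}

    /** Returns true if the volume holding dir appears to treat names case-sensitively. */
    static boolean isCaseSensitive(Path dir) throws IOException {
        // The mixed-case prefix guarantees that the lower-cased name differs.
        Path probe = Files.createTempFile(dir, "CaseProbe", ".tmp");
        try {
            String lower = probe.getFileName().toString().toLowerCase(Locale.ROOT);
            Path alias = probe.resolveSibling(lower);
            // On a case-insensitive volume the lower-cased name reaches the same file.
            boolean aliasHitsProbe = Files.exists(alias) && Files.isSameFile(probe, alias);
            return !aliasHitsProbe;
        } finally {
            Files.deleteIfExists(probe);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(tmp + " case sensitive: " + isCaseSensitive(tmp));
    }
}
```

A probe like this needs write access to the directory and answers only for the volume that directory lives on, which is why a per-FileStore attribute would be the cleaner long-term answer.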
On 22/06/2012 19:02, Mike Duigou wrote:

> :
> Won't this cause problems on case sensitive file systems? The MacOSX filesystem is by default case insensitive but case sensitive file systems are not entirely uncommon.

It shouldn't cause any issues accessing files; this is really just about equals, sorting, and path matching. I think it requires a bit of thought as to whether to change this, because Apple's JDK6 and older releases do not appear to have changed File#equals.

In any case, just to put some context on Sherman's changes, this is really just another installment of the port to Mac, as this area was not completely ported. In that context, the changes to fix the normalization issues are very welcome. Other missing pieces in this area include the watch service and a FileTypeDetector implementation.

-Alan.
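The distinction between accessing files and comparing names can be shown with plain strings. In this small sketch, String.CASE_INSENSITIVE_ORDER stands in for the patched compare (an assumption made for illustration, not the actual implementation), and CaseCollapse is a hypothetical class name. Two names that are distinct files on a case-sensitive volume collapse into one entry once they are put into a sorted set with a case-insensitive ordering.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration, not part of the webrev. */
final class CaseCollapse {
    private CaseCollapse() {}

    /** Distinct names under the natural, case-sensitive ordering. */
    static Set<String> distinctSensitive(List<String> names) {
        return new TreeSet<>(names);
    }

    /** Distinct names under a case-insensitive ordering. */
    static Set<String> distinctInsensitive(List<String> names) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(names);
        return set;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("README", "readme", "Makefile");
        System.out.println(distinctSensitive(names).size());    // 3
        System.out.println(distinctInsensitive(names).size());  // 2
    }
}
```

Both files remain reachable through their own names; what changes is how collections keyed on the names treat them.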
Hi Sherman,

There are several places where Locale.ENGLISH is used for locale neutral processing. You could instead use Locale.ROOT for that purpose.

Naoto
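The reason an explicit neutral locale matters for case folding at all can be sketched as follows. The class LocaleNeutral is a hypothetical name; the Turkish locale is used only because its case-mapping rules differ from the default ones. Locale.ROOT and Locale.ENGLISH fold this input identically, so the Locale.ROOT suggestion is mainly about stating the locale-neutral intent.

```java
import java.util.Locale;

/** Hypothetical illustration, not part of the webrev. */
final class LocaleNeutral {
    private LocaleNeutral() {}

    /** Lower-cases without depending on the JVM's default locale. */
    static String lowerNeutral(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Turkish rules map 'I' to a dotless i (U+0131), not to 'i'.
        System.out.println("TITLE".toLowerCase(turkish).equals("title"));  // false
        System.out.println(lowerNeutral("TITLE").equals("title"));         // true
    }
}
```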
Will this address issue MACOSX_PORT-165 [1]?

[1] http://java.net/jira/browse/MACOSX_PORT-165

-- David
Yes, I believe the issue described in MACOSX_PORT-165 is the same issue this patch is trying to solve.

Btw, it appears there are typos in note (2); my mini keyboard is obviously too sticky :-)

(2) normalize the resulting file name from macosx fs APIs from nfd->nfc before passing back to java.io.File (File.list() and canonicalize()), so we deal with nfc file names (as "usual") for java.io classes/APIs.

-sherman
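The corrected note (2) describes normalizing names returned by the file system to nfc before they reach Java code. The patch does this inside the JDK itself; the sketch below only imitates the effect at the application level with java.text.Normalizer, and the class NfcListing and its methods are hypothetical names.

```java
import java.io.File;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not part of the webrev. */
final class NfcListing {
    private NfcListing() {}

    /** Converts each name to NFC; a null array (as File.list() may return) yields an empty list. */
    static List<String> toNfc(String[] names) {
        List<String> out = new ArrayList<>();
        if (names == null) {
            return out;  // not a directory, or an I/O error occurred
        }
        for (String name : names) {
            out.add(Normalizer.normalize(name, Normalizer.Form.NFC));
        }
        return out;
    }

    /** Lists a directory and returns its entry names in NFC. */
    static List<String> listNfc(File dir) {
        return toNfc(dir.list());
    }

    public static void main(String[] args) {
        System.out.println(listNfc(new File(".")));
    }
}
```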
I welcome that this issue is getting some serious attention, then. When will this be backported to 7u?

-- David
participants (5)
- Alan Bateman
- David Kocher
- Mike Duigou
- Naoto Sato
- Xueming Shen