RFR(S): 8188122: Path length limits on Windows leads to obscure class loading failures

Calvin Cheung calvin.cheung at oracle.com
Wed Nov 8 17:27:14 UTC 2017



On 11/7/17, 6:12 AM, Thomas Stüfe wrote:
> Hi Calvin,
>
> On Wed, Nov 1, 2017 at 8:07 PM, Calvin Cheung 
> <calvin.cheung at oracle.com <mailto:calvin.cheung at oracle.com>> wrote:
>
>
>
>     On 10/27/17, 12:55 AM, Thomas Stüfe wrote:
>>     Hi Calvin,
>>
>>     On Fri, Oct 27, 2017 at 2:03 AM, Calvin Cheung
>>     <calvin.cheung at oracle.com <mailto:calvin.cheung at oracle.com>> wrote:
>>
>>         Hi Thomas,
>>
>>         On 10/25/17, 11:54 PM, Thomas Stüfe wrote:
>>
>>             Hi Calvin,
>>
>>             this is a worthwhile addition, thank you for your work!
>>
>>         Thanks for your review.
>>
>>
>>     Thanks for your work :)
>>
>>     First off, there is another issue in
>>     file_attribute_data_to_stat(). We actually had this issue before,
>>     but your work uncovered it:
>>
>>     According to POSIX
>>     (http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html
>>     <http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html>)
>>     and every unix manpage I looked at (solaris, linux, aix..),
>>     st_ctime is not the file creation time but the last time file
>>     status was changed. Only Microsoft gets this wrong in their posix
>>     layer, its stat() function returns
>>      (https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx
>>     <https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx>)
>>     creation time in st_ctime.
>>
>>     But as os::stat() is a platform independent layer, I'd say we
>>     should behave the same on all platforms, and that would be the
>>     Posix way.
>>
>>     I did not find any code calling os::stat() and using st_ctime, so
>>     this is still save to change for windows. (Note that I found that
>>     perfMemory_xxx.cpp uses plain OS ::stat and misuses ctime as
>>     "creation time" - I opened a bug for that
>>     (https://bugs.openjdk.java.net/browse/JDK-8190260
>>     <https://bugs.openjdk.java.net/browse/JDK-8190260> - but that
>>     does not affect us, as they do not call os::stat()).
>>
>>     There is one more complication: in os::stat() we use plain
>>     ::stat() in the single byte case.:
>>
>>     *+ if (strlen(path) < MAX_PATH) {*
>>     *+     ret = ::stat(pathbuf, sbuf);*
>>     *+   } else {*
>>     *
>>     *
>>     But ::stat() on Windows is broken, as I wrote above. We should
>>     not use it, especially not if we change the meaing of st_ctime in
>>     the double byte path.
>>
>>     My suggestion would be to just always call GetFileAttributes(),
>>     also for the single byte path. A very simple solution would be to
>>     just always go the double byte path with UNC translation,
>>     regardless of the path length. Or, if you are worried about
>>     performance, for paths shorter than MAX_PATH we use
>>     GetFileAttributesA(), for longer paths GetFileAttributesW with
>>     UNC translation. In both cases you get
>>     a WIN32_FILE_ATTRIBUTE_DATA which you can feed tp your 
>>     file_attribute_data_to_stat() routine and have the struct stat
>>     you return be always consistent.
>     I ran into an issue with the dwFileAttributes value for a jar
>     file. On Windows Server O/S, the value is set to 0x2020 which is
>     (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | FILE_ATTRIBUTE_ARCHIVE) but
>     on a desktop O/S like Windows 7, it is set to 0x0020 which is just
>     FILE_ATTRIBUTE_ARCHIVE. I've fixed it in
>     file_attribute_data_to_stat().
>
>
> Oh.. :( good catch! But that opens a new can of worms I did not see 
> before:
>
> FILE_ATTRIBUTE_NORMAL is documented as "A file that does not have 
> other attributes set. This attribute is valid only when used alone." 
> In addition to this flag, we have a multitude of things like 
> FILE_ATTRIBUTE_ARCHIVE, FILE_ATTRIBUTE_ENCRYPTED, 
> FILE_ATTRIBUTE_READONLY ... etc, all cases where we assume this is a 
> normal file in Unix terminology and where we would expect os::stat to 
> return S_IFREG, but where according to the msdn doc 
> FILE_ATTRIBUTE_NORMAL is not set.
>
> Looking at what others do in this scenario (Looked at mingw sources 
> and at ReactOS - I am not posting any source code here for legal 
> reasons but feel free to look for yourself), the usual way to 
> translate WIN32_FILE_ATTRIBUTES to struct stat seems to be:
> "if FILE_ATTRIBUTE_DIRECTORY then set S_IFDIR else S_IFREG" (so, no 
> dependency on FILE_ATTRIBUTE_NORMAL).
This makes sense but I ran into similar problem as before - the 
dwFileAttributes has a different value on a windows server O/S vs 
desktop O/S. So I need to do the check as follows:

+   // A directory has the dwFileAttributes value of 0x2010 which is
+   // (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | FILE_ATTRIBUTE_ARCHIVE)
+   // on Windows Server O/S but the value is 0x0010 on Windows desktop O/S
+   // such as Windows 7.
+   if ((file_data.dwFileAttributes&  FILE_ATTRIBUTE_DIRECTORY) != 0) {
+     sbuf->st_mode |= S_IFDIR;
+   } else {
+     sbuf->st_mode |= S_IFREG;
+   }

updated webrev:
     http://cr.openjdk.java.net/~ccheung/8188122/webrev.04/

Diff from webrev.03:

< --- old/src/hotspot/os/windows/os_windows.cpp 2017-11-08 
08:50:40.170786397 -0800
< +++ new/src/hotspot/os/windows/os_windows.cpp 2017-11-08 
08:50:39.803751877 -0800
< @@ -4060,41 +4060,119 @@
---
 > --- old/src/hotspot/os/windows/os_windows.cpp 2017-11-01 
09:40:13.657460425 -0700
 > +++ new/src/hotspot/os/windows/os_windows.cpp 2017-11-01 
09:40:13.261423192 -0700
 > @@ -4060,41 +4060,121 @@
25,29c25
< +  // A directory has the dwFileAttributes value of 0x2010 which is
< +  // (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | FILE_ATTRIBUTE_ARCHIVE)
< +  // on Windows Server O/S but the value is 0x0010 on Windows desktop O/S
< +  // such as Windows 7.
< +  if ((file_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0) {
---
 > +  if (file_data.dwFileAttributes == FILE_ATTRIBUTE_DIRECTORY) {
31c27,33
< +  } else {
---
 > +  }
 > +  if ((file_data.dwFileAttributes == FILE_ATTRIBUTE_NORMAL) ||
 > +      // an archive file such as a jar file has the dwFileAttributes 
value of
 > +      // 0x2020 (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | 
FILE_ATTRIBUTE_ARCHIVE)
 > +      // on Windows Server O/S but the value is 0x0020 on
 > +      // Windows desktop O/S such as Windows 7.
 > +      ((file_data.dwFileAttributes & FILE_ATTRIBUTE_ARCHIVE) != 0)) {

>
> I wonder about other special cases which should work too:
> - read-only files should be S_IFREG and !S_IWRITE,
For a read-only system file under the user's home dir.

st_mode & 0xFF00 = 0x8100 = S_IFREG | S_IREAD
dwFileAttributes = 39 = (FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_HIDDEN 
| FILE_ATTRIBUTE_SYSTEM | FILE_ATTRIBUTE_READONLY)
> read-only directories should be S_IFDIR and !S_IWRITE.
I've tried the C:\progra~1 dir.

st_mode & 0xFF00 = 0x4100 = S_IFDIR | S_IREAD
dwFileAttributes = 17 = (FILE_ATTRIBUTE_DIRECTORY | FILE_ATTRIBUTE_READONLY)
> - We should tread the device root ("C:\") as a directory (so, 
> os::stat("c:\") should return S_IFDIR). Does this work?
This one works too.

st_mode & 0xFF00 = 0x4100 = S_IFDIR | S_IREAD
dwFileAttributes = 22 = (FILE_ATTRIBUTE_DIRECTORY | 
FILE_ATTRIBUTE_HIDDEN | FILE_ATTRIBUTE_SYSTEM)
>
>     I've also used GetTickCounts() to measure the performance of
>     ::stat() vs GetFileAttributesA() plus
>     file_attribute_data_to_stat(). There's no difference at the ms
>     resolution. Using the high-resolution timestamp
>     (https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx)
>     <https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx%29>,
>     it doesn't show much difference.
>
>
> stat() seems to be implemented using FindFirstFile() (see crt sources 
> if you can), and we call GetFileAttributesA(), so I do not think this 
> differs much.
Yes, I saw the same in my Visual Studio installation.

thanks,
Calvin
>
>>
>>     But onward:
>>
>>
>>
>>             =========================
>>
>>             src/hotspot/os/windows/os_windows.cpp
>>
>>             Could you please use wchar_t instead of WCHAR?
>>
>>         Fixed.
>>
>>
>>             ---------------
>>             in os::stat():
>>
>>             This cumbersome malloc/strcpy sequence:
>>
>>             !   size_t path_len = strlen(path) + 1;
>>             +   char* pathbuf = (char*)os::malloc(path_len *
>>             sizeof(char), mtInternal);
>>             +   if (pathbuf == NULL) {
>>             +     errno = ENOMEM;
>>                   return -1;
>>                 }
>>                 os::native_path(strcpy(pathbuf, path));
>>
>>              can be reduced to a simple strdup:
>>
>>             char* pathbuf = os::strdup(path, mtInternal);
>>             if (pathbuf == NULL) {
>>               errno = ENOMEM;
>>               return -1;
>>             }
>>             os::native_path(pathbuf);
>>
>>             This also would separate strcpy() from os::native_path()
>>             which is a bit unreadable.
>>
>>         I've made the change.
>>
>>
>>             --------------------
>>
>>             The single-byte path (strdup, ::stat()), together with
>>             its free(), can be moved inside the (strlen(path) <
>>             MAX_PATH) condition. os::native_path will not change the
>>             path length (it better not, as it operates on the input
>>             buffer). That avoids  having two allocations on the
>>             doublebyte path.
>>
>>
>>         os::native_path() converts a path like C:\\\\somedir to
>>         C:\\somedir
>>         So I'll need the converted path in the wchar_t case too. The
>>         freeing of the pathbuf is being done at the end of the
>>         function or in the middle of the wchar_t case if there's an
>>         error.
>>
>>
>>     Oh, you are right,of course. I missed that it this is needed for
>>     both paths.
>>
>>
>>
>>             -----------------------
>>
>>             But seeing that translation to UNC paths is something we
>>             might want more often, I would combine allocation and UNC
>>             prefix adding to one function like this, to avoid the
>>             memove and increase reusability:
>>
>>             // Creates an unc path from a single byte path. Return
>>             buffer is allocated in C heap and needs to be freed by
>>             caller. Returns NULL on error.
>>             static whchar_t* create_unc_path(const char* s) {
>>               - does s start with \\?\ ?
>>             - yes:
>>                         - os::malloc(strlen(s) + 1) and mbstowcs_s
>>                     - no:
>>                         - os::malloc(strlen(s) + 1 + 4), mbstowcs_s
>>             to fourth position in string, prefix with L"\\?\"
>>             }
>>
>>
>>         I also include the case for adding  L"\\\\?\\UNC\0" at the
>>         beginning to be consistent with libjava/canonicalize_md.c. 
>>
>>
>>             We also need error checking to mbstowcs_s.
>>
>>         I've added assert like the following after the call:
>>
>>         assert(converted_chars == path_len, "sanity");
>>
>>
>>
>>     create_unc_path() :
>>
>>     - could you convert the /**/ to // comments, please?
>     Fixed.
>
>
> thanks.
>
>
>>
>>     - not sure about the assert after mbstowcs_s: if we happen to
>>     encounter an invalid multibyte character, function will fail and
>>     return an error:
>>
>>     https://msdn.microsoft.com/en-us/library/eyktyxsx.aspx
>>     <https://msdn.microsoft.com/en-us/library/eyktyxsx.aspx>
>>     "If mbstowcs_s encounters an invalid multibyte character, it puts
>>     0 in *``pReturnValue, sets the destination buffer to an empty
>>     string, sets errno to EILSEQ, and returns EILSEQ."
>     I've changed create_unc_path() so that the caller will get the
>     errno and removed the assert.
>
>     static wchar_t* create_unc_path(const char* path, errno_t &err)
>
>
> Okay, works for me.
>
>
>>
>>      As this is dependent on user data, we should not assert, but
>>     handle the return code of mbstowcs_s gracefully. Same goes for
>>     libjava/canonicalize_md.c.
>>
>>     - Here: ::mbstowcs_s(&converted_chars, &wpath[7], path_len + 7,
>>     path, path_len);
>>     third parameter is wrong, as we hand in an offset into the
>>     buffer, we must decrement the buffer size by this offset, so
>>     correct would be path_len +7 - 7 or just path_len.
>>
>>     - Same error below: + ::mbstowcs_s(&converted_chars, &wpath[4],
>>     path_len + 4, path, path_len);
>     Fixed in both places.
>
>
> Okay.
>
>
>>
>>
>>             Just for cleanliness, I would then wrap deallocation into
>>             an own function.
>>
>>             static viud destroy_unc_path(whchar_t* s) { os::free(s); }
>>
>>         I've added the destroy function.
>>
>>
>>             These two functions could be candidates of putting into
>>             os::win32 namespace, as this form of ANSI->UNC path
>>             translation is quite common - whoever wants to use the
>>             xxxxW() functions must do this.
>>
>>         I'm leaving them in os_windows.cpp since they're being used
>>         only within that file.
>>
>>
>>     Fine by me.
>>
>>
>>
>>             -----------------------
>>
>>             FindFirstFileW:
>>
>>             I am pretty sure that you can achieve the same result
>>             with GetFileAttributesExW(). It also returns
>>             WIN32_FIND_DATAW.
>>
>>         It actually returns WIN32_FILE_ATTRIBUTE_DATA and is very
>>         similar to WIN32_FIND_DATAW.
>>
>>
>>             It is way more straightforward to use than
>>             FindFirstFileW, as it does not require you to write a
>>             callback hook.
>>
>>         I've switched to using GetFileAttributesExW().
>>
>>
>>     Thank you, this is way more readable.
>>     Another issue: do we still need the fix for 6539723 if we switch
>>     from stat() to GetFileAttributesExW()? This fix looks important,
>>     but the comment
>>     indicates that it could break things if the original bug is not
>>     present.
>>
>>     Btw, this is another strong argument for scrapping ::stat()
>>     altogether on all code paths, not only for long input paths,
>>     because stat() and GetFileAttributesExW() may behave
>>     differently. So it would be good to use the same API on all code
>>     paths, in order to get the best test coverage.
>     For this round of change, I'm using GetFileAttributesEx[A|W]() for
>     both code paths.
>
>     webrev: http://cr.openjdk.java.net/~ccheung/8188122/webrev.03/
>     <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.03/>
>
>     thanks,
>     Calvin
>
>
> Okay, all good apart from the issues mentioned above. Thanks for your 
> work!
>
> Best Regards, Thomas
>
>
>
>>
>>
>>
>>             -------
>>
>>             eval_find_data(): This is more of a generic helper
>>             function, could you rename this to something clearer,
>>             e.g. make_double_word() ?
>>
>>         Ok. I've renamed it.
>>
>>
>>             Also, setup_stat(), could this be renamed more clearly to
>>             something like WIN32_FIND_DATA_to_stat? or lowercase if
>>             this bothers you :)
>>
>>         I'm naming the function as file_attribute_data_to_stat() to
>>         match with the data structure name.
>>
>>
>>     Thanks for taking my suggestions.
>>
>>
>>
>>             ==================================
>>             src/hotspot/share/classfile/classLoader.cpp
>>
>>             In ClassPathDirEntry::open_stream(), I would feel better
>>             if we were asserting _dir and name to be != NULL before
>>             feeding it to strlen.
>>
>>         I've added an assert statement.
>>
>>
>>             ===================
>>
>>         Here's an updated webrev:
>>         http://cr.openjdk.java.net/~ccheung/8188122/webrev.02/
>>         <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.02/>
>>
>>
>>     Thanks!
>>
>>     Thomas
>>
>>         thanks,
>>         Calvin
>>
>>
>>             Thanks!
>>
>>             Thomas
>>
>>
>>
>>
>>
>>
>>             On Wed, Oct 25, 2017 at 9:48 PM, Calvin Cheung
>>             <calvin.cheung at oracle.com
>>             <mailto:calvin.cheung at oracle.com>
>>             <mailto:calvin.cheung at oracle.com
>>             <mailto:calvin.cheung at oracle.com>>> wrote:
>>
>>                 I've reworked this fix by using the Unicode version
>>             of open
>>                 (wopen) to handle path name longer than max path with
>>             the path
>>                 prefixed to indicate an extended length path as
>>             described in [1].
>>
>>                 The Unicode version of stat (wstat) doesn't work well
>>             with long
>>                 path [2]. So FindFirstFileW will be used.The data in
>>                 WIN32_FIND_DATA returned from FindFirstFileW needs to be
>>                 transferred to the stat structure since the caller
>>             expects a
>>                 return stat structure and other platforms return a
>>             stat structure.
>>
>>                 In classLoader.cpp, calculate the size of buffer
>>             required instead
>>                 of limiting it to JVM_MAXPATHLEN.
>>                 In os_windows.cpp, dynamically allocate buffers in
>>             os::open and
>>                 os::stat.
>>
>>                 updated webrev:
>>             http://cr.openjdk.java.net/~ccheung/8188122/webrev.01/
>>             <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.01/>
>>             <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.01/
>>             <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.01/>>
>>
>>                 It passed hs-tier2 testing using mach5.
>>
>>                 thanks,
>>                 Calvin
>>
>>                 [1]
>>             https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx#MAX_PATH
>>             <https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx#MAX_PATH>
>>             <https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx#MAX_PATH
>>             <https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx#MAX_PATH>>
>>
>>                 [2]
>>             https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c093ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral
>>             <https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c093ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral>
>>             <https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c093ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral
>>             <https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c093ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral>>
>>
>>
>>
>>                 On 10/16/17, 3:15 PM, Calvin Cheung wrote:
>>
>>                     JBS:
>>             https://bugs.openjdk.java.net/browse/JDK-8188122
>>             <https://bugs.openjdk.java.net/browse/JDK-8188122>
>>             <https://bugs.openjdk.java.net/browse/JDK-8188122
>>             <https://bugs.openjdk.java.net/browse/JDK-8188122>>
>>
>>                     Adding a warning message if the full path or the
>>             directory
>>                     length exceeds MAX_PATH on windows.
>>
>>                     Example warning messages.
>>
>>                     1) The full path exceeds MAX_PATH:
>>
>>                     Java HotSpot(TM) 64-Bit Server VM warning:
>>             construct full path
>>                     name failed: total length 270 exceeds max length
>>             of 260
>>                         dir
>>                    
>>             T:\\testoutput\\jtreg\\JTwork\\classes\\2\\runtime\\LoadClass\\LongPath.d\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>                     length 259
>>                         name Hello.class length 11
>>
>>                     2) The directory path exceeds MAX_PATH:
>>
>>                     Java HotSpot(TM) 64-Bit Server VM warning:
>>             os::stat failed:
>>                     path length 265 exceeds max length 260
>>                         path
>>                    
>>             T:\\testoutput\\jtreg\\JTwork\\classes\\2\\runtime\\LoadClass\\LongPath.d\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\\xxxxx
>>
>>                     webrev:
>>             http://cr.openjdk.java.net/~ccheung/8188122/webrev.00/
>>             <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.00/>
>>             <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.00/
>>             <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.00/>>
>>
>>                     Testing:
>>                         JPRT (including the new test)
>>
>>                     thanks,
>>                     Calvin
>>
>>
>>
>


More information about the hotspot-runtime-dev mailing list