RFR(S): 8188122: Path length limits on Windows leads to obscure class loading failures
Calvin Cheung
calvin.cheung at oracle.com
Wed Nov 8 17:27:14 UTC 2017
On 11/7/17, 6:12 AM, Thomas Stüfe wrote:
> Hi Calvin,
>
> On Wed, Nov 1, 2017 at 8:07 PM, Calvin Cheung
> <calvin.cheung at oracle.com <mailto:calvin.cheung at oracle.com>> wrote:
>
>
>
> On 10/27/17, 12:55 AM, Thomas Stüfe wrote:
>> Hi Calvin,
>>
>> On Fri, Oct 27, 2017 at 2:03 AM, Calvin Cheung
>> <calvin.cheung at oracle.com <mailto:calvin.cheung at oracle.com>> wrote:
>>
>> Hi Thomas,
>>
>> On 10/25/17, 11:54 PM, Thomas Stüfe wrote:
>>
>> Hi Calvin,
>>
>> this is a worthwhile addition, thank you for your work!
>>
>> Thanks for your review.
>>
>>
>> Thanks for your work :)
>>
>> First off, there is another issue in
>> file_attribute_data_to_stat(). We actually had this issue before,
>> but your work uncovered it:
>>
>> According to POSIX
>> (http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html
>> <http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html>)
>> and every unix manpage I looked at (solaris, linux, aix..),
>> st_ctime is not the file creation time but the last time file
>> status was changed. Only Microsoft gets this wrong in their posix
>> layer, its stat() function returns
>> (https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx
>> <https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx>)
>> creation time in st_ctime.
>>
>> But as os::stat() is a platform independent layer, I'd say we
>> should behave the same on all platforms, and that would be the
>> Posix way.
>>
>> I did not find any code calling os::stat() and using st_ctime, so
>> this is still save to change for windows. (Note that I found that
>> perfMemory_xxx.cpp uses plain OS ::stat and misuses ctime as
>> "creation time" - I opened a bug for that
>> (https://bugs.openjdk.java.net/browse/JDK-8190260
>> <https://bugs.openjdk.java.net/browse/JDK-8190260> - but that
>> does not affect us, as they do not call os::stat()).
>>
>> There is one more complication: in os::stat() we use plain
>> ::stat() in the single byte case.:
>>
>> *+ if (strlen(path) < MAX_PATH) {*
>> *+ ret = ::stat(pathbuf, sbuf);*
>> *+ } else {*
>> *
>> *
>> But ::stat() on Windows is broken, as I wrote above. We should
>> not use it, especially not if we change the meaing of st_ctime in
>> the double byte path.
>>
>> My suggestion would be to just always call GetFileAttributes(),
>> also for the single byte path. A very simple solution would be to
>> just always go the double byte path with UNC translation,
>> regardless of the path length. Or, if you are worried about
>> performance, for paths shorter than MAX_PATH we use
>> GetFileAttributesA(), for longer paths GetFileAttributesW with
>> UNC translation. In both cases you get
>> a WIN32_FILE_ATTRIBUTE_DATA which you can feed tp your
>> file_attribute_data_to_stat() routine and have the struct stat
>> you return be always consistent.
> I ran into an issue with the dwFileAttributes value for a jar
> file. On Windows Server O/S, the value is set to 0x2020 which is
> (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | FILE_ATTRIBUTE_ARCHIVE) but
> on a desktop O/S like Windows 7, it is set to 0x0020 which is just
> FILE_ATTRIBUTE_ARCHIVE. I've fixed it in
> file_attribute_data_to_stat().
>
>
> Oh.. :( good catch! But that opens a new can of worms I did not see
> before:
>
> FILE_ATTRIBUTE_NORMAL is documented as "A file that does not have
> other attributes set. This attribute is valid only when used alone."
> In addition to this flag, we have a multitude of things like
> FILE_ATTRIBUTE_ARCHIVE, FILE_ATTRIBUTE_ENCRYPTED,
> FILE_ATTRIBUTE_READONLY ... etc, all cases where we assume this is a
> normal file in Unix terminology and where we would expect os::stat to
> return S_IFREG, but where according to the msdn doc
> FILE_ATTRIBUTE_NORMAL is not set.
>
> Looking at what others do in this scenario (Looked at mingw sources
> and at ReactOS - I am not posting any source code here for legal
> reasons but feel free to look for yourself), the usual way to
> translate WIN32_FILE_ATTRIBUTES to struct stat seems to be:
> "if FILE_ATTRIBUTE_DIRECTORY then set S_IFDIR else S_IFREG" (so, no
> dependency on FILE_ATTRIBUTE_NORMAL).
This makes sense but I ran into similar problem as before - the
dwFileAttributes has a different value on a windows server O/S vs
desktop O/S. So I need to do the check as follows:
+ // A directory has the dwFileAttributes value of 0x2010 which is
+ // (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | FILE_ATTRIBUTE_ARCHIVE)
+ // on Windows Server O/S but the value is 0x0010 on Windows desktop O/S
+ // such as Windows 7.
+ if ((file_data.dwFileAttributes& FILE_ATTRIBUTE_DIRECTORY) != 0) {
+ sbuf->st_mode |= S_IFDIR;
+ } else {
+ sbuf->st_mode |= S_IFREG;
+ }
updated webrev:
http://cr.openjdk.java.net/~ccheung/8188122/webrev.04/
Diff from webrev.03:
< --- old/src/hotspot/os/windows/os_windows.cpp 2017-11-08
08:50:40.170786397 -0800
< +++ new/src/hotspot/os/windows/os_windows.cpp 2017-11-08
08:50:39.803751877 -0800
< @@ -4060,41 +4060,119 @@
---
> --- old/src/hotspot/os/windows/os_windows.cpp 2017-11-01
09:40:13.657460425 -0700
> +++ new/src/hotspot/os/windows/os_windows.cpp 2017-11-01
09:40:13.261423192 -0700
> @@ -4060,41 +4060,121 @@
25,29c25
< + // A directory has the dwFileAttributes value of 0x2010 which is
< + // (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | FILE_ATTRIBUTE_ARCHIVE)
< + // on Windows Server O/S but the value is 0x0010 on Windows desktop O/S
< + // such as Windows 7.
< + if ((file_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0) {
---
> + if (file_data.dwFileAttributes == FILE_ATTRIBUTE_DIRECTORY) {
31c27,33
< + } else {
---
> + }
> + if ((file_data.dwFileAttributes == FILE_ATTRIBUTE_NORMAL) ||
> + // an archive file such as a jar file has the dwFileAttributes
value of
> + // 0x2020 (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED |
FILE_ATTRIBUTE_ARCHIVE)
> + // on Windows Server O/S but the value is 0x0020 on
> + // Windows desktop O/S such as Windows 7.
> + ((file_data.dwFileAttributes & FILE_ATTRIBUTE_ARCHIVE) != 0)) {
>
> I wonder about other special cases which should work too:
> - read-only files should be S_IFREG and !S_IWRITE,
For a read-only system file under the user's home dir.
st_mode & 0xFF00 = 0x8100 = S_IFREG | S_IREAD
dwFileAttributes = 39 = (FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_HIDDEN
| FILE_ATTRIBUTE_SYSTEM | FILE_ATTRIBUTE_READONLY)
> read-only directories should be S_IFDIR and !S_IWRITE.
I've tried the C:\progra~1 dir.
st_mode & 0xFF00 = 0x4100 = S_IFDIR | S_IREAD
dwFileAttributes = 17 = (FILE_ATTRIBUTE_DIRECTORY | FILE_ATTRIBUTE_READONLY)
> - We should tread the device root ("C:\") as a directory (so,
> os::stat("c:\") should return S_IFDIR). Does this work?
This one works too.
st_mode & 0xFF00 = 0x4100 = S_IFDIR | S_IREAD
dwFileAttributes = 22 = (FILE_ATTRIBUTE_DIRECTORY |
FILE_ATTRIBUTE_HIDDEN | FILE_ATTRIBUTE_SYSTEM)
>
> I've also used GetTickCounts() to measure the performance of
> ::stat() vs GetFileAttributesA() plus
> file_attribute_data_to_stat(). There's no difference at the ms
> resolution. Using the high-resolution timestamp
> (https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx)
> <https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx%29>,
> it doesn't show much difference.
>
>
> stat() seems to be implemented using FindFirstFile() (see crt sources
> if you can), and we call GetFileAttributesA(), so I do not think this
> differs much.
Yes, I saw the same in my Visual Studio installation.
thanks,
Calvin
>
>>
>> But onward:
>>
>>
>>
>> =========================
>>
>> src/hotspot/os/windows/os_windows.cpp
>>
>> Could you please use wchar_t instead of WCHAR?
>>
>> Fixed.
>>
>>
>> ---------------
>> in os::stat():
>>
>> This cumbersome malloc/strcpy sequence:
>>
>> ! size_t path_len = strlen(path) + 1;
>> + char* pathbuf = (char*)os::malloc(path_len *
>> sizeof(char), mtInternal);
>> + if (pathbuf == NULL) {
>> + errno = ENOMEM;
>> return -1;
>> }
>> os::native_path(strcpy(pathbuf, path));
>>
>> can be reduced to a simple strdup:
>>
>> char* pathbuf = os::strdup(path, mtInternal);
>> if (pathbuf == NULL) {
>> errno = ENOMEM;
>> return -1;
>> }
>> os::native_path(pathbuf);
>>
>> This also would separate strcpy() from os::native_path()
>> which is a bit unreadable.
>>
>> I've made the change.
>>
>>
>> --------------------
>>
>> The single-byte path (strdup, ::stat()), together with
>> its free(), can be moved inside the (strlen(path) <
>> MAX_PATH) condition. os::native_path will not change the
>> path length (it better not, as it operates on the input
>> buffer). That avoids having two allocations on the
>> doublebyte path.
>>
>>
>> os::native_path() converts a path like C:\\\\somedir to
>> C:\\somedir
>> So I'll need the converted path in the wchar_t case too. The
>> freeing of the pathbuf is being done at the end of the
>> function or in the middle of the wchar_t case if there's an
>> error.
>>
>>
>> Oh, you are right,of course. I missed that it this is needed for
>> both paths.
>>
>>
>>
>> -----------------------
>>
>> But seeing that translation to UNC paths is something we
>> might want more often, I would combine allocation and UNC
>> prefix adding to one function like this, to avoid the
>> memove and increase reusability:
>>
>> // Creates an unc path from a single byte path. Return
>> buffer is allocated in C heap and needs to be freed by
>> caller. Returns NULL on error.
>> static whchar_t* create_unc_path(const char* s) {
>> - does s start with \\?\ ?
>> - yes:
>> - os::malloc(strlen(s) + 1) and mbstowcs_s
>> - no:
>> - os::malloc(strlen(s) + 1 + 4), mbstowcs_s
>> to fourth position in string, prefix with L"\\?\"
>> }
>>
>>
>> I also include the case for adding L"\\\\?\\UNC\0" at the
>> beginning to be consistent with libjava/canonicalize_md.c.
>>
>>
>> We also need error checking to mbstowcs_s.
>>
>> I've added assert like the following after the call:
>>
>> assert(converted_chars == path_len, "sanity");
>>
>>
>>
>> create_unc_path() :
>>
>> - could you convert the /**/ to // comments, please?
> Fixed.
>
>
> thanks.
>
>
>>
>> - not sure about the assert after mbstowcs_s: if we happen to
>> encounter an invalid multibyte character, function will fail and
>> return an error:
>>
>> https://msdn.microsoft.com/en-us/library/eyktyxsx.aspx
>> <https://msdn.microsoft.com/en-us/library/eyktyxsx.aspx>
>> "If mbstowcs_s encounters an invalid multibyte character, it puts
>> 0 in *``pReturnValue, sets the destination buffer to an empty
>> string, sets errno to EILSEQ, and returns EILSEQ."
> I've changed create_unc_path() so that the caller will get the
> errno and removed the assert.
>
> static wchar_t* create_unc_path(const char* path, errno_t &err)
>
>
> Okay, works for me.
>
>
>>
>> As this is dependent on user data, we should not assert, but
>> handle the return code of mbstowcs_s gracefully. Same goes for
>> libjava/canonicalize_md.c.
>>
>> - Here: ::mbstowcs_s(&converted_chars, &wpath[7], path_len + 7,
>> path, path_len);
>> third parameter is wrong, as we hand in an offset into the
>> buffer, we must decrement the buffer size by this offset, so
>> correct would be path_len +7 - 7 or just path_len.
>>
>> - Same error below: + ::mbstowcs_s(&converted_chars, &wpath[4],
>> path_len + 4, path, path_len);
> Fixed in both places.
>
>
> Okay.
>
>
>>
>>
>> Just for cleanliness, I would then wrap deallocation into
>> an own function.
>>
>> static viud destroy_unc_path(whchar_t* s) { os::free(s); }
>>
>> I've added the destroy function.
>>
>>
>> These two functions could be candidates of putting into
>> os::win32 namespace, as this form of ANSI->UNC path
>> translation is quite common - whoever wants to use the
>> xxxxW() functions must do this.
>>
>> I'm leaving them in os_windows.cpp since they're being used
>> only within that file.
>>
>>
>> Fine by me.
>>
>>
>>
>> -----------------------
>>
>> FindFirstFileW:
>>
>> I am pretty sure that you can achieve the same result
>> with GetFileAttributesExW(). It also returns
>> WIN32_FIND_DATAW.
>>
>> It actually returns WIN32_FILE_ATTRIBUTE_DATA and is very
>> similar to WIN32_FIND_DATAW.
>>
>>
>> It is way more straightforward to use than
>> FindFirstFileW, as it does not require you to write a
>> callback hook.
>>
>> I've switched to using GetFileAttributesExW().
>>
>>
>> Thank you, this is way more readable.
>> Another issue: do we still need the fix for 6539723 if we switch
>> from stat() to GetFileAttributesExW()? This fix looks important,
>> but the comment
>> indicates that it could break things if the original bug is not
>> present.
>>
>> Btw, this is another strong argument for scrapping ::stat()
>> altogether on all code paths, not only for long input paths,
>> because stat() and GetFileAttributesExW() may behave
>> differently. So it would be good to use the same API on all code
>> paths, in order to get the best test coverage.
> For this round of change, I'm using GetFileAttributesEx[A|W]() for
> both code paths.
>
> webrev: http://cr.openjdk.java.net/~ccheung/8188122/webrev.03/
> <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.03/>
>
> thanks,
> Calvin
>
>
> Okay, all good apart from the issues mentioned above. Thanks for your
> work!
>
> Best Regards, Thomas
>
>
>
>>
>>
>>
>> -------
>>
>> eval_find_data(): This is more of a generic helper
>> function, could you rename this to something clearer,
>> e.g. make_double_word() ?
>>
>> Ok. I've renamed it.
>>
>>
>> Also, setup_stat(), could this be renamed more clearly to
>> something like WIN32_FIND_DATA_to_stat? or lowercase if
>> this bothers you :)
>>
>> I'm naming the function as file_attribute_data_to_stat() to
>> match with the data structure name.
>>
>>
>> Thanks for taking my suggestions.
>>
>>
>>
>> ==================================
>> src/hotspot/share/classfile/classLoader.cpp
>>
>> In ClassPathDirEntry::open_stream(), I would feel better
>> if we were asserting _dir and name to be != NULL before
>> feeding it to strlen.
>>
>> I've added an assert statement.
>>
>>
>> ===================
>>
>> Here's an updated webrev:
>> http://cr.openjdk.java.net/~ccheung/8188122/webrev.02/
>> <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.02/>
>>
>>
>> Thanks!
>>
>> Thomas
>>
>> thanks,
>> Calvin
>>
>>
>> Thanks!
>>
>> Thomas
>>
>>
>>
>>
>>
>>
>> On Wed, Oct 25, 2017 at 9:48 PM, Calvin Cheung
>> <calvin.cheung at oracle.com
>> <mailto:calvin.cheung at oracle.com>
>> <mailto:calvin.cheung at oracle.com
>> <mailto:calvin.cheung at oracle.com>>> wrote:
>>
>> I've reworked this fix by using the Unicode version
>> of open
>> (wopen) to handle path name longer than max path with
>> the path
>> prefixed to indicate an extended length path as
>> described in [1].
>>
>> The Unicode version of stat (wstat) doesn't work well
>> with long
>> path [2]. So FindFirstFileW will be used.The data in
>> WIN32_FIND_DATA returned from FindFirstFileW needs to be
>> transferred to the stat structure since the caller
>> expects a
>> return stat structure and other platforms return a
>> stat structure.
>>
>> In classLoader.cpp, calculate the size of buffer
>> required instead
>> of limiting it to JVM_MAXPATHLEN.
>> In os_windows.cpp, dynamically allocate buffers in
>> os::open and
>> os::stat.
>>
>> updated webrev:
>> http://cr.openjdk.java.net/~ccheung/8188122/webrev.01/
>> <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.01/>
>> <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.01/
>> <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.01/>>
>>
>> It passed hs-tier2 testing using mach5.
>>
>> thanks,
>> Calvin
>>
>> [1]
>> https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx#MAX_PATH
>> <https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx#MAX_PATH>
>> <https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx#MAX_PATH
>> <https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx#MAX_PATH>>
>>
>> [2]
>> https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c093ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral
>> <https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c093ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral>
>> <https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c093ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral
>> <https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c093ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral>>
>>
>>
>>
>> On 10/16/17, 3:15 PM, Calvin Cheung wrote:
>>
>> JBS:
>> https://bugs.openjdk.java.net/browse/JDK-8188122
>> <https://bugs.openjdk.java.net/browse/JDK-8188122>
>> <https://bugs.openjdk.java.net/browse/JDK-8188122
>> <https://bugs.openjdk.java.net/browse/JDK-8188122>>
>>
>> Adding a warning message if the full path or the
>> directory
>> length exceeds MAX_PATH on windows.
>>
>> Example warning messages.
>>
>> 1) The full path exceeds MAX_PATH:
>>
>> Java HotSpot(TM) 64-Bit Server VM warning:
>> construct full path
>> name failed: total length 270 exceeds max length
>> of 260
>> dir
>>
>> T:\\testoutput\\jtreg\\JTwork\\classes\\2\\runtime\\LoadClass\\LongPath.d\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> length 259
>> name Hello.class length 11
>>
>> 2) The directory path exceeds MAX_PATH:
>>
>> Java HotSpot(TM) 64-Bit Server VM warning:
>> os::stat failed:
>> path length 265 exceeds max length 260
>> path
>>
>> T:\\testoutput\\jtreg\\JTwork\\classes\\2\\runtime\\LoadClass\\LongPath.d\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\\xxxxx
>>
>> webrev:
>> http://cr.openjdk.java.net/~ccheung/8188122/webrev.00/
>> <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.00/>
>> <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.00/
>> <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.00/>>
>>
>> Testing:
>> JPRT (including the new test)
>>
>> thanks,
>> Calvin
>>
>>
>>
>
More information about the hotspot-runtime-dev
mailing list