RFR(S): 8188122: Path length limits on Windows leads to obscure class loading failures
Thomas Stüfe
thomas.stuefe at gmail.com
Tue Nov 7 14:12:17 UTC 2017
Hi Calvin,
On Wed, Nov 1, 2017 at 8:07 PM, Calvin Cheung <calvin.cheung at oracle.com>
wrote:
>
>
> On 10/27/17, 12:55 AM, Thomas Stüfe wrote:
>
> Hi Calvin,
>
> On Fri, Oct 27, 2017 at 2:03 AM, Calvin Cheung <calvin.cheung at oracle.com>
> wrote:
>
>> Hi Thomas,
>>
>> On 10/25/17, 11:54 PM, Thomas Stüfe wrote:
>>
>>> Hi Calvin,
>>>
>>> this is a worthwhile addition, thank you for your work!
>>>
>> Thanks for your review.
>
>
> Thanks for your work :)
>
> First off, there is another issue in file_attribute_data_to_stat(). We
> actually had this issue before, but your work uncovered it:
>
> According to POSIX (http://pubs.opengroup.org/
> onlinepubs/009695399/basedefs/sys/stat.h.html) and every unix manpage I
> looked at (solaris, linux, aix..), st_ctime is not the file creation time
> but the last time file status was changed. Only Microsoft gets this wrong
> in their posix layer, its stat() function returns (
> https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx) creation time in
> st_ctime.
>
> But as os::stat() is a platform independent layer, I'd say we should
> behave the same on all platforms, and that would be the Posix way.
>
> I did not find any code calling os::stat() and using st_ctime, so this is
> still save to change for windows. (Note that I found that
> perfMemory_xxx.cpp uses plain OS ::stat and misuses ctime as "creation
> time" - I opened a bug for that (https://bugs.openjdk.java.
> net/browse/JDK-8190260 - but that does not affect us, as they do not call
> os::stat()).
>
> There is one more complication: in os::stat() we use plain ::stat() in the
> single byte case.:
>
> *+ if (strlen(path) < MAX_PATH) {*
>
> *+ ret = ::stat(pathbuf, sbuf);**+ } else {*
>
>
> But ::stat() on Windows is broken, as I wrote above. We should not use it,
> especially not if we change the meaing of st_ctime in the double byte path.
>
> My suggestion would be to just always call GetFileAttributes(), also for
> the single byte path. A very simple solution would be to just always go the
> double byte path with UNC translation, regardless of the path length. Or,
> if you are worried about performance, for paths shorter than MAX_PATH we
> use GetFileAttributesA(), for longer paths GetFileAttributesW with UNC
> translation. In both cases you get a WIN32_FILE_ATTRIBUTE_DATA which you
> can feed tp your file_attribute_data_to_stat() routine and have the struct
> stat you return be always consistent.
>
> I ran into an issue with the dwFileAttributes value for a jar file. On
> Windows Server O/S, the value is set to 0x2020 which is
> (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | FILE_ATTRIBUTE_ARCHIVE) but on a
> desktop O/S like Windows 7, it is set to 0x0020 which is just
> FILE_ATTRIBUTE_ARCHIVE. I've fixed it in file_attribute_data_to_stat().
>
>
Oh.. :( good catch! But that opens a new can of worms I did not see before:
FILE_ATTRIBUTE_NORMAL is documented as "A file that does not have other
attributes set. This attribute is valid only when used alone." In addition
to this flag, we have a multitude of things like FILE_ATTRIBUTE_ARCHIVE,
FILE_ATTRIBUTE_ENCRYPTED, FILE_ATTRIBUTE_READONLY ... etc, all cases where
we assume this is a normal file in Unix terminology and where we would
expect os::stat to return S_IFREG, but where according to the msdn doc
FILE_ATTRIBUTE_NORMAL is not set.
Looking at what others do in this scenario (Looked at mingw sources and at
ReactOS - I am not posting any source code here for legal reasons but feel
free to look for yourself), the usual way to translate
WIN32_FILE_ATTRIBUTES to struct stat seems to be:
"if FILE_ATTRIBUTE_DIRECTORY then set S_IFDIR else S_IFREG" (so, no
dependency on FILE_ATTRIBUTE_NORMAL).
I wonder about other special cases which should work too:
- read-only files should be S_IFREG and !S_IWRITE, read-only directories
should be S_IFDIR and !S_IWRITE.
- We should tread the device root ("C:\") as a directory (so,
os::stat("c:\") should return S_IFDIR). Does this work?
> I've also used GetTickCounts() to measure the performance of ::stat() vs
> GetFileAttributesA() plus file_attribute_data_to_stat(). There's no
> difference at the ms resolution. Using the high-resolution timestamp (
> https://msdn.microsoft.com/en-us/library/windows/desktop/
> dn553408(v=vs.85).aspx), it doesn't show much difference.
>
>
stat() seems to be implemented using FindFirstFile() (see crt sources if
you can), and we call GetFileAttributesA(), so I do not think this differs
much.
>
> But onward:
>
>
>>
>>> =========================
>>>
>>> src/hotspot/os/windows/os_windows.cpp
>>>
>>> Could you please use wchar_t instead of WCHAR?
>>>
>> Fixed.
>>
>>>
>>> ---------------
>>> in os::stat():
>>>
>>> This cumbersome malloc/strcpy sequence:
>>>
>>> ! size_t path_len = strlen(path) + 1;
>>> + char* pathbuf = (char*)os::malloc(path_len * sizeof(char),
>>> mtInternal);
>>> + if (pathbuf == NULL) {
>>> + errno = ENOMEM;
>>> return -1;
>>> }
>>> os::native_path(strcpy(pathbuf, path));
>>>
>>> can be reduced to a simple strdup:
>>>
>>> char* pathbuf = os::strdup(path, mtInternal);
>>> if (pathbuf == NULL) {
>>> errno = ENOMEM;
>>> return -1;
>>> }
>>> os::native_path(pathbuf);
>>>
>>> This also would separate strcpy() from os::native_path() which is a bit
>>> unreadable.
>>>
>> I've made the change.
>>
>>>
>>> --------------------
>>>
>>> The single-byte path (strdup, ::stat()), together with its free(), can
>>> be moved inside the (strlen(path) < MAX_PATH) condition. os::native_path
>>> will not change the path length (it better not, as it operates on the input
>>> buffer). That avoids having two allocations on the doublebyte path.
>>>
>>
>> os::native_path() converts a path like C:\\\\somedir to C:\\somedir
>> So I'll need the converted path in the wchar_t case too. The freeing of
>> the pathbuf is being done at the end of the function or in the middle of
>> the wchar_t case if there's an error.
>
>
> Oh, you are right,of course. I missed that it this is needed for both
> paths.
>
>
>>
>>
>>> -----------------------
>>>
>>> But seeing that translation to UNC paths is something we might want more
>>> often, I would combine allocation and UNC prefix adding to one function
>>> like this, to avoid the memove and increase reusability:
>>>
>>> // Creates an unc path from a single byte path. Return buffer is
>>> allocated in C heap and needs to be freed by caller. Returns NULL on error.
>>> static whchar_t* create_unc_path(const char* s) {
>>> - does s start with \\?\ ?
>>> - yes:
>>> - os::malloc(strlen(s) + 1) and mbstowcs_s
>>> - no:
>>> - os::malloc(strlen(s) + 1 + 4), mbstowcs_s to fourth
>>> position in string, prefix with L"\\?\"
>>> }
>>>
>>
>> I also include the case for adding L"\\\\?\\UNC\0" at the beginning to
>> be consistent with libjava/canonicalize_md.c.
>
>
>>> We also need error checking to mbstowcs_s.
>>>
>> I've added assert like the following after the call:
>>
>> assert(converted_chars == path_len, "sanity");
>
>
>
> create_unc_path() :
>
> - could you convert the /**/ to // comments, please?
>
> Fixed.
>
thanks.
>
>
> - not sure about the assert after mbstowcs_s: if we happen to encounter an
> invalid multibyte character, function will fail and return an error:
>
> https://msdn.microsoft.com/en-us/library/eyktyxsx.aspx
> "If mbstowcs_s encounters an invalid multibyte character, it puts 0 in
> *``pReturnValue, sets the destination buffer to an empty string, sets errno
> to EILSEQ, and returns EILSEQ."
>
> I've changed create_unc_path() so that the caller will get the errno and
> removed the assert.
>
> static wchar_t* create_unc_path(const char* path, errno_t &err)
>
Okay, works for me.
>
>
> As this is dependent on user data, we should not assert, but handle the
> return code of mbstowcs_s gracefully. Same goes for
> libjava/canonicalize_md.c.
>
> - Here: ::mbstowcs_s(&converted_chars, &wpath[7], path_len + 7, path,
> path_len);
> third parameter is wrong, as we hand in an offset into the buffer, we must
> decrement the buffer size by this offset, so correct would be path_len +7 -
> 7 or just path_len.
>
> - Same error below: + ::mbstowcs_s(&converted_chars, &wpath[4], path_len +
> 4, path, path_len);
>
> Fixed in both places.
>
Okay.
>
>
>
>
>>
>>
>>> Just for cleanliness, I would then wrap deallocation into an own
>>> function.
>>>
>>> static viud destroy_unc_path(whchar_t* s) { os::free(s); }
>>>
>> I've added the destroy function.
>>
>>>
>>> These two functions could be candidates of putting into os::win32
>>> namespace, as this form of ANSI->UNC path translation is quite common -
>>> whoever wants to use the xxxxW() functions must do this.
>>>
>> I'm leaving them in os_windows.cpp since they're being used only within
>> that file.
>
>
> Fine by me.
>
>
>>
>>
>>> -----------------------
>>>
>>> FindFirstFileW:
>>>
>>> I am pretty sure that you can achieve the same result with
>>> GetFileAttributesExW(). It also returns WIN32_FIND_DATAW.
>>>
>> It actually returns WIN32_FILE_ATTRIBUTE_DATA and is very similar to
>> WIN32_FIND_DATAW.
>>
>>>
>>> It is way more straightforward to use than FindFirstFileW, as it does
>>> not require you to write a callback hook.
>>>
>> I've switched to using GetFileAttributesExW().
>
>
> Thank you, this is way more readable.
>
> Another issue: do we still need the fix for 6539723 if we switch from
> stat() to GetFileAttributesExW()? This fix looks important, but the
> comment
> indicates that it could break things if the original bug is not present.
>
> Btw, this is another strong argument for scrapping ::stat() altogether on
> all code paths, not only for long input paths, because stat() and
> GetFileAttributesExW() may behave
> differently. So it would be good to use the same API on all code paths, in
> order to get the best test coverage.
>
> For this round of change, I'm using GetFileAttributesEx[A|W]() for both
> code paths.
>
> webrev: http://cr.openjdk.java.net/~ccheung/8188122/webrev.03/
>
> thanks,
> Calvin
>
Okay, all good apart from the issues mentioned above. Thanks for your work!
Best Regards, Thomas
>
>
>>
>>> -------
>>>
>>> eval_find_data(): This is more of a generic helper function, could you
>>> rename this to something clearer, e.g. make_double_word() ?
>>>
>> Ok. I've renamed it.
>>
>>>
>>> Also, setup_stat(), could this be renamed more clearly to something like
>>> WIN32_FIND_DATA_to_stat? or lowercase if this bothers you :)
>>>
>> I'm naming the function as file_attribute_data_to_stat() to match with
>> the data structure name.
>
>
> Thanks for taking my suggestions.
>
>
>>
>>
>>> ==================================
>>> src/hotspot/share/classfile/classLoader.cpp
>>>
>>> In ClassPathDirEntry::open_stream(), I would feel better if we were
>>> asserting _dir and name to be != NULL before feeding it to strlen.
>>>
>> I've added an assert statement.
>>
>>>
>>> ===================
>>>
>> Here's an updated webrev:
>> http://cr.openjdk.java.net/~ccheung/8188122/webrev.02/
>>
>>
> Thanks!
>
> Thomas
>
>
>> thanks,
>> Calvin
>>
>>
>>> Thanks!
>>>
>>> Thomas
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 9:48 PM, Calvin Cheung <calvin.cheung at oracle.com
>>> <mailto:calvin.cheung at oracle.com>> wrote:
>>>
>>> I've reworked this fix by using the Unicode version of open
>>> (wopen) to handle path name longer than max path with the path
>>> prefixed to indicate an extended length path as described in [1].
>>>
>>> The Unicode version of stat (wstat) doesn't work well with long
>>> path [2]. So FindFirstFileW will be used.The data in
>>> WIN32_FIND_DATA returned from FindFirstFileW needs to be
>>> transferred to the stat structure since the caller expects a
>>> return stat structure and other platforms return a stat structure.
>>>
>>> In classLoader.cpp, calculate the size of buffer required instead
>>> of limiting it to JVM_MAXPATHLEN.
>>> In os_windows.cpp, dynamically allocate buffers in os::open and
>>> os::stat.
>>>
>>> updated webrev:
>>> http://cr.openjdk.java.net/~ccheung/8188122/webrev.01/
>>> <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.01/>
>>>
>>> It passed hs-tier2 testing using mach5.
>>>
>>> thanks,
>>> Calvin
>>>
>>> [1]
>>> https://msdn.microsoft.com/en-us/library/windows/desktop/aa3
>>> 65247(v=vs.85).aspx#MAX_PATH
>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/aa
>>> 365247%28v=vs.85%29.aspx#MAX_PATH>
>>>
>>> [2]
>>> https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c093
>>> ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral
>>> <https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c09
>>> 3ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral>
>>>
>>>
>>>
>>> On 10/16/17, 3:15 PM, Calvin Cheung wrote:
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8188122
>>> <https://bugs.openjdk.java.net/browse/JDK-8188122>
>>>
>>> Adding a warning message if the full path or the directory
>>> length exceeds MAX_PATH on windows.
>>>
>>> Example warning messages.
>>>
>>> 1) The full path exceeds MAX_PATH:
>>>
>>> Java HotSpot(TM) 64-Bit Server VM warning: construct full path
>>> name failed: total length 270 exceeds max length of 260
>>> dir
>>> T:\\testoutput\\jtreg\\JTwork\\classes\\2\\runtime\\LoadClas
>>> s\\LongPath.d\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>> xxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>> length 259
>>> name Hello.class length 11
>>>
>>> 2) The directory path exceeds MAX_PATH:
>>>
>>> Java HotSpot(TM) 64-Bit Server VM warning: os::stat failed:
>>> path length 265 exceeds max length 260
>>> path
>>> T:\\testoutput\\jtreg\\JTwork\\classes\\2\\runtime\\LoadClas
>>> s\\LongPath.d\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>> xxxxxxxxxxxxxxxxxxxxxxxxxxxx\\xxxxx
>>>
>>> webrev:
>>> http://cr.openjdk.java.net/~ccheung/8188122/webrev.00/
>>> <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.00/>
>>>
>>> Testing:
>>> JPRT (including the new test)
>>>
>>> thanks,
>>> Calvin
>>>
>>>
>>>
>
More information about the hotspot-runtime-dev
mailing list