RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters
Yasumasa Suenaga
suenaga at oss.nttdata.com
Fri Feb 28 07:39:08 UTC 2020
Hi Thomas,
Thanks for your review!
I refactored entire of this function as result :)
Please review again this webrev:
http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.01/
On 2020/02/28 14:43, Thomas Stüfe wrote:
> Hi Yasumasa,
>
> thanks for fixing this. CC'ing Ralf since this was his change.
>
> -----
>
> Not sure MB_ERR_INVALID_CHARS with a subsequent return NULL is a good option for conversion errors? Would it not better to use the default behavior of replacing invalid code points with U+FFFD?
I think we should use MB_ERR_INVALID_CHARS at this point because invalid chars should not be used in the path.
If invalid char(s) are converted to special char (e.g. U+FFFD), it is not mapped to original path.
> -----
>
> Can the result buffer overflow? UTF16 is a variable sized char set, code points can overflow into the upper plane and use 32bit. Result buffer size is calculated in terms of sizeof(wchar_t). Not sure what size wchar_t is; if it is 16bit the result buffer could be too small. If it is 32bit I'd be really confused too :) since we would have had 32bit-wide characters before and now we deal with UTF16?
>
> Should it be too small, there are two solutions:
>
> 1) either over-allocate for the worst case: for each input char allocate space for two 16bit code points.
> 2) or, call MultiByteToWideChar twice, first time with 0 as output buffer size. Then, function just returns output buffer size, which you can allocate and repeat the call.
>
> I would at least like to see an explicit assert as toward (sizeof(uint_16) * 2 * num output chars) < result buffer size.
We can get required buffer size when we pass zero to cchWideChar.
MultiByteToWideChar (Microsoft Docs)
https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar
New webrev passed zero to cchWideChar at first, and it would allocate buffer with its value.
So buffer overflow would not occur.
> -----
>
> Come to think of it, the intermixing of wchar_t* with LPWSTR is a bit confusing. It feeds a wchar_t* as result buffer to MultiByteToWideChar. Can you make path_start a LPWSTR and cast it explicitely?
I changed wchar_t to WCHAR, and wchar_t* to LPWSTR.
> ----
>
> I assume you call ZeroMemory to make the result string zero terminated? A cheaper and simpler way would be to either pass -1 as input buffer length or to fix up the terminating zero after the call. I'd probably do the former.
If we pass -1 to cbMultiByte, MultiByteToWideChar() processes the entire input string, including the terminating null character (See Microsoft Doc).
So I use it in new webrev.
> ----
>
> Not a must do, but regression tests would be nice. Easiest thing I could think of would be testing wide_abs_unc_path() in a gtest with some CJK literals. Now that I think of it, regression tests may be good for this function in general.
>
> For that wide_abs_unc_path would have to be exported (the static removed) and declared as a prototype in the associated gtest, if we do not want to pollute a header with it.
It is nice idea, but MultiByteToWideChar() would perform with available locale. If we check wide_abs_unc_path() with Japanese characters, Japanese language need to be installed (See Microsoft Doc).
So it is difficult to test on all platform.
Thanks,
Yasumasa
> -----
>
> Cheers, Thomas
>
>
> On Fri, Feb 28, 2020 at 2:55 AM Yasumasa Suenaga <suenaga at oss.nttdata.com <mailto:suenaga at oss.nttdata.com>> wrote:
>
> Hi all,
>
> Please review this change:
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8240197
> webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.00/
>
> JVM cannot start as below when CJK characters are included in $JAVA_HOME.
>
> ```
> PS > .\build\windows-x86_64-server-fastdebug\images\jdk日本語\bin\java.exe --version
> Error occurred during initialization of VM
> Failed setting boot class path.
> ```
>
> I think this bug has been occurred since JDK-8191521.
> It uses mbstowcs_s() to convert char* to wchar_t*. But Windows API for wchar_t* (they are named *W()) requires Unicode chars for their arguments. mbstowcs_s() would not convert to Unicode. We need to use MultiByteToWideChar().
>
> This issue has been also confirmed in AdoptOpenJDK 11.0.6 [1]. So we also need to change jdk11u.
>
> I tested this change on submit repo, then it was failed (mach5-one-ysuenaga-JDK-8240197-20200228-0138-9049982).
> However the error was not caused by this change because it was occurred in all platforms in spite of I changed for Windows only.
> (Of course, it works well on my Windows 10 laptop.)
>
> I'm happy if you help to investigate cause of error.
>
>
> Thanks,
>
> Yasumasa
>
>
> [1] https://github.com/AdoptOpenJDK/openjdk-build/issues/1496
>
More information about the hotspot-runtime-dev
mailing list