RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters

Fri Feb 28 05:43:28 UTC 2020

Hi Yasumasa,

thanks for fixing this. CC'ing Ralf since this was his change.

-----

Not sure MB_ERR_INVALID_CHARS with a subsequent return NULL is a good
option for conversion errors? Would it not better to use the default
behavior of replacing invalid code points with U+FFFD?

-----

Can the result buffer overflow? UTF16 is a variable sized char set, code
points can overflow into the upper plane and use 32bit. Result buffer size
is calculated in terms of sizeof(wchar_t). Not sure what size wchar_t is;
if it is 16bit the result buffer could be too small. If it is 32bit I'd be
really confused too :) since we would have had 32bit-wide characters before
and now we deal with UTF16?

Should it be too small, there are two solutions:

1) either over-allocate for the worst case: for each input char allocate
space for two 16bit code points.
2) or, call MultiByteToWideChar twice, first time with 0 as output buffer
size. Then, function just returns output buffer size, which you can
allocate and repeat the call.

I would at least like to see an explicit assert as toward (sizeof(uint_16)
* 2 * num output chars) < result buffer size.

-----

Come to think of it, the intermixing of wchar_t* with LPWSTR is a bit
confusing. It feeds a wchar_t* as result buffer to MultiByteToWideChar. Can
you make path_start a LPWSTR and cast it explicitely?

----

I assume you call ZeroMemory to make the result string zero terminated? A
cheaper and simpler way would be to either pass -1 as input buffer length
or to fix up the terminating zero after the call. I'd probably do the
former.

----

Not a must do, but regression tests would be nice. Easiest thing I could
think of would be testing wide_abs_unc_path() in a gtest with some CJK
literals. Now that I think of it, regression tests may be good for this
function in general.

For that wide_abs_unc_path would have to be exported (the static removed)
and declared as a prototype in the associated gtest, if we do not want to
pollute a header with it.

-----

Cheers, Thomas

On Fri, Feb 28, 2020 at 2:55 AM Yasumasa Suenaga <suenaga at oss.nttdata.com>
wrote:

> Hi all,
>
> Please review this change:
>
>    JBS: https://bugs.openjdk.java.net/browse/JDK-8240197
>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.00/
>
> JVM cannot start as below when CJK characters are included in $JAVA_HOME.
>
> ```
> PS > .\build\windows-x86_64-server-fastdebug\images\jdk日本語\bin\java.exe
> --version
> Error occurred during initialization of VM
> Failed setting boot class path.
> ```
>
> I think this bug has been occurred since JDK-8191521.
> It uses mbstowcs_s() to convert char* to wchar_t*. But Windows API for
> wchar_t* (they are named *W()) requires Unicode chars for their arguments.
> mbstowcs_s() would not convert to Unicode. We need to use
> MultiByteToWideChar().
>
> This issue has been also confirmed in AdoptOpenJDK 11.0.6 [1]. So we also
> need to change jdk11u.
>
> I tested this change on submit repo, then it was failed
> (mach5-one-ysuenaga-JDK-8240197-20200228-0138-9049982).
> However the error was not caused by this change because it was occurred in
> all platforms in spite of I changed for Windows only.
> (Of course, it works well on my Windows 10 laptop.)
>
> I'm happy if you help to investigate cause of error.
>
>
> Thanks,
>
> Yasumasa
>
>
> [1] https://github.com/AdoptOpenJDK/openjdk-build/issues/1496
>