RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters

Thu Mar 5 05:43:29 UTC 2020

Hi Yasumasa,

Had a glance over version 05. Some remarks:

--

I dislike the reinterpret_cast<> and do not understand the need for it. We
do not do this in other places with malloc. Strictly speaking this would be
UB since it is not guaranteed for A=reinterpret_cast<B> that A==B.

In any way you should use NEW_C_HEAP_ARRAY and FREE_C_HEAP_ARRAY instead
since those do the casting for you.

They also exit the VM in case of OOM so no need to handle that manually.

--

In wide_abs_unc_path:

+  // Need to allocate at least room for 3 characters, since
os::native_path transforms C: to C:.
+  size_t buf_size = 1 + MAX2((size_t)3, strlen(path));
+  char* buf = reinterpret_cast<char*>(os::malloc(buf_size, mtInternal));
+  if (buf == NULL) {
+    vm_exit_out_of_memory(buf_size, OOM_MALLOC_ERROR, "wide_abs_unc_path");
+  }
+  memcpy(buf, path, buf_size);
+  os::native_path(buf);

The memcpy is wrong. You have a rare chance of triggering a read fault
since you read over the limit of path. Just use strcpy.

--

The comment about "additional_space" confused me until I read the sources,
sp I would suggest reforming it as such:

-// additional_space is the number of additionally allocated wchars after
the terminating L'\0'.
+// additional_space is the size of space, in wchar_t, the function will
additionally add to the allocation of return buffer (such that the size of
the returned buffer is at least wcslen(buf) + 1 + additional_space).

--

I would also omit this comment:

+// This is based on pathToNTPath() in io_util_md.cpp, but omits the
optimizations for
+// short paths.

I do not think anyone reading this code cares.

--

Hmm, small style nit, sometimes we return errno_t and the actual return
value is via output parameter,

+static errno_t convert_to_unicode(char const* char_path, LPWSTR&
unicode_path) {

+static errno_t get_full_path(LPCWSTR unicode_path, LPWSTR& full_path) {

and here its the other way around:

+static wchar_t* wide_abs_unc_path(char const* path, errno_t & err, int
additional_space = 0) {

I do not care much, but it seems strange.

--

+static errno_t convert_to_unicode(char const* char_path, LPWSTR&
unicode_path)

+  LPWSTR unicode_path;
+  err = convert_to_unicode(buf, unicode_path);
+  os::free(buf);

Sigh - I really do dislike passing output parameters via reference, since
at the call site is is indistinguishable from an input parameter.
I know C++ programmers love this though so I won't fight this here...

Could you please at least initialize unicode_path, since you do this in
other places with output parameters too?

--

The rest seems fine.

Cheers, Thomas

On Thu, Mar 5, 2020 at 1:18 AM Yasumasa Suenaga <suenaga at oss.nttdata.com>
wrote:

> Hi Ioi,
>
> How about this change?
>
>    http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.05/
>
> I already tried to use ResouceMark, but it was crashed because
> wide_abs_unc_path() would be called from outside of java thread at boot
> time.
> This webrev uses os::free() instead of memory cleaner class, but I think
> it is simple because I tweaked memory lifecycle.
>
>
> Thanks,
>
> Yasumasa
>
>
> On 2020/03/05 6:56, Ioi Lam wrote:
> > Hi Yasumasa,
> >
> > This version looks good!
> >
> >
> > [0] One thing I forgot ... is it possible to use "ResourceMark rm" in
> this function? If so, you can allocate the temp arrays using
> NEW_RESOURCE_ARRAY and then don't need to worry about freeing them.
> >
> >
> > [1] for this code
> >
> > 4202   // This call would be success because it is checked in above
> > 4203   GetFullPathNameW(unicode_path, full_path_len, full_path, NULL);
> >
> > How about this instead:
> >
> >        DWORD len = GetFullPathNameW(unicode_path, full_path_len,
> full_path, NULL);
> >        assert(len <= full_path_len, "length already checked above");
> >
> > (same for the MultiByteToWideChar call in convert_to_unicode)
> >
> >
> > [2] This is no longer needed
> >
> > 4240     os::free(buf);
> >
> >
> > [3] I think the following code is not needed ....
> >
> > 4269   } else {
> > 4270     size_t converted_path_size = sizeof(WCHAR) *
> (wcslen(unicode_path) + 1);
> > 4271     converted_path =
> reinterpret_cast<LPWSTR>(os::malloc(converted_path_size, mtInternal));
> > 4272     if (converted_path == NULL) {
> > 4273       vm_exit_out_of_memory(buf_size, OOM_MALLOC_ERROR,
> "wide_abs_unc_path");
> > 4274     }
> > 4275     memcpy(converted_path, unicode_path, converted_path_size);
> >
> >
> > ... if you free the original buffer inside get_full_path (or use
> ResourceMark):
> >
> > static errno_t get_full_path(LPCWSTR& unicode_path) {
> >     LPCWSTR full_path = .....;
> >     ....
> >     free(unicode_path);
> > unicode_path = full_path;
> >     return ERROR_SUCCESS;
> > }
> >
> > [4] This looks good! Much more readable than the original code!
> >
> > 4285   _snwprintf(result, result_len, L"%s%s", prefix,
> &converted_path[prefix_off]);
> >
> >
> > Thanks
> > - Ioi
> >
> >
> > On 3/4/20 12:22 AM, Yasumasa Suenaga wrote:
> >> Hi Ioi,
> >>
> >> Thanks for your comment.
> >> How about this change?
> >>
> >>   http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.04/
> >>
> >> This webrev adds the function both converting to unicode and getting
> full path,
> >> they would be called from wide_abs_unc_path().
> >> Also MemoryCleaner class which is introduced in this webrev frees
> memory auomatically in d'tor.
> >>
> >>
> >> Thanks,
> >>
> >> Yasumasa
> >>
> >>
> >> On 2020/03/04 16:05, Ioi Lam wrote:
> >>>
> >>>
> >>> On 3/3/20 10:22 PM, Yasumasa Suenaga wrote:
> >>>> Hi Ioi,
> >>>>
> >>>>> [1] I think the memory freeing can be done using a destructor, so
> you can simply return in case of error, without having a lot of "if" checks:
> >>>>
> >>>> What do you think to use `goto` statement like Linux kernel?
> >>>>
> >>>>
> >>> I don't have a strong opinion but I think many/most HotSpot developers
> are against it. Also with goto, you need to store the return value in a
> variable, but with destructors, you can just say "return value", so the
> code is cleaner.
> >>>
> >>> Thanks
> >>> - Ioi
> >>>
> >>>> Yasumasa
> >>>>
> >>>>
> >>>> On 2020/03/04 14:50, Ioi Lam wrote:
> >>>>> Hi Yasumasa,
> >>>>>
> >>>>> I think this function is getting too long and it's hard to figure
> out all the error conditions.
> >>>>>
> >>>>> (I feel sorry for you because the code wasn't pretty messy to begin
> with, but I think it's at a point that we need to do the cleanup. It took
> me like 30 minutes to understand what is going on)
> >>>>>
> >>>>> [0] I noticed you changed the return type to LPWSTR, but all the
> callers are using wchar_t*. Are these two types the same?
> >>>>>
> >>>>>      If you want to use LPWSTR within this function for consistency,
> maybe you can return the value as
> >>>>>
> >>>>>      return (wchar_t*)result; // LPWSTR and wchat_t* are the same
> type on Windows.
> >>>>>
> >>>>>      That way people unfamiliar with Windows (most of us) won't be
> confused.
> >>>>>
> >>>>> [1] I think the memory freeing can be done using a destructor, so
> you can simply return in case of error, without having a lot of "if" checks:
> >>>>>
> >>>>>      class State {
> >>>>>         errno_t & err;
> >>>>>          char* buf;
> >>>>>          LPWSTR unicode_path;
> >>>>> LPWSTR result;
> >>>>>
> >>>>>          State(errno_t & e) : buf(NULL), unicode_path(NULL),
> result(NULL), err(e) {
> >>>>>              err = SUCCESS;
> >>>>>           }
> >>>>>          ~State() {
> >>>>>            os::free(buf);
> >>>>>            os::free(unicode_path);
> >>>>>            if (err != ERROR_SUCCESS) {
> >>>>>               os::free(result);
> >>>>>            }
> >>>>>          }
> >>>>>          LPWSTR as_error(errno_t e) {
> >>>>>              err = e;
> >>>>>              return NULL;
> >>>>>          }
> >>>>>      } st;
> >>>>>
> >>>>>
> >>>>>      if ((st.buf = static_cast<char*>(os::malloc(buf_size,
> mtInternal))) == NULL) {
> >>>>>          return st.as_error(ENOMEM);
> >>>>>      }
> >>>>>
> >>>>>      ....
> >>>>>      return st.result;
> >>>>>
> >>>>> [2] Will it be possible to break the functon down into smaller
> pieces, so it's easier to understand what's going on?
> >>>>>
> >>>>> E.g., Maybe the code that  converts the C string into a unicode
> string can be moved into a separate function?
> >>>>>
> >>>>>    // Get required buffer size to convert to Unicode
> >>>>>    int unicode_path_len = MultiByteToWideChar(CP_THREAD_ACP,
> >>>>> MB_ERR_INVALID_CHARS,
> >>>>>                                               buf, -1,
> >>>>>                                               NULL, 0);
> >>>>>    if (unicode_path_len == 0) {
> >>>>>      os::free(buf);
> >>>>>      err = EINVAL;
> >>>>>      return NULL;
> >>>>>    }
> >>>>>
> >>>>>    ....
> >>>>>
> >>>>>    LPWSTR unicode_path =
> static_cast<LPWSTR>(os::malloc(sizeof(WCHAR) * unicode_path_len,
> mtInternal));
> >>>>>    if (unicode_path == NULL) {
> >>>>>      err = ENOMEM;
> >>>>>    } else {
> >>>>>      // This call would be success because it is checked in above
> >>>>>      err = ERROR_SUCCESS;
> >>>>>      MultiByteToWideChar(CP_THREAD_ACP,
> >>>>>                          MB_ERR_INVALID_CHARS,
> >>>>>                          buf, -1,
> >>>>>                          unicode_path, unicode_path_len);
> >>>>>
> >>>>>
> >>>>>
> >>>>> [3] size_t result_len = prefix_len - prefix_off + additional_space;
> >>>>>
> >>>>> The variable name result_len is confusing. It's not the length of
> the result.
> >>>>>
> >>>>> [4] The old copying code was already hard to understand, and the new
> version now has 2 versions of the copying code. I think the reason is you
> want to avoid an extra memcpy by calling GetFullPathNameW directly on the
> result.
> >>>>>
> >>>>>      if (needs_fullpath) {
> >>>>>        // Get required buffer size to convert to full path
> >>>>>        DWORD full_path_len = GetFullPathNameW(unicode_path, 0, NULL,
> NULL);
> >>>>>        if (full_path_len == 0) {
> >>>>>          err = EINVAL;
> >>>>>        } else {
> >>>>>          size_t result_size = sizeof(WCHAR) * (result_len +
> full_path_len);
> >>>>>          result = static_cast<LPWSTR>(os::malloc(result_size,
> mtInternal));
> >>>>>          if (result == NULL) {
> >>>>>            err = ENOMEM;
> >>>>>          } else {
> >>>>>            // This call would be success because it is checked in
> above
> >>>>>            GetFullPathNameW(unicode_path, full_path_len, result +
> prefix_len - prefix_off, NULL);
> >>>>>            // Copy prefix
> >>>>>            memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
> >>>>>          }
> >>>>>        }
> >>>>>      } else {
> >>>>>        size_t result_size = sizeof(WCHAR) * (result_len +
> unicode_path_len);
> >>>>>        result = static_cast<LPWSTR>(os::malloc(result_size,
> mtInternal));
> >>>>>        if (result == NULL) {
> >>>>>          err = ENOMEM;
> >>>>>        } else {
> >>>>>          // Copy unicode path
> >>>>>          memcpy(result + prefix_len - prefix_off, unicode_path,
> sizeof(WCHAR) * unicode_path_len);
> >>>>>          // Copy prefix
> >>>>>          memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
> >>>>>        }
> >>>>>      }
> >>>>>
> >>>>> I think the code can be simplified if you do the fullpath version in
> a separate function:
> >>>>>
> >>>>>      if (needs_fullpath) {
> >>>>>         unicode_path = to_full_path(unicode_path, err,
> &unicode_path_len);
> >>>>>      }
> >>>>>
> >>>>>      size_t result_size = ......
> >>>>>
> >>>>> This will do one more memcpy, but I think it's not performance
> critical
> >>>>>
> >>>>> Thanks
> >>>>> - Ioi
> >>>>>
> >>>>>
> >>>>> On 3/3/20 4:19 PM, Yasumasa Suenaga wrote:
> >>>>>> Thanks Ralf!
> >>>>>> I fixed them.
> >>>>>>
> >>>>>> I need a Reviewer to push, so I uploaded new webrev:
> >>>>>>
> >>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.03/
> >>>>>>
> >>>>>>
> >>>>>> Yasumasa
> >>>>>>
> >>>>>>
> >>>>>> On 2020/03/03 23:25, Schmelter, Ralf wrote:
> >>>>>>> Hi Yasumasa,
> >>>>>>>
> >>>>>>> thanks for your work. The code should now be able to handle any
> path.
> >>>>>>>
> >>>>>>> Two small changes (no new webrev needed):
> >>>>>>>
> >>>>>>> size_t result_len = prefix_len + prefix_off + additional_space
> >>>>>>> should be
> >>>>>>> size_t result_len = prefix_len - prefix_off + additional_space
> >>>>>>>
> >>>>>>> And you should copy the prefix after you copied the filename,
> since sometimes the prefix is required to overwrite the start of the
> filename:
> >>>>>>>
> >>>>>>> // This call would be success because it is checked in above
> >>>>>>> GetFullPathNameW(unicode_path, full_path_len, result + prefix_len
> - prefix_off, NULL);
> >>>>>>> // Copy prefix
> >>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
> >>>>>>>
> >>>>>>> and
> >>>>>>>
> >>>>>>> // Copy unicode path
> >>>>>>> memcpy(result + prefix_len - prefix_off, unicode_path,
> sizeof(WCHAR) * unicode_path_len);
> >>>>>>> // Copy prefix
> >>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
> >>>>>>>
> >>>>>>> With these changes the os_windows gtest and the appcds jtreg tests
> run without errors.
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Ralf
> >>>>>>>
> >>>>>
> >>>
> >
>