RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters

Thu Mar 5 06:41:06 UTC 2020

Thanks Thomas!

I will push it if Ralf is ok to this change (webrev.06).

Yasumasa

On 2020/03/05 15:38, Thomas Stüfe wrote:
> Looks good to me now, thanks for the changes and your perseverance!
> 
> All good from my side.
> 
> ..Thomas
> 
> On Thu, Mar 5, 2020 at 7:28 AM Yasumasa Suenaga <suenaga at oss.nttdata.com <mailto:suenaga at oss.nttdata.com>> wrote:
> 
>     Hi Thomas,
> 
>     Thanks for your comment.
>     I fixed them in new webrev:
> 
>     http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.06/
> 
> 
>     Yasumasa
> 
> 
>     On 2020/03/05 14:43, Thomas Stüfe wrote:
>      > Hi Yasumasa,
>      >
>      > Had a glance over version 05. Some remarks:
>      >
>      > --
>      >
>      > I dislike the reinterpret_cast<> and do not understand the need for it. We do not do this in other places with malloc. Strictly speaking this would be UB since it is not guaranteed for A=reinterpret_cast<B> that A==B.
>      >
>      > In any way you should use NEW_C_HEAP_ARRAY and FREE_C_HEAP_ARRAY instead since those do the casting for you.
>      >
>      > They also exit the VM in case of OOM so no need to handle that manually.
>      >
>      > --
>      >
>      > In wide_abs_unc_path:
>      >
>      > +  // Need to allocate at least room for 3 characters, since os::native_path transforms C: to C:.
>      > +  size_t buf_size = 1 + MAX2((size_t)3, strlen(path));
>      > +  char* buf = reinterpret_cast<char*>(os::malloc(buf_size, mtInternal));
>      > +  if (buf == NULL) {
>      > +    vm_exit_out_of_memory(buf_size, OOM_MALLOC_ERROR, "wide_abs_unc_path");
>      > +  }
>      > +  memcpy(buf, path, buf_size);
>      > +  os::native_path(buf);
>      >
>      > The memcpy is wrong. You have a rare chance of triggering a read fault since you read over the limit of path. Just use strcpy.
>      >
>      > --
>      >
>      > The comment about "additional_space" confused me until I read the sources, sp I would suggest reforming it as such:
>      >
>      > -// additional_space is the number of additionally allocated wchars after the terminating L'\0'.
>      > +// additional_space is the size of space, in wchar_t, the function will additionally add to the allocation of return buffer (such that the size of the returned buffer is at least wcslen(buf) + 1 + additional_space).
>      >
>      > --
>      >
>      > I would also omit this comment:
>      >
>      > +// This is based on pathToNTPath() in io_util_md.cpp, but omits the optimizations for
>      > +// short paths.
>      >
>      > I do not think anyone reading this code cares.
>      >
>      > --
>      >
>      > Hmm, small style nit, sometimes we return errno_t and the actual return value is via output parameter,
>      >
>      > +static errno_t convert_to_unicode(char const* char_path, LPWSTR& unicode_path) {
>      >
>      > +static errno_t get_full_path(LPCWSTR unicode_path, LPWSTR& full_path) {
>      >
>      > and here its the other way around:
>      >
>      > +static wchar_t* wide_abs_unc_path(char const* path, errno_t & err, int additional_space = 0) {
>      >
>      > I do not care much, but it seems strange.
>      >
>      > --
>      >
>      > +static errno_t convert_to_unicode(char const* char_path, LPWSTR& unicode_path)
>      >
>      > +  LPWSTR unicode_path;
>      > +  err = convert_to_unicode(buf, unicode_path);
>      > +  os::free(buf);
>      >
>      > Sigh - I really do dislike passing output parameters via reference, since at the call site is is indistinguishable from an input parameter.
>      > I know C++ programmers love this though so I won't fight this here...
>      >
>      > Could you please at least initialize unicode_path, since you do this in other places with output parameters too?
>      >
>      > --
>      >
>      > The rest seems fine.
>      >
>      > Cheers, Thomas
>      >
>      > On Thu, Mar 5, 2020 at 1:18 AM Yasumasa Suenaga <suenaga at oss.nttdata.com <mailto:suenaga at oss.nttdata.com> <mailto:suenaga at oss.nttdata.com <mailto:suenaga at oss.nttdata.com>>> wrote:
>      >
>      >     Hi Ioi,
>      >
>      >     How about this change?
>      >
>      > http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.05/
>      >
>      >     I already tried to use ResouceMark, but it was crashed because wide_abs_unc_path() would be called from outside of java thread at boot time.
>      >     This webrev uses os::free() instead of memory cleaner class, but I think it is simple because I tweaked memory lifecycle.
>      >
>      >
>      >     Thanks,
>      >
>      >     Yasumasa
>      >
>      >
>      >     On 2020/03/05 6:56, Ioi Lam wrote:
>      >      > Hi Yasumasa,
>      >      >
>      >      > This version looks good!
>      >      >
>      >      >
>      >      > [0] One thing I forgot ... is it possible to use "ResourceMark rm" in this function? If so, you can allocate the temp arrays using NEW_RESOURCE_ARRAY and then don't need to worry about freeing them.
>      >      >
>      >      >
>      >      > [1] for this code
>      >      >
>      >      > 4202   // This call would be success because it is checked in above
>      >      > 4203   GetFullPathNameW(unicode_path, full_path_len, full_path, NULL);
>      >      >
>      >      > How about this instead:
>      >      >
>      >      >        DWORD len = GetFullPathNameW(unicode_path, full_path_len, full_path, NULL);
>      >      >        assert(len <= full_path_len, "length already checked above");
>      >      >
>      >      > (same for the MultiByteToWideChar call in convert_to_unicode)
>      >      >
>      >      >
>      >      > [2] This is no longer needed
>      >      >
>      >      > 4240     os::free(buf);
>      >      >
>      >      >
>      >      > [3] I think the following code is not needed ....
>      >      >
>      >      > 4269   } else {
>      >      > 4270     size_t converted_path_size = sizeof(WCHAR) * (wcslen(unicode_path) + 1);
>      >      > 4271     converted_path = reinterpret_cast<LPWSTR>(os::malloc(converted_path_size, mtInternal));
>      >      > 4272     if (converted_path == NULL) {
>      >      > 4273       vm_exit_out_of_memory(buf_size, OOM_MALLOC_ERROR, "wide_abs_unc_path");
>      >      > 4274     }
>      >      > 4275     memcpy(converted_path, unicode_path, converted_path_size);
>      >      >
>      >      >
>      >      > ... if you free the original buffer inside get_full_path (or use ResourceMark):
>      >      >
>      >      > static errno_t get_full_path(LPCWSTR& unicode_path) {
>      >      >     LPCWSTR full_path = .....;
>      >      >     ....
>      >      >     free(unicode_path);
>      >      > unicode_path = full_path;
>      >      >     return ERROR_SUCCESS;
>      >      > }
>      >      >
>      >      > [4] This looks good! Much more readable than the original code!
>      >      >
>      >      > 4285   _snwprintf(result, result_len, L"%s%s", prefix, &converted_path[prefix_off]);
>      >      >
>      >      >
>      >      > Thanks
>      >      > - Ioi
>      >      >
>      >      >
>      >      > On 3/4/20 12:22 AM, Yasumasa Suenaga wrote:
>      >      >> Hi Ioi,
>      >      >>
>      >      >> Thanks for your comment.
>      >      >> How about this change?
>      >      >>
>      >      >> http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.04/
>      >      >>
>      >      >> This webrev adds the function both converting to unicode and getting full path,
>      >      >> they would be called from wide_abs_unc_path().
>      >      >> Also MemoryCleaner class which is introduced in this webrev frees memory auomatically in d'tor.
>      >      >>
>      >      >>
>      >      >> Thanks,
>      >      >>
>      >      >> Yasumasa
>      >      >>
>      >      >>
>      >      >> On 2020/03/04 16:05, Ioi Lam wrote:
>      >      >>>
>      >      >>>
>      >      >>> On 3/3/20 10:22 PM, Yasumasa Suenaga wrote:
>      >      >>>> Hi Ioi,
>      >      >>>>
>      >      >>>>> [1] I think the memory freeing can be done using a destructor, so you can simply return in case of error, without having a lot of "if" checks:
>      >      >>>>
>      >      >>>> What do you think to use `goto` statement like Linux kernel?
>      >      >>>>
>      >      >>>>
>      >      >>> I don't have a strong opinion but I think many/most HotSpot developers are against it. Also with goto, you need to store the return value in a variable, but with destructors, you can just say "return value", so the code is cleaner.
>      >      >>>
>      >      >>> Thanks
>      >      >>> - Ioi
>      >      >>>
>      >      >>>> Yasumasa
>      >      >>>>
>      >      >>>>
>      >      >>>> On 2020/03/04 14:50, Ioi Lam wrote:
>      >      >>>>> Hi Yasumasa,
>      >      >>>>>
>      >      >>>>> I think this function is getting too long and it's hard to figure out all the error conditions.
>      >      >>>>>
>      >      >>>>> (I feel sorry for you because the code wasn't pretty messy to begin with, but I think it's at a point that we need to do the cleanup. It took me like 30 minutes to understand what is going on)
>      >      >>>>>
>      >      >>>>> [0] I noticed you changed the return type to LPWSTR, but all the callers are using wchar_t*. Are these two types the same?
>      >      >>>>>
>      >      >>>>>      If you want to use LPWSTR within this function for consistency, maybe you can return the value as
>      >      >>>>>
>      >      >>>>>      return (wchar_t*)result; // LPWSTR and wchat_t* are the same type on Windows.
>      >      >>>>>
>      >      >>>>>      That way people unfamiliar with Windows (most of us) won't be confused.
>      >      >>>>>
>      >      >>>>> [1] I think the memory freeing can be done using a destructor, so you can simply return in case of error, without having a lot of "if" checks:
>      >      >>>>>
>      >      >>>>>      class State {
>      >      >>>>>         errno_t & err;
>      >      >>>>>          char* buf;
>      >      >>>>>          LPWSTR unicode_path;
>      >      >>>>> LPWSTR result;
>      >      >>>>>
>      >      >>>>>          State(errno_t & e) : buf(NULL), unicode_path(NULL), result(NULL), err(e) {
>      >      >>>>>              err = SUCCESS;
>      >      >>>>>           }
>      >      >>>>>          ~State() {
>      >      >>>>>            os::free(buf);
>      >      >>>>>            os::free(unicode_path);
>      >      >>>>>            if (err != ERROR_SUCCESS) {
>      >      >>>>>               os::free(result);
>      >      >>>>>            }
>      >      >>>>>          }
>      >      >>>>>          LPWSTR as_error(errno_t e) {
>      >      >>>>>              err = e;
>      >      >>>>>              return NULL;
>      >      >>>>>          }
>      >      >>>>>      } st;
>      >      >>>>>
>      >      >>>>>
>      >      >>>>>      if ((st.buf = static_cast<char*>(os::malloc(buf_size, mtInternal))) == NULL) {
>      >      >>>>>          return st.as_error(ENOMEM);
>      >      >>>>>      }
>      >      >>>>>
>      >      >>>>>      ....
>      >      >>>>>      return st.result;
>      >      >>>>>
>      >      >>>>> [2] Will it be possible to break the functon down into smaller pieces, so it's easier to understand what's going on?
>      >      >>>>>
>      >      >>>>> E.g., Maybe the code that  converts the C string into a unicode string can be moved into a separate function?
>      >      >>>>>
>      >      >>>>>    // Get required buffer size to convert to Unicode
>      >      >>>>>    int unicode_path_len = MultiByteToWideChar(CP_THREAD_ACP,
>      >      >>>>> MB_ERR_INVALID_CHARS,
>      >      >>>>>                                               buf, -1,
>      >      >>>>>                                               NULL, 0);
>      >      >>>>>    if (unicode_path_len == 0) {
>      >      >>>>>      os::free(buf);
>      >      >>>>>      err = EINVAL;
>      >      >>>>>      return NULL;
>      >      >>>>>    }
>      >      >>>>>
>      >      >>>>>    ....
>      >      >>>>>
>      >      >>>>>    LPWSTR unicode_path = static_cast<LPWSTR>(os::malloc(sizeof(WCHAR) * unicode_path_len, mtInternal));
>      >      >>>>>    if (unicode_path == NULL) {
>      >      >>>>>      err = ENOMEM;
>      >      >>>>>    } else {
>      >      >>>>>      // This call would be success because it is checked in above
>      >      >>>>>      err = ERROR_SUCCESS;
>      >      >>>>>      MultiByteToWideChar(CP_THREAD_ACP,
>      >      >>>>>                          MB_ERR_INVALID_CHARS,
>      >      >>>>>                          buf, -1,
>      >      >>>>>                          unicode_path, unicode_path_len);
>      >      >>>>>
>      >      >>>>>
>      >      >>>>>
>      >      >>>>> [3] size_t result_len = prefix_len - prefix_off + additional_space;
>      >      >>>>>
>      >      >>>>> The variable name result_len is confusing. It's not the length of the result.
>      >      >>>>>
>      >      >>>>> [4] The old copying code was already hard to understand, and the new version now has 2 versions of the copying code. I think the reason is you want to avoid an extra memcpy by calling GetFullPathNameW directly on the result.
>      >      >>>>>
>      >      >>>>>      if (needs_fullpath) {
>      >      >>>>>        // Get required buffer size to convert to full path
>      >      >>>>>        DWORD full_path_len = GetFullPathNameW(unicode_path, 0, NULL, NULL);
>      >      >>>>>        if (full_path_len == 0) {
>      >      >>>>>          err = EINVAL;
>      >      >>>>>        } else {
>      >      >>>>>          size_t result_size = sizeof(WCHAR) * (result_len + full_path_len);
>      >      >>>>>          result = static_cast<LPWSTR>(os::malloc(result_size, mtInternal));
>      >      >>>>>          if (result == NULL) {
>      >      >>>>>            err = ENOMEM;
>      >      >>>>>          } else {
>      >      >>>>>            // This call would be success because it is checked in above
>      >      >>>>>            GetFullPathNameW(unicode_path, full_path_len, result + prefix_len - prefix_off, NULL);
>      >      >>>>>            // Copy prefix
>      >      >>>>>            memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>      >      >>>>>          }
>      >      >>>>>        }
>      >      >>>>>      } else {
>      >      >>>>>        size_t result_size = sizeof(WCHAR) * (result_len + unicode_path_len);
>      >      >>>>>        result = static_cast<LPWSTR>(os::malloc(result_size, mtInternal));
>      >      >>>>>        if (result == NULL) {
>      >      >>>>>          err = ENOMEM;
>      >      >>>>>        } else {
>      >      >>>>>          // Copy unicode path
>      >      >>>>>          memcpy(result + prefix_len - prefix_off, unicode_path, sizeof(WCHAR) * unicode_path_len);
>      >      >>>>>          // Copy prefix
>      >      >>>>>          memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>      >      >>>>>        }
>      >      >>>>>      }
>      >      >>>>>
>      >      >>>>> I think the code can be simplified if you do the fullpath version in a separate function:
>      >      >>>>>
>      >      >>>>>      if (needs_fullpath) {
>      >      >>>>>         unicode_path = to_full_path(unicode_path, err, &unicode_path_len);
>      >      >>>>>      }
>      >      >>>>>
>      >      >>>>>      size_t result_size = ......
>      >      >>>>>
>      >      >>>>> This will do one more memcpy, but I think it's not performance critical
>      >      >>>>>
>      >      >>>>> Thanks
>      >      >>>>> - Ioi
>      >      >>>>>
>      >      >>>>>
>      >      >>>>> On 3/3/20 4:19 PM, Yasumasa Suenaga wrote:
>      >      >>>>>> Thanks Ralf!
>      >      >>>>>> I fixed them.
>      >      >>>>>>
>      >      >>>>>> I need a Reviewer to push, so I uploaded new webrev:
>      >      >>>>>>
>      >      >>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.03/
>      >      >>>>>>
>      >      >>>>>>
>      >      >>>>>> Yasumasa
>      >      >>>>>>
>      >      >>>>>>
>      >      >>>>>> On 2020/03/03 23:25, Schmelter, Ralf wrote:
>      >      >>>>>>> Hi Yasumasa,
>      >      >>>>>>>
>      >      >>>>>>> thanks for your work. The code should now be able to handle any path.
>      >      >>>>>>>
>      >      >>>>>>> Two small changes (no new webrev needed):
>      >      >>>>>>>
>      >      >>>>>>> size_t result_len = prefix_len + prefix_off + additional_space
>      >      >>>>>>> should be
>      >      >>>>>>> size_t result_len = prefix_len - prefix_off + additional_space
>      >      >>>>>>>
>      >      >>>>>>> And you should copy the prefix after you copied the filename, since sometimes the prefix is required to overwrite the start of the filename:
>      >      >>>>>>>
>      >      >>>>>>> // This call would be success because it is checked in above
>      >      >>>>>>> GetFullPathNameW(unicode_path, full_path_len, result + prefix_len - prefix_off, NULL);
>      >      >>>>>>> // Copy prefix
>      >      >>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>      >      >>>>>>>
>      >      >>>>>>> and
>      >      >>>>>>>
>      >      >>>>>>> // Copy unicode path
>      >      >>>>>>> memcpy(result + prefix_len - prefix_off, unicode_path, sizeof(WCHAR) * unicode_path_len);
>      >      >>>>>>> // Copy prefix
>      >      >>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>      >      >>>>>>>
>      >      >>>>>>> With these changes the os_windows gtest and the appcds jtreg tests run without errors.
>      >      >>>>>>>
>      >      >>>>>>> Best regards,
>      >      >>>>>>> Ralf
>      >      >>>>>>>
>      >      >>>>>
>      >      >>>
>      >      >
>      >
>