RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters

Yasumasa Suenaga suenaga at oss.nttdata.com
Thu Mar 5 05:39:26 UTC 2020


Thanks Ioi!
Ralf, are you ok to this change?

   http://hg.openjdk.java.net/jdk/submit/rev/8f25e02f0432
     (all tests on submit repo has passed)

Yasumasa


On 2020/03/05 13:15, Ioi Lam wrote:
> Hi Yasumasa,
> 
> This looks very good. Thanks for bearing with my requests :-)
> 
> The return value of GetFullPathNameW is a bit weird. I would suggest adding comments so people can understand the code without reading the
> GetFullPathNameW documentation:
> 
> 
> 4190   // Get required buffer size to convert to full path. The return
>         // value INCLUDES the terminating null character.
> 4191   DWORD full_path_len = GetFullPathNameW(unicode_path, 0, NULL, NULL);
> 
>         // When the buffer has sufficient size, the return value EXCLUDES the
>         // terminating null character.
> 4202   DWORD result = GetFullPathNameW(unicode_path, full_path_len, full_path, NULL);
> 4203   assert(result == (full_path_len - 1), "length already checked above");
> 
> (No need for new webrev).
> 
> Thanks
> - Ioi
> 
> 
> 
> 
> On 3/4/20 4:18 PM, Yasumasa Suenaga wrote:
>> Hi Ioi,
>>
>> How about this change?
>>
>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.05/
>>
>> I already tried to use ResouceMark, but it was crashed because wide_abs_unc_path() would be called from outside of java thread at boot time.
>> This webrev uses os::free() instead of memory cleaner class, but I think it is simple because I tweaked memory lifecycle.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> On 2020/03/05 6:56, Ioi Lam wrote:
>>> Hi Yasumasa,
>>>
>>> This version looks good!
>>>
>>>
>>> [0] One thing I forgot ... is it possible to use "ResourceMark rm" in this function? If so, you can allocate the temp arrays using NEW_RESOURCE_ARRAY and then don't need to worry about freeing them.
>>>
>>>
>>> [1] for this code
>>>
>>> 4202   // This call would be success because it is checked in above
>>> 4203   GetFullPathNameW(unicode_path, full_path_len, full_path, NULL);
>>>
>>> How about this instead:
>>>
>>>        DWORD len = GetFullPathNameW(unicode_path, full_path_len, full_path, NULL);
>>>        assert(len <= full_path_len, "length already checked above");
>>>
>>> (same for the MultiByteToWideChar call in convert_to_unicode)
>>>
>>>
>>> [2] This is no longer needed
>>>
>>> 4240     os::free(buf);
>>>
>>>
>>> [3] I think the following code is not needed ....
>>>
>>> 4269   } else {
>>> 4270     size_t converted_path_size = sizeof(WCHAR) * (wcslen(unicode_path) + 1);
>>> 4271     converted_path = reinterpret_cast<LPWSTR>(os::malloc(converted_path_size, mtInternal));
>>> 4272     if (converted_path == NULL) {
>>> 4273       vm_exit_out_of_memory(buf_size, OOM_MALLOC_ERROR, "wide_abs_unc_path");
>>> 4274     }
>>> 4275     memcpy(converted_path, unicode_path, converted_path_size);
>>>
>>>
>>> ... if you free the original buffer inside get_full_path (or use ResourceMark):
>>>
>>> static errno_t get_full_path(LPCWSTR& unicode_path) {
>>>     LPCWSTR full_path = .....;
>>>     ....
>>>     free(unicode_path);
>>> unicode_path = full_path;
>>>     return ERROR_SUCCESS;
>>> }
>>>
>>> [4] This looks good! Much more readable than the original code!
>>>
>>> 4285   _snwprintf(result, result_len, L"%s%s", prefix, &converted_path[prefix_off]);
>>>
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>> On 3/4/20 12:22 AM, Yasumasa Suenaga wrote:
>>>> Hi Ioi,
>>>>
>>>> Thanks for your comment.
>>>> How about this change?
>>>>
>>>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.04/
>>>>
>>>> This webrev adds the function both converting to unicode and getting full path,
>>>> they would be called from wide_abs_unc_path().
>>>> Also MemoryCleaner class which is introduced in this webrev frees memory auomatically in d'tor.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2020/03/04 16:05, Ioi Lam wrote:
>>>>>
>>>>>
>>>>> On 3/3/20 10:22 PM, Yasumasa Suenaga wrote:
>>>>>> Hi Ioi,
>>>>>>
>>>>>>> [1] I think the memory freeing can be done using a destructor, so you can simply return in case of error, without having a lot of "if" checks:
>>>>>>
>>>>>> What do you think to use `goto` statement like Linux kernel?
>>>>>>
>>>>>>
>>>>> I don't have a strong opinion but I think many/most HotSpot developers are against it. Also with goto, you need to store the return value in a variable, but with destructors, you can just say "return value", so the code is cleaner.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2020/03/04 14:50, Ioi Lam wrote:
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> I think this function is getting too long and it's hard to figure out all the error conditions.
>>>>>>>
>>>>>>> (I feel sorry for you because the code wasn't pretty messy to begin with, but I think it's at a point that we need to do the cleanup. It took me like 30 minutes to understand what is going on)
>>>>>>>
>>>>>>> [0] I noticed you changed the return type to LPWSTR, but all the callers are using wchar_t*. Are these two types the same?
>>>>>>>
>>>>>>>      If you want to use LPWSTR within this function for consistency, maybe you can return the value as
>>>>>>>
>>>>>>>      return (wchar_t*)result; // LPWSTR and wchat_t* are the same type on Windows.
>>>>>>>
>>>>>>>      That way people unfamiliar with Windows (most of us) won't be confused.
>>>>>>>
>>>>>>> [1] I think the memory freeing can be done using a destructor, so you can simply return in case of error, without having a lot of "if" checks:
>>>>>>>
>>>>>>>      class State {
>>>>>>>         errno_t & err;
>>>>>>>          char* buf;
>>>>>>>          LPWSTR unicode_path;
>>>>>>> LPWSTR result;
>>>>>>>
>>>>>>>          State(errno_t & e) : buf(NULL), unicode_path(NULL), result(NULL), err(e) {
>>>>>>>              err = SUCCESS;
>>>>>>>           }
>>>>>>>          ~State() {
>>>>>>>            os::free(buf);
>>>>>>>            os::free(unicode_path);
>>>>>>>            if (err != ERROR_SUCCESS) {
>>>>>>>               os::free(result);
>>>>>>>            }
>>>>>>>          }
>>>>>>>          LPWSTR as_error(errno_t e) {
>>>>>>>              err = e;
>>>>>>>              return NULL;
>>>>>>>          }
>>>>>>>      } st;
>>>>>>>
>>>>>>>
>>>>>>>      if ((st.buf = static_cast<char*>(os::malloc(buf_size, mtInternal))) == NULL) {
>>>>>>>          return st.as_error(ENOMEM);
>>>>>>>      }
>>>>>>>
>>>>>>>      ....
>>>>>>>      return st.result;
>>>>>>>
>>>>>>> [2] Will it be possible to break the functon down into smaller pieces, so it's easier to understand what's going on?
>>>>>>>
>>>>>>> E.g., Maybe the code that  converts the C string into a unicode string can be moved into a separate function?
>>>>>>>
>>>>>>>    // Get required buffer size to convert to Unicode
>>>>>>>    int unicode_path_len = MultiByteToWideChar(CP_THREAD_ACP,
>>>>>>> MB_ERR_INVALID_CHARS,
>>>>>>>                                               buf, -1,
>>>>>>>                                               NULL, 0);
>>>>>>>    if (unicode_path_len == 0) {
>>>>>>>      os::free(buf);
>>>>>>>      err = EINVAL;
>>>>>>>      return NULL;
>>>>>>>    }
>>>>>>>
>>>>>>>    ....
>>>>>>>
>>>>>>>    LPWSTR unicode_path = static_cast<LPWSTR>(os::malloc(sizeof(WCHAR) * unicode_path_len, mtInternal));
>>>>>>>    if (unicode_path == NULL) {
>>>>>>>      err = ENOMEM;
>>>>>>>    } else {
>>>>>>>      // This call would be success because it is checked in above
>>>>>>>      err = ERROR_SUCCESS;
>>>>>>>      MultiByteToWideChar(CP_THREAD_ACP,
>>>>>>>                          MB_ERR_INVALID_CHARS,
>>>>>>>                          buf, -1,
>>>>>>>                          unicode_path, unicode_path_len);
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [3] size_t result_len = prefix_len - prefix_off + additional_space;
>>>>>>>
>>>>>>> The variable name result_len is confusing. It's not the length of the result.
>>>>>>>
>>>>>>> [4] The old copying code was already hard to understand, and the new version now has 2 versions of the copying code. I think the reason is you want to avoid an extra memcpy by calling GetFullPathNameW directly on the result.
>>>>>>>
>>>>>>>      if (needs_fullpath) {
>>>>>>>        // Get required buffer size to convert to full path
>>>>>>>        DWORD full_path_len = GetFullPathNameW(unicode_path, 0, NULL, NULL);
>>>>>>>        if (full_path_len == 0) {
>>>>>>>          err = EINVAL;
>>>>>>>        } else {
>>>>>>>          size_t result_size = sizeof(WCHAR) * (result_len + full_path_len);
>>>>>>>          result = static_cast<LPWSTR>(os::malloc(result_size, mtInternal));
>>>>>>>          if (result == NULL) {
>>>>>>>            err = ENOMEM;
>>>>>>>          } else {
>>>>>>>            // This call would be success because it is checked in above
>>>>>>>            GetFullPathNameW(unicode_path, full_path_len, result + prefix_len - prefix_off, NULL);
>>>>>>>            // Copy prefix
>>>>>>>            memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>>>>          }
>>>>>>>        }
>>>>>>>      } else {
>>>>>>>        size_t result_size = sizeof(WCHAR) * (result_len + unicode_path_len);
>>>>>>>        result = static_cast<LPWSTR>(os::malloc(result_size, mtInternal));
>>>>>>>        if (result == NULL) {
>>>>>>>          err = ENOMEM;
>>>>>>>        } else {
>>>>>>>          // Copy unicode path
>>>>>>>          memcpy(result + prefix_len - prefix_off, unicode_path, sizeof(WCHAR) * unicode_path_len);
>>>>>>>          // Copy prefix
>>>>>>>          memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>>>>        }
>>>>>>>      }
>>>>>>>
>>>>>>> I think the code can be simplified if you do the fullpath version in a separate function:
>>>>>>>
>>>>>>>      if (needs_fullpath) {
>>>>>>>         unicode_path = to_full_path(unicode_path, err, &unicode_path_len);
>>>>>>>      }
>>>>>>>
>>>>>>>      size_t result_size = ......
>>>>>>>
>>>>>>> This will do one more memcpy, but I think it's not performance critical
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>>>>>>> On 3/3/20 4:19 PM, Yasumasa Suenaga wrote:
>>>>>>>> Thanks Ralf!
>>>>>>>> I fixed them.
>>>>>>>>
>>>>>>>> I need a Reviewer to push, so I uploaded new webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.03/
>>>>>>>>
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/03/03 23:25, Schmelter, Ralf wrote:
>>>>>>>>> Hi Yasumasa,
>>>>>>>>>
>>>>>>>>> thanks for your work. The code should now be able to handle any path.
>>>>>>>>>
>>>>>>>>> Two small changes (no new webrev needed):
>>>>>>>>>
>>>>>>>>> size_t result_len = prefix_len + prefix_off + additional_space
>>>>>>>>> should be
>>>>>>>>> size_t result_len = prefix_len - prefix_off + additional_space
>>>>>>>>>
>>>>>>>>> And you should copy the prefix after you copied the filename, since sometimes the prefix is required to overwrite the start of the filename:
>>>>>>>>>
>>>>>>>>> // This call would be success because it is checked in above
>>>>>>>>> GetFullPathNameW(unicode_path, full_path_len, result + prefix_len - prefix_off, NULL);
>>>>>>>>> // Copy prefix
>>>>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>> // Copy unicode path
>>>>>>>>> memcpy(result + prefix_len - prefix_off, unicode_path, sizeof(WCHAR) * unicode_path_len);
>>>>>>>>> // Copy prefix
>>>>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>>>>>>
>>>>>>>>> With these changes the os_windows gtest and the appcds jtreg tests run without errors.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Ralf
>>>>>>>>>
>>>>>>>
>>>>>
>>>
> 


More information about the hotspot-runtime-dev mailing list