RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters

Thu Mar 5 04:15:30 UTC 2020

Hi Yasumasa,

This looks very good. Thanks for bearing with my requests :-)

The return value of GetFullPathNameW is a bit weird. I would suggest 
adding comments so people can understand the code without reading the
GetFullPathNameW documentation:

4190   // Get required buffer size to convert to full path. The return
        // value INCLUDES the terminating null character.
4191   DWORD full_path_len = GetFullPathNameW(unicode_path, 0, NULL, NULL);

        // When the buffer has sufficient size, the return value 
EXCLUDES the
        // terminating null character.
4202   DWORD result = GetFullPathNameW(unicode_path, full_path_len, 
full_path, NULL);
4203   assert(result == (full_path_len - 1), "length already checked 
above");

(No need for new webrev).

Thanks
- Ioi

On 3/4/20 4:18 PM, Yasumasa Suenaga wrote:
> Hi Ioi,
>
> How about this change?
>
>   http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.05/
>
> I already tried to use ResouceMark, but it was crashed because 
> wide_abs_unc_path() would be called from outside of java thread at 
> boot time.
> This webrev uses os::free() instead of memory cleaner class, but I 
> think it is simple because I tweaked memory lifecycle.
>
>
> Thanks,
>
> Yasumasa
>
>
> On 2020/03/05 6:56, Ioi Lam wrote:
>> Hi Yasumasa,
>>
>> This version looks good!
>>
>>
>> [0] One thing I forgot ... is it possible to use "ResourceMark rm" in 
>> this function? If so, you can allocate the temp arrays using 
>> NEW_RESOURCE_ARRAY and then don't need to worry about freeing them.
>>
>>
>> [1] for this code
>>
>> 4202   // This call would be success because it is checked in above
>> 4203   GetFullPathNameW(unicode_path, full_path_len, full_path, NULL);
>>
>> How about this instead:
>>
>>        DWORD len = GetFullPathNameW(unicode_path, full_path_len, 
>> full_path, NULL);
>>        assert(len <= full_path_len, "length already checked above");
>>
>> (same for the MultiByteToWideChar call in convert_to_unicode)
>>
>>
>> [2] This is no longer needed
>>
>> 4240     os::free(buf);
>>
>>
>> [3] I think the following code is not needed ....
>>
>> 4269   } else {
>> 4270     size_t converted_path_size = sizeof(WCHAR) * 
>> (wcslen(unicode_path) + 1);
>> 4271     converted_path = 
>> reinterpret_cast<LPWSTR>(os::malloc(converted_path_size, mtInternal));
>> 4272     if (converted_path == NULL) {
>> 4273       vm_exit_out_of_memory(buf_size, OOM_MALLOC_ERROR, 
>> "wide_abs_unc_path");
>> 4274     }
>> 4275     memcpy(converted_path, unicode_path, converted_path_size);
>>
>>
>> ... if you free the original buffer inside get_full_path (or use 
>> ResourceMark):
>>
>> static errno_t get_full_path(LPCWSTR& unicode_path) {
>>     LPCWSTR full_path = .....;
>>     ....
>>     free(unicode_path);
>> unicode_path = full_path;
>>     return ERROR_SUCCESS;
>> }
>>
>> [4] This looks good! Much more readable than the original code!
>>
>> 4285   _snwprintf(result, result_len, L"%s%s", prefix, 
>> &converted_path[prefix_off]);
>>
>>
>> Thanks
>> - Ioi
>>
>>
>> On 3/4/20 12:22 AM, Yasumasa Suenaga wrote:
>>> Hi Ioi,
>>>
>>> Thanks for your comment.
>>> How about this change?
>>>
>>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.04/
>>>
>>> This webrev adds the function both converting to unicode and getting 
>>> full path,
>>> they would be called from wide_abs_unc_path().
>>> Also MemoryCleaner class which is introduced in this webrev frees 
>>> memory auomatically in d'tor.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2020/03/04 16:05, Ioi Lam wrote:
>>>>
>>>>
>>>> On 3/3/20 10:22 PM, Yasumasa Suenaga wrote:
>>>>> Hi Ioi,
>>>>>
>>>>>> [1] I think the memory freeing can be done using a destructor, so 
>>>>>> you can simply return in case of error, without having a lot of 
>>>>>> "if" checks:
>>>>>
>>>>> What do you think to use `goto` statement like Linux kernel?
>>>>>
>>>>>
>>>> I don't have a strong opinion but I think many/most HotSpot 
>>>> developers are against it. Also with goto, you need to store the 
>>>> return value in a variable, but with destructors, you can just say 
>>>> "return value", so the code is cleaner.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/03/04 14:50, Ioi Lam wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> I think this function is getting too long and it's hard to figure 
>>>>>> out all the error conditions.
>>>>>>
>>>>>> (I feel sorry for you because the code wasn't pretty messy to 
>>>>>> begin with, but I think it's at a point that we need to do the 
>>>>>> cleanup. It took me like 30 minutes to understand what is going on)
>>>>>>
>>>>>> [0] I noticed you changed the return type to LPWSTR, but all the 
>>>>>> callers are using wchar_t*. Are these two types the same?
>>>>>>
>>>>>>      If you want to use LPWSTR within this function for 
>>>>>> consistency, maybe you can return the value as
>>>>>>
>>>>>>      return (wchar_t*)result; // LPWSTR and wchat_t* are the same 
>>>>>> type on Windows.
>>>>>>
>>>>>>      That way people unfamiliar with Windows (most of us) won't 
>>>>>> be confused.
>>>>>>
>>>>>> [1] I think the memory freeing can be done using a destructor, so 
>>>>>> you can simply return in case of error, without having a lot of 
>>>>>> "if" checks:
>>>>>>
>>>>>>      class State {
>>>>>>         errno_t & err;
>>>>>>          char* buf;
>>>>>>          LPWSTR unicode_path;
>>>>>> LPWSTR result;
>>>>>>
>>>>>>          State(errno_t & e) : buf(NULL), unicode_path(NULL), 
>>>>>> result(NULL), err(e) {
>>>>>>              err = SUCCESS;
>>>>>>           }
>>>>>>          ~State() {
>>>>>>            os::free(buf);
>>>>>>            os::free(unicode_path);
>>>>>>            if (err != ERROR_SUCCESS) {
>>>>>>               os::free(result);
>>>>>>            }
>>>>>>          }
>>>>>>          LPWSTR as_error(errno_t e) {
>>>>>>              err = e;
>>>>>>              return NULL;
>>>>>>          }
>>>>>>      } st;
>>>>>>
>>>>>>
>>>>>>      if ((st.buf = static_cast<char*>(os::malloc(buf_size, 
>>>>>> mtInternal))) == NULL) {
>>>>>>          return st.as_error(ENOMEM);
>>>>>>      }
>>>>>>
>>>>>>      ....
>>>>>>      return st.result;
>>>>>>
>>>>>> [2] Will it be possible to break the functon down into smaller 
>>>>>> pieces, so it's easier to understand what's going on?
>>>>>>
>>>>>> E.g., Maybe the code that  converts the C string into a unicode 
>>>>>> string can be moved into a separate function?
>>>>>>
>>>>>>    // Get required buffer size to convert to Unicode
>>>>>>    int unicode_path_len = MultiByteToWideChar(CP_THREAD_ACP,
>>>>>> MB_ERR_INVALID_CHARS,
>>>>>>                                               buf, -1,
>>>>>>                                               NULL, 0);
>>>>>>    if (unicode_path_len == 0) {
>>>>>>      os::free(buf);
>>>>>>      err = EINVAL;
>>>>>>      return NULL;
>>>>>>    }
>>>>>>
>>>>>>    ....
>>>>>>
>>>>>>    LPWSTR unicode_path = 
>>>>>> static_cast<LPWSTR>(os::malloc(sizeof(WCHAR) * unicode_path_len, 
>>>>>> mtInternal));
>>>>>>    if (unicode_path == NULL) {
>>>>>>      err = ENOMEM;
>>>>>>    } else {
>>>>>>      // This call would be success because it is checked in above
>>>>>>      err = ERROR_SUCCESS;
>>>>>>      MultiByteToWideChar(CP_THREAD_ACP,
>>>>>>                          MB_ERR_INVALID_CHARS,
>>>>>>                          buf, -1,
>>>>>>                          unicode_path, unicode_path_len);
>>>>>>
>>>>>>
>>>>>>
>>>>>> [3] size_t result_len = prefix_len - prefix_off + additional_space;
>>>>>>
>>>>>> The variable name result_len is confusing. It's not the length of 
>>>>>> the result.
>>>>>>
>>>>>> [4] The old copying code was already hard to understand, and the 
>>>>>> new version now has 2 versions of the copying code. I think the 
>>>>>> reason is you want to avoid an extra memcpy by calling 
>>>>>> GetFullPathNameW directly on the result.
>>>>>>
>>>>>>      if (needs_fullpath) {
>>>>>>        // Get required buffer size to convert to full path
>>>>>>        DWORD full_path_len = GetFullPathNameW(unicode_path, 0, 
>>>>>> NULL, NULL);
>>>>>>        if (full_path_len == 0) {
>>>>>>          err = EINVAL;
>>>>>>        } else {
>>>>>>          size_t result_size = sizeof(WCHAR) * (result_len + 
>>>>>> full_path_len);
>>>>>>          result = static_cast<LPWSTR>(os::malloc(result_size, 
>>>>>> mtInternal));
>>>>>>          if (result == NULL) {
>>>>>>            err = ENOMEM;
>>>>>>          } else {
>>>>>>            // This call would be success because it is checked in 
>>>>>> above
>>>>>>            GetFullPathNameW(unicode_path, full_path_len, result + 
>>>>>> prefix_len - prefix_off, NULL);
>>>>>>            // Copy prefix
>>>>>>            memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>>>          }
>>>>>>        }
>>>>>>      } else {
>>>>>>        size_t result_size = sizeof(WCHAR) * (result_len + 
>>>>>> unicode_path_len);
>>>>>>        result = static_cast<LPWSTR>(os::malloc(result_size, 
>>>>>> mtInternal));
>>>>>>        if (result == NULL) {
>>>>>>          err = ENOMEM;
>>>>>>        } else {
>>>>>>          // Copy unicode path
>>>>>>          memcpy(result + prefix_len - prefix_off, unicode_path, 
>>>>>> sizeof(WCHAR) * unicode_path_len);
>>>>>>          // Copy prefix
>>>>>>          memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>>>        }
>>>>>>      }
>>>>>>
>>>>>> I think the code can be simplified if you do the fullpath version 
>>>>>> in a separate function:
>>>>>>
>>>>>>      if (needs_fullpath) {
>>>>>>         unicode_path = to_full_path(unicode_path, err, 
>>>>>> &unicode_path_len);
>>>>>>      }
>>>>>>
>>>>>>      size_t result_size = ......
>>>>>>
>>>>>> This will do one more memcpy, but I think it's not performance 
>>>>>> critical
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>> On 3/3/20 4:19 PM, Yasumasa Suenaga wrote:
>>>>>>> Thanks Ralf!
>>>>>>> I fixed them.
>>>>>>>
>>>>>>> I need a Reviewer to push, so I uploaded new webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.03/
>>>>>>>
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2020/03/03 23:25, Schmelter, Ralf wrote:
>>>>>>>> Hi Yasumasa,
>>>>>>>>
>>>>>>>> thanks for your work. The code should now be able to handle any 
>>>>>>>> path.
>>>>>>>>
>>>>>>>> Two small changes (no new webrev needed):
>>>>>>>>
>>>>>>>> size_t result_len = prefix_len + prefix_off + additional_space
>>>>>>>> should be
>>>>>>>> size_t result_len = prefix_len - prefix_off + additional_space
>>>>>>>>
>>>>>>>> And you should copy the prefix after you copied the filename, 
>>>>>>>> since sometimes the prefix is required to overwrite the start 
>>>>>>>> of the filename:
>>>>>>>>
>>>>>>>> // This call would be success because it is checked in above
>>>>>>>> GetFullPathNameW(unicode_path, full_path_len, result + 
>>>>>>>> prefix_len - prefix_off, NULL);
>>>>>>>> // Copy prefix
>>>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>>>>>
>>>>>>>> and
>>>>>>>>
>>>>>>>> // Copy unicode path
>>>>>>>> memcpy(result + prefix_len - prefix_off, unicode_path, 
>>>>>>>> sizeof(WCHAR) * unicode_path_len);
>>>>>>>> // Copy prefix
>>>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>>>>>
>>>>>>>> With these changes the os_windows gtest and the appcds jtreg 
>>>>>>>> tests run without errors.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Ralf
>>>>>>>>
>>>>>>
>>>>
>>