RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters

Yasumasa Suenaga suenaga at oss.nttdata.com
Wed Mar 4 12:18:56 UTC 2020


Hi Ioi, Ralf,

How about this change?
If we can use macro and alloca(), we can fix more simply.

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.04-2/


Thanks,

Yasumasa


On 2020/03/04 17:22, Yasumasa Suenaga wrote:
> Hi Ioi,
> 
> Thanks for your comment.
> How about this change?
> 
>    http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.04/
> 
> This webrev adds the function both converting to unicode and getting full path,
> they would be called from wide_abs_unc_path().
> Also MemoryCleaner class which is introduced in this webrev frees memory auomatically in d'tor.
> 
> 
> Thanks,
> 
> Yasumasa
> 
> 
> On 2020/03/04 16:05, Ioi Lam wrote:
>>
>>
>> On 3/3/20 10:22 PM, Yasumasa Suenaga wrote:
>>> Hi Ioi,
>>>
>>>> [1] I think the memory freeing can be done using a destructor, so you can simply return in case of error, without having a lot of "if" checks:
>>>
>>> What do you think to use `goto` statement like Linux kernel?
>>>
>>>
>> I don't have a strong opinion but I think many/most HotSpot developers are against it. Also with goto, you need to store the return value in a variable, but with destructors, you can just say "return value", so the code is cleaner.
>>
>> Thanks
>> - Ioi
>>
>>> Yasumasa
>>>
>>>
>>> On 2020/03/04 14:50, Ioi Lam wrote:
>>>> Hi Yasumasa,
>>>>
>>>> I think this function is getting too long and it's hard to figure out all the error conditions.
>>>>
>>>> (I feel sorry for you because the code wasn't pretty messy to begin with, but I think it's at a point that we need to do the cleanup. It took me like 30 minutes to understand what is going on)
>>>>
>>>> [0] I noticed you changed the return type to LPWSTR, but all the callers are using wchar_t*. Are these two types the same?
>>>>
>>>>      If you want to use LPWSTR within this function for consistency, maybe you can return the value as
>>>>
>>>>      return (wchar_t*)result; // LPWSTR and wchat_t* are the same type on Windows.
>>>>
>>>>      That way people unfamiliar with Windows (most of us) won't be confused.
>>>>
>>>> [1] I think the memory freeing can be done using a destructor, so you can simply return in case of error, without having a lot of "if" checks:
>>>>
>>>>      class State {
>>>>         errno_t & err;
>>>>          char* buf;
>>>>          LPWSTR unicode_path;
>>>> LPWSTR result;
>>>>
>>>>          State(errno_t & e) : buf(NULL), unicode_path(NULL), result(NULL), err(e) {
>>>>              err = SUCCESS;
>>>>           }
>>>>          ~State() {
>>>>            os::free(buf);
>>>>            os::free(unicode_path);
>>>>            if (err != ERROR_SUCCESS) {
>>>>               os::free(result);
>>>>            }
>>>>          }
>>>>          LPWSTR as_error(errno_t e) {
>>>>              err = e;
>>>>              return NULL;
>>>>          }
>>>>      } st;
>>>>
>>>>
>>>>      if ((st.buf = static_cast<char*>(os::malloc(buf_size, mtInternal))) == NULL) {
>>>>          return st.as_error(ENOMEM);
>>>>      }
>>>>
>>>>      ....
>>>>      return st.result;
>>>>
>>>> [2] Will it be possible to break the functon down into smaller pieces, so it's easier to understand what's going on?
>>>>
>>>> E.g., Maybe the code that  converts the C string into a unicode string can be moved into a separate function?
>>>>
>>>>    // Get required buffer size to convert to Unicode
>>>>    int unicode_path_len = MultiByteToWideChar(CP_THREAD_ACP,
>>>> MB_ERR_INVALID_CHARS,
>>>>                                               buf, -1,
>>>>                                               NULL, 0);
>>>>    if (unicode_path_len == 0) {
>>>>      os::free(buf);
>>>>      err = EINVAL;
>>>>      return NULL;
>>>>    }
>>>>
>>>>    ....
>>>>
>>>>    LPWSTR unicode_path = static_cast<LPWSTR>(os::malloc(sizeof(WCHAR) * unicode_path_len, mtInternal));
>>>>    if (unicode_path == NULL) {
>>>>      err = ENOMEM;
>>>>    } else {
>>>>      // This call would be success because it is checked in above
>>>>      err = ERROR_SUCCESS;
>>>>      MultiByteToWideChar(CP_THREAD_ACP,
>>>>                          MB_ERR_INVALID_CHARS,
>>>>                          buf, -1,
>>>>                          unicode_path, unicode_path_len);
>>>>
>>>>
>>>>
>>>> [3] size_t result_len = prefix_len - prefix_off + additional_space;
>>>>
>>>> The variable name result_len is confusing. It's not the length of the result.
>>>>
>>>> [4] The old copying code was already hard to understand, and the new version now has 2 versions of the copying code. I think the reason is you want to avoid an extra memcpy by calling GetFullPathNameW directly on the result.
>>>>
>>>>      if (needs_fullpath) {
>>>>        // Get required buffer size to convert to full path
>>>>        DWORD full_path_len = GetFullPathNameW(unicode_path, 0, NULL, NULL);
>>>>        if (full_path_len == 0) {
>>>>          err = EINVAL;
>>>>        } else {
>>>>          size_t result_size = sizeof(WCHAR) * (result_len + full_path_len);
>>>>          result = static_cast<LPWSTR>(os::malloc(result_size, mtInternal));
>>>>          if (result == NULL) {
>>>>            err = ENOMEM;
>>>>          } else {
>>>>            // This call would be success because it is checked in above
>>>>            GetFullPathNameW(unicode_path, full_path_len, result + prefix_len - prefix_off, NULL);
>>>>            // Copy prefix
>>>>            memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>          }
>>>>        }
>>>>      } else {
>>>>        size_t result_size = sizeof(WCHAR) * (result_len + unicode_path_len);
>>>>        result = static_cast<LPWSTR>(os::malloc(result_size, mtInternal));
>>>>        if (result == NULL) {
>>>>          err = ENOMEM;
>>>>        } else {
>>>>          // Copy unicode path
>>>>          memcpy(result + prefix_len - prefix_off, unicode_path, sizeof(WCHAR) * unicode_path_len);
>>>>          // Copy prefix
>>>>          memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>        }
>>>>      }
>>>>
>>>> I think the code can be simplified if you do the fullpath version in a separate function:
>>>>
>>>>      if (needs_fullpath) {
>>>>         unicode_path = to_full_path(unicode_path, err, &unicode_path_len);
>>>>      }
>>>>
>>>>      size_t result_size = ......
>>>>
>>>> This will do one more memcpy, but I think it's not performance critical
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>
>>>> On 3/3/20 4:19 PM, Yasumasa Suenaga wrote:
>>>>> Thanks Ralf!
>>>>> I fixed them.
>>>>>
>>>>> I need a Reviewer to push, so I uploaded new webrev:
>>>>>
>>>>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.03/
>>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/03/03 23:25, Schmelter, Ralf wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> thanks for your work. The code should now be able to handle any path.
>>>>>>
>>>>>> Two small changes (no new webrev needed):
>>>>>>
>>>>>> size_t result_len = prefix_len + prefix_off + additional_space
>>>>>> should be
>>>>>> size_t result_len = prefix_len - prefix_off + additional_space
>>>>>>
>>>>>> And you should copy the prefix after you copied the filename, since sometimes the prefix is required to overwrite the start of the filename:
>>>>>>
>>>>>> // This call would be success because it is checked in above
>>>>>> GetFullPathNameW(unicode_path, full_path_len, result + prefix_len - prefix_off, NULL);
>>>>>> // Copy prefix
>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>>>
>>>>>> and
>>>>>>
>>>>>> // Copy unicode path
>>>>>> memcpy(result + prefix_len - prefix_off, unicode_path, sizeof(WCHAR) * unicode_path_len);
>>>>>> // Copy prefix
>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>>>>>
>>>>>> With these changes the os_windows gtest and the appcds jtreg tests run without errors.
>>>>>>
>>>>>> Best regards,
>>>>>> Ralf
>>>>>>
>>>>
>>


More information about the hotspot-runtime-dev mailing list