RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters

Wed Mar 4 05:50:08 UTC 2020

Hi Yasumasa,

I think this function is getting too long and it's hard to figure out 
all the error conditions.

(I feel sorry for you because the code wasn't pretty messy to begin 
with, but I think it's at a point that we need to do the cleanup. It 
took me like 30 minutes to understand what is going on)

[0] I noticed you changed the return type to LPWSTR, but all the callers 
are using wchar_t*. Are these two types the same?

     If you want to use LPWSTR within this function for consistency, 
maybe you can return the value as

     return (wchar_t*)result; // LPWSTR and wchat_t* are the same type 
on Windows.

     That way people unfamiliar with Windows (most of us) won't be confused.

[1] I think the memory freeing can be done using a destructor, so you 
can simply return in case of error, without having a lot of "if" checks:

     class State {
        errno_t & err;
         char* buf;
         LPWSTR unicode_path;
LPWSTR result;

         State(errno_t & e) : buf(NULL), unicode_path(NULL), 
result(NULL), err(e) {
             err = SUCCESS;
          }
         ~State() {
           os::free(buf);
           os::free(unicode_path);
           if (err != ERROR_SUCCESS) {
              os::free(result);
           }
         }
         LPWSTR as_error(errno_t e) {
             err = e;
             return NULL;
         }
     } st;

     if ((st.buf = static_cast<char*>(os::malloc(buf_size, mtInternal))) 
== NULL) {
         return st.as_error(ENOMEM);
     }

     ....
     return st.result;

[2] Will it be possible to break the functon down into smaller pieces, 
so it's easier to understand what's going on?

E.g., Maybe the code that  converts the C string into a unicode string 
can be moved into a separate function?

   // Get required buffer size to convert to Unicode
   int unicode_path_len = MultiByteToWideChar(CP_THREAD_ACP,
                                              MB_ERR_INVALID_CHARS,
                                              buf, -1,
                                              NULL, 0);
   if (unicode_path_len == 0) {
     os::free(buf);
     err = EINVAL;
     return NULL;
   }

   ....

   LPWSTR unicode_path = static_cast<LPWSTR>(os::malloc(sizeof(WCHAR) * 
unicode_path_len, mtInternal));
   if (unicode_path == NULL) {
     err = ENOMEM;
   } else {
     // This call would be success because it is checked in above
     err = ERROR_SUCCESS;
     MultiByteToWideChar(CP_THREAD_ACP,
                         MB_ERR_INVALID_CHARS,
                         buf, -1,
                         unicode_path, unicode_path_len);

[3] size_t result_len = prefix_len - prefix_off + additional_space;

The variable name result_len is confusing. It's not the length of the 
result.

[4] The old copying code was already hard to understand, and the new 
version now has 2 versions of the copying code. I think the reason is 
you want to avoid an extra memcpy by calling GetFullPathNameW directly 
on the result.

     if (needs_fullpath) {
       // Get required buffer size to convert to full path
       DWORD full_path_len = GetFullPathNameW(unicode_path, 0, NULL, NULL);
       if (full_path_len == 0) {
         err = EINVAL;
       } else {
         size_t result_size = sizeof(WCHAR) * (result_len + full_path_len);
         result = static_cast<LPWSTR>(os::malloc(result_size, mtInternal));
         if (result == NULL) {
           err = ENOMEM;
         } else {
           // This call would be success because it is checked in above
           GetFullPathNameW(unicode_path, full_path_len, result + 
prefix_len - prefix_off, NULL);
           // Copy prefix
           memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
         }
       }
     } else {
       size_t result_size = sizeof(WCHAR) * (result_len + unicode_path_len);
       result = static_cast<LPWSTR>(os::malloc(result_size, mtInternal));
       if (result == NULL) {
         err = ENOMEM;
       } else {
         // Copy unicode path
         memcpy(result + prefix_len - prefix_off, unicode_path, 
sizeof(WCHAR) * unicode_path_len);
         // Copy prefix
         memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
       }
     }

I think the code can be simplified if you do the fullpath version in a 
separate function:

     if (needs_fullpath) {
        unicode_path = to_full_path(unicode_path, err, &unicode_path_len);
     }

     size_t result_size = ......

This will do one more memcpy, but I think it's not performance critical

Thanks
- Ioi

On 3/3/20 4:19 PM, Yasumasa Suenaga wrote:
> Thanks Ralf!
> I fixed them.
>
> I need a Reviewer to push, so I uploaded new webrev:
>
>   http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.03/
>
>
> Yasumasa
>
>
> On 2020/03/03 23:25, Schmelter, Ralf wrote:
>> Hi Yasumasa,
>>
>> thanks for your work. The code should now be able to handle any path.
>>
>> Two small changes (no new webrev needed):
>>
>> size_t result_len = prefix_len + prefix_off + additional_space
>> should be
>> size_t result_len = prefix_len - prefix_off + additional_space
>>
>> And you should copy the prefix after you copied the filename, since 
>> sometimes the prefix is required to overwrite the start of the filename:
>>
>> // This call would be success because it is checked in above
>> GetFullPathNameW(unicode_path, full_path_len, result + prefix_len - 
>> prefix_off, NULL);
>> // Copy prefix
>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>
>> and
>>
>> // Copy unicode path
>> memcpy(result + prefix_len - prefix_off, unicode_path, sizeof(WCHAR) 
>> * unicode_path_len);
>> // Copy prefix
>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
>>
>> With these changes the os_windows gtest and the appcds jtreg tests 
>> run without errors.
>>
>> Best regards,
>> Ralf
>>