RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters
Thomas Stüfe
thomas.stuefe at gmail.com
Thu Mar 5 06:38:38 UTC 2020
Looks good to me now, thanks for the changes and your perseverance!
All good from my side.
..Thomas
On Thu, Mar 5, 2020 at 7:28 AM Yasumasa Suenaga <suenaga at oss.nttdata.com>
wrote:
> Hi Thomas,
>
> Thanks for your comment.
> I fixed them in new webrev:
>
> http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.06/
>
>
> Yasumasa
>
>
> On 2020/03/05 14:43, Thomas Stüfe wrote:
> > Hi Yasumasa,
> >
> > Had a glance over version 05. Some remarks:
> >
> > --
> >
> > I dislike the reinterpret_cast<> and do not understand the need for it.
> We do not do this in other places with malloc. Strictly speaking this would
> be UB since it is not guaranteed for A=reinterpret_cast<B> that A==B.
> >
> > In any way you should use NEW_C_HEAP_ARRAY and FREE_C_HEAP_ARRAY instead
> since those do the casting for you.
> >
> > They also exit the VM in case of OOM so no need to handle that manually.
> >
> > --
> >
> > In wide_abs_unc_path:
> >
> > + // Need to allocate at least room for 3 characters, since
> os::native_path transforms C: to C:.
> > + size_t buf_size = 1 + MAX2((size_t)3, strlen(path));
> > + char* buf = reinterpret_cast<char*>(os::malloc(buf_size, mtInternal));
> > + if (buf == NULL) {
> > + vm_exit_out_of_memory(buf_size, OOM_MALLOC_ERROR,
> "wide_abs_unc_path");
> > + }
> > + memcpy(buf, path, buf_size);
> > + os::native_path(buf);
> >
> > The memcpy is wrong. You have a rare chance of triggering a read fault
> since you read over the limit of path. Just use strcpy.
> >
> > --
> >
> > The comment about "additional_space" confused me until I read the
> sources, sp I would suggest reforming it as such:
> >
> > -// additional_space is the number of additionally allocated wchars
> after the terminating L'\0'.
> > +// additional_space is the size of space, in wchar_t, the function will
> additionally add to the allocation of return buffer (such that the size of
> the returned buffer is at least wcslen(buf) + 1 + additional_space).
> >
> > --
> >
> > I would also omit this comment:
> >
> > +// This is based on pathToNTPath() in io_util_md.cpp, but omits the
> optimizations for
> > +// short paths.
> >
> > I do not think anyone reading this code cares.
> >
> > --
> >
> > Hmm, small style nit, sometimes we return errno_t and the actual return
> value is via output parameter,
> >
> > +static errno_t convert_to_unicode(char const* char_path, LPWSTR&
> unicode_path) {
> >
> > +static errno_t get_full_path(LPCWSTR unicode_path, LPWSTR& full_path) {
> >
> > and here its the other way around:
> >
> > +static wchar_t* wide_abs_unc_path(char const* path, errno_t & err, int
> additional_space = 0) {
> >
> > I do not care much, but it seems strange.
> >
> > --
> >
> > +static errno_t convert_to_unicode(char const* char_path, LPWSTR&
> unicode_path)
> >
> > + LPWSTR unicode_path;
> > + err = convert_to_unicode(buf, unicode_path);
> > + os::free(buf);
> >
> > Sigh - I really do dislike passing output parameters via reference,
> since at the call site is is indistinguishable from an input parameter.
> > I know C++ programmers love this though so I won't fight this here...
> >
> > Could you please at least initialize unicode_path, since you do this in
> other places with output parameters too?
> >
> > --
> >
> > The rest seems fine.
> >
> > Cheers, Thomas
> >
> > On Thu, Mar 5, 2020 at 1:18 AM Yasumasa Suenaga <suenaga at oss.nttdata.com
> <mailto:suenaga at oss.nttdata.com>> wrote:
> >
> > Hi Ioi,
> >
> > How about this change?
> >
> > http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.05/
> >
> > I already tried to use ResouceMark, but it was crashed because
> wide_abs_unc_path() would be called from outside of java thread at boot
> time.
> > This webrev uses os::free() instead of memory cleaner class, but I
> think it is simple because I tweaked memory lifecycle.
> >
> >
> > Thanks,
> >
> > Yasumasa
> >
> >
> > On 2020/03/05 6:56, Ioi Lam wrote:
> > > Hi Yasumasa,
> > >
> > > This version looks good!
> > >
> > >
> > > [0] One thing I forgot ... is it possible to use "ResourceMark
> rm" in this function? If so, you can allocate the temp arrays using
> NEW_RESOURCE_ARRAY and then don't need to worry about freeing them.
> > >
> > >
> > > [1] for this code
> > >
> > > 4202 // This call would be success because it is checked in
> above
> > > 4203 GetFullPathNameW(unicode_path, full_path_len, full_path,
> NULL);
> > >
> > > How about this instead:
> > >
> > > DWORD len = GetFullPathNameW(unicode_path, full_path_len,
> full_path, NULL);
> > > assert(len <= full_path_len, "length already checked
> above");
> > >
> > > (same for the MultiByteToWideChar call in convert_to_unicode)
> > >
> > >
> > > [2] This is no longer needed
> > >
> > > 4240 os::free(buf);
> > >
> > >
> > > [3] I think the following code is not needed ....
> > >
> > > 4269 } else {
> > > 4270 size_t converted_path_size = sizeof(WCHAR) *
> (wcslen(unicode_path) + 1);
> > > 4271 converted_path =
> reinterpret_cast<LPWSTR>(os::malloc(converted_path_size, mtInternal));
> > > 4272 if (converted_path == NULL) {
> > > 4273 vm_exit_out_of_memory(buf_size, OOM_MALLOC_ERROR,
> "wide_abs_unc_path");
> > > 4274 }
> > > 4275 memcpy(converted_path, unicode_path,
> converted_path_size);
> > >
> > >
> > > ... if you free the original buffer inside get_full_path (or use
> ResourceMark):
> > >
> > > static errno_t get_full_path(LPCWSTR& unicode_path) {
> > > LPCWSTR full_path = .....;
> > > ....
> > > free(unicode_path);
> > > unicode_path = full_path;
> > > return ERROR_SUCCESS;
> > > }
> > >
> > > [4] This looks good! Much more readable than the original code!
> > >
> > > 4285 _snwprintf(result, result_len, L"%s%s", prefix,
> &converted_path[prefix_off]);
> > >
> > >
> > > Thanks
> > > - Ioi
> > >
> > >
> > > On 3/4/20 12:22 AM, Yasumasa Suenaga wrote:
> > >> Hi Ioi,
> > >>
> > >> Thanks for your comment.
> > >> How about this change?
> > >>
> > >> http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.04/
> > >>
> > >> This webrev adds the function both converting to unicode and
> getting full path,
> > >> they would be called from wide_abs_unc_path().
> > >> Also MemoryCleaner class which is introduced in this webrev
> frees memory auomatically in d'tor.
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Yasumasa
> > >>
> > >>
> > >> On 2020/03/04 16:05, Ioi Lam wrote:
> > >>>
> > >>>
> > >>> On 3/3/20 10:22 PM, Yasumasa Suenaga wrote:
> > >>>> Hi Ioi,
> > >>>>
> > >>>>> [1] I think the memory freeing can be done using a
> destructor, so you can simply return in case of error, without having a lot
> of "if" checks:
> > >>>>
> > >>>> What do you think to use `goto` statement like Linux kernel?
> > >>>>
> > >>>>
> > >>> I don't have a strong opinion but I think many/most HotSpot
> developers are against it. Also with goto, you need to store the return
> value in a variable, but with destructors, you can just say "return value",
> so the code is cleaner.
> > >>>
> > >>> Thanks
> > >>> - Ioi
> > >>>
> > >>>> Yasumasa
> > >>>>
> > >>>>
> > >>>> On 2020/03/04 14:50, Ioi Lam wrote:
> > >>>>> Hi Yasumasa,
> > >>>>>
> > >>>>> I think this function is getting too long and it's hard to
> figure out all the error conditions.
> > >>>>>
> > >>>>> (I feel sorry for you because the code wasn't pretty messy to
> begin with, but I think it's at a point that we need to do the cleanup. It
> took me like 30 minutes to understand what is going on)
> > >>>>>
> > >>>>> [0] I noticed you changed the return type to LPWSTR, but all
> the callers are using wchar_t*. Are these two types the same?
> > >>>>>
> > >>>>> If you want to use LPWSTR within this function for
> consistency, maybe you can return the value as
> > >>>>>
> > >>>>> return (wchar_t*)result; // LPWSTR and wchat_t* are the
> same type on Windows.
> > >>>>>
> > >>>>> That way people unfamiliar with Windows (most of us)
> won't be confused.
> > >>>>>
> > >>>>> [1] I think the memory freeing can be done using a
> destructor, so you can simply return in case of error, without having a lot
> of "if" checks:
> > >>>>>
> > >>>>> class State {
> > >>>>> errno_t & err;
> > >>>>> char* buf;
> > >>>>> LPWSTR unicode_path;
> > >>>>> LPWSTR result;
> > >>>>>
> > >>>>> State(errno_t & e) : buf(NULL), unicode_path(NULL),
> result(NULL), err(e) {
> > >>>>> err = SUCCESS;
> > >>>>> }
> > >>>>> ~State() {
> > >>>>> os::free(buf);
> > >>>>> os::free(unicode_path);
> > >>>>> if (err != ERROR_SUCCESS) {
> > >>>>> os::free(result);
> > >>>>> }
> > >>>>> }
> > >>>>> LPWSTR as_error(errno_t e) {
> > >>>>> err = e;
> > >>>>> return NULL;
> > >>>>> }
> > >>>>> } st;
> > >>>>>
> > >>>>>
> > >>>>> if ((st.buf = static_cast<char*>(os::malloc(buf_size,
> mtInternal))) == NULL) {
> > >>>>> return st.as_error(ENOMEM);
> > >>>>> }
> > >>>>>
> > >>>>> ....
> > >>>>> return st.result;
> > >>>>>
> > >>>>> [2] Will it be possible to break the functon down into
> smaller pieces, so it's easier to understand what's going on?
> > >>>>>
> > >>>>> E.g., Maybe the code that converts the C string into a
> unicode string can be moved into a separate function?
> > >>>>>
> > >>>>> // Get required buffer size to convert to Unicode
> > >>>>> int unicode_path_len = MultiByteToWideChar(CP_THREAD_ACP,
> > >>>>> MB_ERR_INVALID_CHARS,
> > >>>>> buf, -1,
> > >>>>> NULL, 0);
> > >>>>> if (unicode_path_len == 0) {
> > >>>>> os::free(buf);
> > >>>>> err = EINVAL;
> > >>>>> return NULL;
> > >>>>> }
> > >>>>>
> > >>>>> ....
> > >>>>>
> > >>>>> LPWSTR unicode_path =
> static_cast<LPWSTR>(os::malloc(sizeof(WCHAR) * unicode_path_len,
> mtInternal));
> > >>>>> if (unicode_path == NULL) {
> > >>>>> err = ENOMEM;
> > >>>>> } else {
> > >>>>> // This call would be success because it is checked in
> above
> > >>>>> err = ERROR_SUCCESS;
> > >>>>> MultiByteToWideChar(CP_THREAD_ACP,
> > >>>>> MB_ERR_INVALID_CHARS,
> > >>>>> buf, -1,
> > >>>>> unicode_path, unicode_path_len);
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> [3] size_t result_len = prefix_len - prefix_off +
> additional_space;
> > >>>>>
> > >>>>> The variable name result_len is confusing. It's not the
> length of the result.
> > >>>>>
> > >>>>> [4] The old copying code was already hard to understand, and
> the new version now has 2 versions of the copying code. I think the reason
> is you want to avoid an extra memcpy by calling GetFullPathNameW directly
> on the result.
> > >>>>>
> > >>>>> if (needs_fullpath) {
> > >>>>> // Get required buffer size to convert to full path
> > >>>>> DWORD full_path_len = GetFullPathNameW(unicode_path,
> 0, NULL, NULL);
> > >>>>> if (full_path_len == 0) {
> > >>>>> err = EINVAL;
> > >>>>> } else {
> > >>>>> size_t result_size = sizeof(WCHAR) * (result_len +
> full_path_len);
> > >>>>> result = static_cast<LPWSTR>(os::malloc(result_size,
> mtInternal));
> > >>>>> if (result == NULL) {
> > >>>>> err = ENOMEM;
> > >>>>> } else {
> > >>>>> // This call would be success because it is
> checked in above
> > >>>>> GetFullPathNameW(unicode_path, full_path_len,
> result + prefix_len - prefix_off, NULL);
> > >>>>> // Copy prefix
> > >>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
> > >>>>> }
> > >>>>> }
> > >>>>> } else {
> > >>>>> size_t result_size = sizeof(WCHAR) * (result_len +
> unicode_path_len);
> > >>>>> result = static_cast<LPWSTR>(os::malloc(result_size,
> mtInternal));
> > >>>>> if (result == NULL) {
> > >>>>> err = ENOMEM;
> > >>>>> } else {
> > >>>>> // Copy unicode path
> > >>>>> memcpy(result + prefix_len - prefix_off,
> unicode_path, sizeof(WCHAR) * unicode_path_len);
> > >>>>> // Copy prefix
> > >>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
> > >>>>> }
> > >>>>> }
> > >>>>>
> > >>>>> I think the code can be simplified if you do the fullpath
> version in a separate function:
> > >>>>>
> > >>>>> if (needs_fullpath) {
> > >>>>> unicode_path = to_full_path(unicode_path, err,
> &unicode_path_len);
> > >>>>> }
> > >>>>>
> > >>>>> size_t result_size = ......
> > >>>>>
> > >>>>> This will do one more memcpy, but I think it's not
> performance critical
> > >>>>>
> > >>>>> Thanks
> > >>>>> - Ioi
> > >>>>>
> > >>>>>
> > >>>>> On 3/3/20 4:19 PM, Yasumasa Suenaga wrote:
> > >>>>>> Thanks Ralf!
> > >>>>>> I fixed them.
> > >>>>>>
> > >>>>>> I need a Reviewer to push, so I uploaded new webrev:
> > >>>>>>
> > >>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.03/
> > >>>>>>
> > >>>>>>
> > >>>>>> Yasumasa
> > >>>>>>
> > >>>>>>
> > >>>>>> On 2020/03/03 23:25, Schmelter, Ralf wrote:
> > >>>>>>> Hi Yasumasa,
> > >>>>>>>
> > >>>>>>> thanks for your work. The code should now be able to handle
> any path.
> > >>>>>>>
> > >>>>>>> Two small changes (no new webrev needed):
> > >>>>>>>
> > >>>>>>> size_t result_len = prefix_len + prefix_off +
> additional_space
> > >>>>>>> should be
> > >>>>>>> size_t result_len = prefix_len - prefix_off +
> additional_space
> > >>>>>>>
> > >>>>>>> And you should copy the prefix after you copied the
> filename, since sometimes the prefix is required to overwrite the start of
> the filename:
> > >>>>>>>
> > >>>>>>> // This call would be success because it is checked in above
> > >>>>>>> GetFullPathNameW(unicode_path, full_path_len, result +
> prefix_len - prefix_off, NULL);
> > >>>>>>> // Copy prefix
> > >>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
> > >>>>>>>
> > >>>>>>> and
> > >>>>>>>
> > >>>>>>> // Copy unicode path
> > >>>>>>> memcpy(result + prefix_len - prefix_off, unicode_path,
> sizeof(WCHAR) * unicode_path_len);
> > >>>>>>> // Copy prefix
> > >>>>>>> memcpy(result, prefix, sizeof(WCHAR) * prefix_len);
> > >>>>>>>
> > >>>>>>> With these changes the os_windows gtest and the appcds
> jtreg tests run without errors.
> > >>>>>>>
> > >>>>>>> Best regards,
> > >>>>>>> Ralf
> > >>>>>>>
> > >>>>>
> > >>>
> > >
> >
>
More information about the hotspot-runtime-dev
mailing list