Add support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs on Windows platforms

David Holmes david.holmes at oracle.com
Mon May 8 23:50:57 UTC 2017


Hi John,

Responding back on the mailing lists. There are people on the mailing 
lists who are in a better position to evaluate the merits of the 
proposal. I searched the bug database and could not see this issue being 
raised in the past.

On 9/05/2017 8:46 AM, John Platts wrote:
> The real reasons to add UTF-16 versions of these APIs are the following:
>
> * The arguments passed into the wmain and wWinMain functions use
> UTF-16-encoded strings instead of UTF-8 strings
> * The arguments passed into the main and WinMain functions on
> Windows platforms are in the ANSI character encoding instead of the
> UTF-8 character encoding
> * The NewString and GetStringChars APIs in the JNI already use
> UTF-16-encoded strings

Yes, you are right: the String functions already support UTF-16, as that is
the format for char[] and hence java.lang.String.

> * Unicode APIs on Windows normally use UTF-16-encoded strings
> * The C11 and C++11 standards support UTF-16 strings through the
> char16_t type and support for UTF-16 character literals with a u prefix

Thanks for the additional input.

David

>
> ------------------------------------------------------------------------
> *From:* David Holmes <david.holmes at oracle.com>
> *Sent:* Sunday, May 7, 2017 7:47 PM
> *To:* John Platts
> *Cc:* hotspot-dev developers; core-libs-dev Libs
> *Subject:* Re: Add support for Unicode versions of JNI_CreateJavaVM and
> JNI_GetDefaultJavaVMInitArgs on Windows platforms
>
> Added back jdk10-dev as a bcc.
>
> Added hotspot-dev and core-libs-dev (for launcher) for follow up
> discussions.
>
> Hi John,
>
> On 8/05/2017 10:33 AM, John Platts wrote:
>> I actually did a search through the code that implements
>> JNI_CreateJavaVM, and I found that the conversion of the strings is done
>> using java_lang_String::create_from_platform_dependent_str, which
>> converts from the platform-default encoding to Unicode. In the case of
>> Windows-based platforms, the conversion is done based on the ANSI
>> character encoding instead of UTF-8 or Modified UTF-8.
>>
>>
>> The platform encoding detection logic on Windows is implemented in
>> java_props_md.c, which can be found at
>> jdk/src/windows/native/java/lang/java_props_md.c in releases prior to
>> JDK 9 and at src/java.base/windows/native/libjava/java_props_md.c in JDK
>> 9 and later. The encoding used for command-line arguments passed into
>> the JNI invocation API is Cp1252 for English locales on Windows
>> platforms, and not Modified UTF-8 or UTF-8.
>>
>>
>> The documentation found
>> at http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html
>> also states that the strings passed into JNI_CreateJavaVM are in the
>> platform-default encoding.
>
> Thanks for the additional details. I assume you are referring to:
>
> typedef struct JavaVMOption {
>      char *optionString;  /* the option as a string in the default
> platform encoding */
>
> that comment should not form part of the specification as it is
> non-normative text. If the intent is truly to use the platform default
> encoding and not UTF-8 then that should be very clearly spelt out in the
> spec!
>
> That said, the implementation is following this so it is a limitation. I
> suspect this is historical.
>
>> A version of JNI_CreateJavaVM that takes UTF-16-encoded strings should
>> be added to the JNI Invocation API. The java.exe launchers and javaw.exe
>> launchers should also be updated to use the UTF-16 version of the
>> JNI_CreateJavaVM function on Windows platforms and to use wmain and
>> wWinMain instead of main and WinMain.
>
> Why versions for UTF-16 instead of the missing UTF-8 variants? As I said
> the whole spec is intended to be based around UTF-8 so we would not want
> to throw in just a couple of UTF-16 based usages.
>
> Thanks,
> David
>
>>
>> A few files in HotSpot would need to be changed in order to implement
>> the UTF-16 version of JNI_CreateJavaVM, but the change would improve
>> consistency across different locales on Windows platforms and allow
>> arguments that contain Unicode characters that are not available in the
>> platform-default encoding to be passed into the JVM on the command line.
>>
>>
>> The UTF-16-based version of JNI_CreateJavaVM also makes it easier to
>> allocate string objects that contain non-ASCII characters as the strings
>> are already in UTF-16 format, at least in cases where the strings
>> contain Unicode characters that are not in Latin-1 or on VMs that do not
>> support compact Latin-1 strings.
>>
>>
>> The UTF-16-based version of JNI_CreateJavaVM should probably be
>> implemented as a separate function so that the solution could be
>> backported to JDK 8 and JDK 9 updates and so that backwards
>> compatibility with the current JNI_CreateJavaVM implementation is
>> maintained.
>>
>>
>> Here is what the new UTF-16-based API might look like:
>>
>> typedef struct JavaVMOption_UTF16 {
>>     const jchar *optionString;  /* the option as a string in UTF-16 encoding */
>>     void *extraInfo;
>> } JavaVMOptionUTF16;
>>
>>
>> typedef struct JavaVMInitArgs_UTF16 {
>>     jint version;
>>     jint nOptions;
>>     JavaVMOptionUTF16 *options;
>>     jboolean ignoreUnrecognized;
>> } JavaVMInitArgsUTF16;
>>
>> /* vm_args is a pointer to a JavaVMInitArgs_UTF16 structure */
>>
>> jint JNI_CreateJavaVM_UTF16(JavaVM **p_vm, void **p_env, void *vm_args);
>>
>>
>> /* vm_args is a pointer to a JavaVMInitArgs_UTF16 structure */
>>
>> jint JNI_GetDefaultJavaVMInitArgs_UTF16(void *vm_args);
>>
>> ------------------------------------------------------------------------
>> *From:* David Holmes <david.holmes at oracle.com>
>> *Sent:* Thursday, May 4, 2017 11:07 PM
>> *To:* John Platts; jdk10-dev at openjdk.java.net
>> *Subject:* Re: Add support for Unicode versions of JNI_CreateJavaVM and
>> JNI_GetDefaultJavaVMInitArgs on Windows platforms
>>
>> Hi John,
>>
>> The JNI is defined to use Modified UTF-8 format for strings, so any
>> Unicode character should be handled if passed in in the right format.
>> Updating the JNI specification and implementation to accept UTF-16
>> directly would be a major undertaking.
>>
>> Is the issue here that you want a tool, like the java launcher, to
>> accept arbitrary Unicode strings in an end-user-friendly manner and then
>> have it perform the modified UTF-8 conversion when invoking the VM?
>>
>> Can you give a concrete example of what you would like to be able to
>> pass as arguments to the JVM?
>>
>> Thanks,
>> David
>>
>> On 5/05/2017 1:04 PM, John Platts wrote:
>>> The JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods in the JNI
>>> invocation API expect ANSI strings on Windows platforms instead of
>>> Unicode-encoded strings. This is an issue on Windows-based platforms
>>> since some of the option strings that are passed into JNI_CreateJavaVM
>>> might contain Unicode characters that are not in the ANSI encoding on
>>> Windows platforms.
>>>
>>>
>>> There is support for UTF-16 literals on Windows platforms with wchar_t and wide character literals prefixed with the L prefix, and on platforms that support C11 and C++11 with char16_t and UTF-16 character literals that are prefixed with the u prefix.
>>>
>>>
>>> jchar is currently defined to be a typedef for unsigned short on all
>>> platforms, but char16_t is a separate type and not a typedef for
>>> unsigned short or jchar in C++11 and later. jchar should be changed to
>>> be a typedef for wchar_t on Windows platforms and to be a typedef for
>>> char16_t on non-Windows platforms that support the char16_t type. This
>>> change will make it possible to define jchar character and string
>>> literals on Windows platforms and on non-Windows platforms that support
>>> the C11 or C++11 standard.
>>>
>>>
>>> The JCHAR_LITERAL macro should be added to the JNI header and defined as follows on Windows:
>>>
>>> #define JCHAR_LITERAL(x) L ## x
>>>
>>>
>>> The JCHAR_LITERAL macro should be added to the JNI header and defined as follows on non-Windows platforms:
>>>
>>> #define JCHAR_LITERAL(x) u ## x
>>>
>>>
>>> Here is how the Unicode version of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs could be defined:
>>>
>>> typedef struct JavaVMUnicodeOption {
>>>     const jchar *optionString;  /* the option as a string in UTF-16 encoding */
>>>     void *extraInfo;
>>> } JavaVMUnicodeOption;
>>>
>>> typedef struct JavaVMUnicodeInitArgs {
>>>     jint version;
>>>     jint nOptions;
>>>     JavaVMUnicodeOption *options;
>>>     jboolean ignoreUnrecognized;
>>> } JavaVMUnicodeInitArgs;
>>>
>>> jint JNI_CreateJavaVMUnicode(JavaVM **pvm, void **penv, void *args);
>>> jint JNI_GetDefaultJavaVMInitArgsUnicode(void *args);
>>>
>>> The java.exe wrapper should use wmain instead of main on Windows
>>> platforms, and the javaw.exe wrapper should use wWinMain instead of
>>> WinMain on Windows platforms. This change, along with support for
>>> Unicode-enabled versions of the JNI_CreateJavaVM and
>>> JNI_GetDefaultJavaVMInitArgs methods, would allow the JVM to be
>>> launched with arguments that contain Unicode characters that are not in
>>> the platform-default encoding.
>>>
>>> All of the Windows platforms that Java SE 10 and later VMs would be
>>> supported on do support Unicode. Adding support for Unicode versions of
>>> JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs will allow Unicode
>>> characters that are not in the platform-default encoding on Windows
>>> platforms to be supported in command-line arguments that are passed to
>>> the JVM.
>>>
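
For readers following the thread, the "modified UTF-8 conversion" that a launcher could perform on UTF-16 arguments (for example those received through wmain) might look roughly like the sketch below. It is an illustration only: the helper name utf16_to_modified_utf8 is hypothetical and is not part of the JNI, HotSpot, or the launcher sources. It assumes the standard Modified UTF-8 rules, in which U+0000 is written as the two bytes C0 80 and each surrogate code unit is encoded separately as three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a JNI function): encode `len` UTF-16 code units
 * from `in` as Modified UTF-8 into `out`, which has room for `cap` bytes
 * including the trailing NUL.
 * Returns the number of bytes written, excluding the trailing NUL, or
 * (size_t)-1 if `out` is too small.
 */
size_t utf16_to_modified_utf8(const uint16_t *in, size_t len,
                              unsigned char *out, size_t cap)
{
    assert(out != NULL);
    if (cap == 0) {
        return (size_t)-1;
    }

    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t c = in[i];

        /* U+0001..U+007F take one byte; U+0000 and U+0080..U+07FF take two;
         * everything else, including each surrogate, takes three. */
        size_t need = (c >= 0x0001 && c <= 0x007F) ? 1 : (c <= 0x07FF) ? 2 : 3;

        /* Always keep one byte free for the trailing NUL. */
        if (n + need + 1 > cap) {
            return (size_t)-1;
        }

        if (need == 1) {
            out[n++] = (unsigned char)c;
        } else if (need == 2) {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[n++] = (unsigned char)(0xE0 | (c >> 12));
            out[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }

    out[n] = '\0';
    return n;
}
```

Because an embedded U+0000 never produces a zero byte, the result can be handled as an ordinary NUL-terminated C string. Whether a given VM actually interprets option strings as Modified UTF-8 rather than the platform encoding is exactly the question discussed above, so this sketch does not by itself make such arguments work with the existing JNI_CreateJavaVM.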

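The JCHAR_LITERAL idea from the original proposal can also be tried out in isolation on a C11 compiler. The sketch below is only an assumption-laden illustration: jchar16 is a hypothetical stand-in for the proposed non-Windows jchar (the real jni.h defines jchar as unsigned short), jchar16_length is a made-up helper, and the option string is an arbitrary example. The struct layout follows the JavaVMUnicodeOption declaration quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Hypothetical stand-in for the proposed jchar on non-Windows platforms;
 * the real jni.h defines jchar as unsigned short. */
typedef char16_t jchar16;

/* Non-Windows form of the proposed macro: pastes the C11 u prefix onto a
 * string or character literal, so JCHAR_LITERAL("x") becomes u"x". */
#define JCHAR_LITERAL(x) u ## x

/* Same layout as the JavaVMUnicodeOption struct proposed in the thread,
 * using the stand-in character type. */
typedef struct JavaVMUnicodeOption {
    const jchar16 *optionString;  /* the option as a string in UTF-16 encoding */
    void *extraInfo;
} JavaVMUnicodeOption;

/* Hypothetical helper: number of UTF-16 code units before the terminator. */
size_t jchar16_length(const jchar16 *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Arbitrary example option containing two non-ASCII characters. */
static const JavaVMUnicodeOption example_option = {
    JCHAR_LITERAL("-Dgreeting=\u3053\u3093"),
    NULL
};
```

On Windows the proposal uses the L prefix and wchar_t instead, which is why a single macro name with two platform-specific definitions was suggested.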
