Improvements to Java Native Interface API's in JDK 8
John Platts
john_platts at hotmail.com
Sat Jul 9 07:33:49 PDT 2011
I do agree that the problem with using Modified UTF-8 instead of Standard UTF-8, as described in bug #5030776, really needs to be addressed.
However, here is another problem with the Java Native Interface APIs that is not described in bug #5030776. The JNI Invocation API currently expects arguments to the JNI_CreateJavaVM method to be encoded in the platform default encoding. There is a real need to add support for passing in arguments to the JNI_CreateJavaVM method using UTF-8, Modified UTF-8, or UTF-16 in Java SE 8 for the following reasons:
- Conversion from UTF-8, Modified UTF-8, or UTF-16 to the platform default encoding can replace characters that are not in the platform default encoding with the default replacement character. This can be a security risk on platforms where the platform default encoding is not Unicode-based (such as Microsoft Windows).
- The NetBeans platform, Eclipse Equinox, and Eclipse RCP can launch a Java VM using the JNI Invocation API.
- Filenames can appear in the arguments passed to the JNI_CreateJavaVM method. The JNI_CreateJavaVM API cannot accommodate filenames containing characters that are not in the platform default encoding, but the java.io and java.nio APIs can.
- The main method of Java applications and the System.getProperty, System.getProperties, and System.getenv methods all support UTF-16 strings containing characters that are not in the platform default encoding.
- Windows NT-based operating systems, including Windows XP, Windows Vista, Windows 7, and Windows Server, can start processes with arguments, file names, or environment variables that contain Unicode characters that are not in the platform default encoding with the CreateProcessW and ShellExecuteW functions. On OpenJDK 7 and other Java SE implementations, whenever a process is started with ProcessBuilder.start or Runtime.exec methods on Windows-based platforms, processes are actually started using the CreateProcessW function.
- On Windows platforms, wide character literals are encoded in UTF-16. However, wide character literals are not necessarily UTF-16 encoded on non-Windows platforms.
- The C1x and C++0x standards provide support for UTF-8, UTF-16, and UTF-32, including the ability to define UTF-8, UTF-16, and UTF-32 string literals. The C1x standard also defines Unicode conversion functions in uchar.h. The jchar type maps to the char16_t type on C compilers supporting the C1x standard and C++ compilers supporting the C++0x standard, and maps to wchar_t and WCHAR on Windows platforms.
There is still a need to support allowing arguments and options to be passed into the JNI_CreateJavaVM method using the platform-default encoding for backwards compatibility and to support non-Windows platforms.
On Windows platforms, the NetBeans launcher and the executable files in the bin directory of the Java Runtime Environment and Java Development Kit need to be updated to use wmain or wWinMain, and to pass in arguments and options to a Java SE 8 or later VM using Unicode. Arguments and environment variables actually get converted from Unicode to the platform default encoding if main and WinMain are used instead of wmain and wWinMain, or if they are obtained using getenv, GetCommandLineA, GetEnvironmentStringsA, or GetEnvironmentVariableA functions instead of the _wgetenv, GetCommandLineW, GetEnvironmentStringsW, or GetEnvironmentVariableW functions.
Allowing options to be passed into the JNI_CreateJavaVM method using UTF-16 eliminates the need to convert options from UTF-16 to the platform default encoding on Windows platforms. The VM converts these options into Java strings, which are UTF-16, so options passed into JNI_CreateJavaVM in the platform default encoding have to be converted back to UTF-16. If the options are passed in as UTF-16, both conversions can be avoided.
----------------------------------------
> Date: Mon, 27 Jun 2011 17:00:44 -0600
> From: daniel.daugherty at oracle.com
> To: john_platts at hotmail.com; jdk8-dev at openjdk.java.net
> Subject: Re: Improvements to Java Native Interface API's in JDK 8
>
> This (old) bug seems related to this proposal:
>
> 5030776 4/5 UTF-8 strings support doc change
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5030776
>
> Dan
>
>
> On 6/23/2011 9:03 AM, John Platts wrote:
> > One of the issues with the Java Native Interface Invocation API is that the arguments passed into the JNI_CreateJavaVM method are in the default platform encoding. Here are the problems with this approach:
> > - On Windows, the default platform encoding is set to a non-Unicode charset.
> > - There are Unicode-only locales in Windows 2000 and later, and these locales use characters that are not in ASCII.
> > - File names on Windows NT-based operating systems can contain characters that are not in the default platform encoding.
> > - JVM arguments can contain characters that are not in the default platform encoding. The conversion of strings containing characters that are not in the default platform encoding might pose a security risk in certain circumstances.
> > - The JVM converts arguments passed into the JVM from the platform default encoding to UTF-16.
> >
> > There needs to be a mechanism that allows Unicode-encoded arguments to be passed into the JNI_CreateJavaVM method on Java SE 8 or later. This mechanism requires new versions of JavaVMInitArgs, and a UTF-16 version of JavaVMOption (which is used when the UTF-16 encoding is specified).
> >
> > Here are updated definitions of the Java Native Interface Invocation API in Java SE 8 to support passing in VM options in Unicode, although the definitions are still subject to change at this point:
> > #define JNI_VERSION_1_8 0x00010008
> >
> > #define JNI_ENCODING_DEFAULT 0
> > #define JNI_ENCODING_MODIFIED_UTF8 1
> > #define JNI_ENCODING_STANDARD_UTF8 2
> > #define JNI_ENCODING_UTF16 3
> >
> > typedef struct JavaVMOption8 {
> > char *optionString;
> > void *extraInfo;
> > } JavaVMOption8;
> >
> > typedef struct JavaVMOption8_UTF16 {
> > jchar *optionString;
> > void *extraInfo;
> > } JavaVMOption8_UTF16;
> >
> > typedef struct JavaVMInitArgs8 {
> > jint version; /* must be set to JNI_VERSION_1_8 */
> >
> > /* optionCharEncoding must be set to one of the following values: */
> > /* JNI_ENCODING_DEFAULT - Platform default encoding */
> > /* JNI_ENCODING_MODIFIED_UTF8 - Modified UTF-8 encoding */
> > /* JNI_ENCODING_STANDARD_UTF8 - Standard UTF-8 encoding */
> > jint optionCharEncoding;
> >
> > jint nOptions;
> > /* The optionString value of each of the options is in the */
> > /* encoding specified in optionCharEncoding. */
> > JavaVMOption8 *options;
> >
> > jboolean ignoreUnrecognized;
> > } JavaVMInitArgs8;
> >
> > typedef struct JavaVMInitArgs8_UTF16 {
> > jint version; /* must be set to JNI_VERSION_1_8 */
> > jint optionCharEncoding; /* must be set to JNI_ENCODING_UTF16 */
> > jint nOptions;
> > JavaVMOption8_UTF16 *options;
> > jboolean ignoreUnrecognized;
> > } JavaVMInitArgs8_UTF16;
> >
> > Here are advantages of the new definitions:
> > - The JVM can verify that Modified UTF-8, Standard UTF-8, and UTF-16 input is not malformed.
> > - The programmer must specify the encoding used for the options passed into the VM. This improves correctness, improves portability, minimizes security risks, and makes review of code using the JNI Invocation API easier.
> > - JVM options containing characters that are not in the platform default encoding can be passed into the JNI invocation API, as long as the options contain valid Unicode characters.
> > - There is no longer a need to convert UTF-16 strings to the platform default encoding on Windows platforms. This makes writing code using the JNI Invocation API easier on Windows platforms, since WideCharToMultiByte is no longer needed to convert UTF-16-encoded options.
> > - The NetBeans and Eclipse launchers can start the Java VM using the JNI invocation API. The updates above can solve problems with the NetBeans and Eclipse launchers on Windows platforms, as the updates allow VM options to be passed in using Unicode instead of the default platform encoding.
> >
> > The executable files in the bin directory of JDK 8 and later need to be Unicode-enabled on Windows platforms. In addition, the NetBeans launcher needs to be Unicode-enabled on Windows platforms, and pass in options using Unicode whenever a Java SE 8 or later VM is launched through the NetBeans launcher.
> >
> > The Java Native Interface APIs use Modified UTF-8 encoding instead of Standard UTF-8. There are several issues with having strings encoded as Modified UTF-8:
> > - These strings are often incorrectly treated as Standard UTF-8 strings or strings encoded in the default platform encoding.
> > - Many native APIs (with the exception of the Java Native Interface APIs) expect strings to be in the default platform encoding, Standard UTF-8, or UTF-16.
> > - Many JNI native libraries have bugs because they incorrectly treat modified UTF-8 strings as standard UTF-8 strings or strings in the default platform encoding. Some of these libraries also incorrectly pass in standard UTF-8-encoded strings or strings encoded in the default platform encoding into JNI methods without converting these strings into modified UTF-8.
> >
> > New versions of the following JNI methods need to be added into JNI in Java SE 8, with an additional argument to specify the character encoding used:
> > - DefineClass
> > - FindClass
> > - ThrowNew
> > - FatalError
> > - GetFieldID
> > - GetMethodID
> > - GetStaticFieldID
> > - GetStaticMethodID
> > - NewStringUTF
> > - GetStringUTFLength
> > - GetStringUTFChars
> > - ReleaseStringUTFChars
> > - GetStringUTFRegion
> > - RegisterNatives
> > - AttachCurrentThread
> > - AttachCurrentThreadAsDaemon
> >
> > New versions of these API's are needed to address correctness issues with JNI code. The semantics of the existing versions of these methods need to remain unchanged to avoid breaking backwards compatibility.
> >