Improvements to Java Native Interface APIs in JDK 8

John Platts john_platts at hotmail.com
Thu Jun 23 08:03:31 PDT 2011


One of the issues with the Java Native Interface Invocation API is that the arguments passed into the JNI_CreateJavaVM method are in the default platform encoding. Here are the problems with this approach:
- On Windows, the default platform encoding is set to a non-Unicode charset.
- There are Unicode-only locales in Windows 2000 and later, and these locales use characters that are not in ASCII.
- File names on Windows NT-based operating systems can contain characters that are not in the default platform encoding.
- JVM arguments can contain characters that are not in the default platform encoding. The conversion of strings containing characters that are not in the default platform encoding might pose a security risk in certain circumstances.
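- The JVM converts the arguments it receives from the platform default encoding to UTF-16, so any character that the default encoding cannot represent is already lost by the time the conversion happens.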
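
The loss described in the list above can be illustrated with a minimal sketch. It assumes ISO-8859-1 as a stand-in for the default platform encoding, and narrow_to_latin1 is a hypothetical helper, not a Windows or JNI function. Real conversions such as WideCharToMultiByte behave differently in detail, but the effect on unrepresentable characters is similar:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrows UTF-16 code units to ISO-8859-1, which
 * stands in here for a "default platform encoding". Code units that the
 * target encoding cannot represent are replaced with '?'.
 * dst must have room for len + 1 bytes.
 * Returns the number of code units that were replaced. */
static size_t narrow_to_latin1(const uint16_t *src, size_t len, char *dst)
{
    size_t lost = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] <= 0xFF) {
            dst[i] = (char)(unsigned char)src[i];
        } else {
            dst[i] = '?';   /* the original character is gone for good */
            lost++;
        }
    }
    dst[len] = '\0';
    return lost;
}
```

An option such as -Dx= followed by two Japanese characters would come out as -Dx=??, and the JVM has no way to recover the original value when it later converts the bytes to UTF-16.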
 
There needs to be a mechanism that allows Unicode-encoded arguments to be passed into the JNI_CreateJavaVM method on Java SE 8 or later. This mechanism requires new versions of JavaVMInitArgs, and a UTF-16 version of JavaVMOption (which is used when the UTF-16 encoding is specified).
 
Here are updated definitions of the Java Native Interface Invocation API in Java SE 8 to support passing in VM options in Unicode, although the definitions are still subject to change at this point:
#define JNI_VERSION_1_8 0x00010008
 
#define JNI_ENCODING_DEFAULT 0
#define JNI_ENCODING_MODIFIED_UTF8 1
#define JNI_ENCODING_STANDARD_UTF8 2
#define JNI_ENCODING_UTF16 3
 
typedef struct JavaVMOption8 {
    char *optionString;
    void *extraInfo;
} JavaVMOption8;
 
typedef struct JavaVMOption8_UTF16 {
    jchar *optionString;
    void *extraInfo;
} JavaVMOption8_UTF16;
 
typedef struct JavaVMInitArgs8 {
    jint version; /* must be set to JNI_VERSION_1_8 */
 
    /* optionCharEncoding must be set to one of the following values: */
    /* JNI_ENCODING_DEFAULT - Platform default encoding */
    /* JNI_ENCODING_MODIFIED_UTF8 - Modified UTF-8 encoding */
    /* JNI_ENCODING_STANDARD_UTF8 - Standard UTF-8 encoding */
    jint optionCharEncoding;
 
    jint nOptions;
    /* The optionString value of each of the options is in the */
    /* encoding specified in optionCharEncoding.               */
    JavaVMOption8 *options;
 
    jboolean ignoreUnrecognized;
} JavaVMInitArgs8;
 
typedef struct JavaVMInitArgs8_UTF16 {
    jint version; /* must be set to JNI_VERSION_1_8 */
    jint optionCharEncoding; /* must be set to JNI_ENCODING_UTF16 */
    jint nOptions;
    JavaVMOption8_UTF16 *options;
    jboolean ignoreUnrecognized;
} JavaVMInitArgs8_UTF16;
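
As a rough sketch of how a launcher might fill in the UTF-16 variant of these structures: the typedefs for jint, jchar, and jboolean below are stand-ins so that the example compiles without jni.h, and init_utf16_args is a hypothetical helper. The sketch stops short of calling JNI_CreateJavaVM, since the structures are only a proposal at this point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the jni.h typedefs so the sketch compiles on its own. */
typedef int32_t  jint;
typedef uint16_t jchar;
typedef uint8_t  jboolean;
#define JNI_FALSE 0

/* Values and types proposed in this message. */
#define JNI_VERSION_1_8    0x00010008
#define JNI_ENCODING_UTF16 3

typedef struct JavaVMOption8_UTF16 {
    jchar *optionString;
    void *extraInfo;
} JavaVMOption8_UTF16;

typedef struct JavaVMInitArgs8_UTF16 {
    jint version;            /* must be set to JNI_VERSION_1_8 */
    jint optionCharEncoding; /* must be set to JNI_ENCODING_UTF16 */
    jint nOptions;
    JavaVMOption8_UTF16 *options;
    jboolean ignoreUnrecognized;
} JavaVMInitArgs8_UTF16;

/* Hypothetical helper: fills the proposed argument block from an array of
 * NUL-terminated UTF-16 strings. The caller owns both arrays and must keep
 * them alive for as long as args is in use. */
static void init_utf16_args(JavaVMInitArgs8_UTF16 *args,
                            JavaVMOption8_UTF16 *slots,
                            jchar **strings, jint count)
{
    assert(args != NULL && slots != NULL && strings != NULL && count >= 0);
    for (jint i = 0; i < count; i++) {
        slots[i].optionString = strings[i];
        slots[i].extraInfo = NULL;
    }
    args->version = JNI_VERSION_1_8;
    args->optionCharEncoding = JNI_ENCODING_UTF16;
    args->nOptions = count;
    args->options = slots;
    args->ignoreUnrecognized = JNI_FALSE;
}
```

On Windows, where wchar_t is a 16-bit UTF-16 code unit, the strings could presumably come straight from the wide-character command line without any call to WideCharToMultiByte.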
 
Here are advantages of the new definitions:
- The JVM can verify that Modified UTF-8, Standard UTF-8, and UTF-16 input is not malformed.
- The programmer must specify the encoding used for the options passed into the VM. This improves correctness, improves portability, minimizes security risks, and makes review of code using the JNI Invocation API easier.
- JVM options containing characters that are not in the platform default encoding can be passed into the JNI invocation API, as long as the options contain valid Unicode characters.
- There is no longer a need to convert from UTF-16 strings to the platform-specific encoding on Windows platforms. This makes writing code using the JNI Invocation API easier on Windows platforms, since there is no longer a need to use WideCharToMultiByte to convert UTF-16-encoded options to the default platform encoding.
- The NetBeans and Eclipse launchers can start the Java VM using the JNI invocation API. The updates above can solve problems with the NetBeans and Eclipse launchers on Windows platforms, as the updates allow VM options to be passed in using Unicode instead of the default platform encoding.
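
To make the first advantage concrete, here is a minimal sketch of the kind of well-formedness check a JVM could apply to UTF-16 options. The function name utf16_is_well_formed is hypothetical, and the check covers only surrogate pairing, which is what distinguishes well-formed from malformed UTF-16:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 1 if every surrogate code unit in s is part
 * of a high/low pair in the correct order, 0 otherwise. */
static int utf16_is_well_formed(const uint16_t *s, size_t len)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        uint16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {            /* high surrogate */
            if (i + 1 >= len) {
                return 0;                            /* truncated pair */
            }
            if (s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF) {
                return 0;                            /* not followed by a low surrogate */
            }
            i++;                                     /* skip the low half of the pair */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return 0;                                /* stray low surrogate */
        }
    }
    return 1;
}
```

No comparable check is possible for options in the default platform encoding, because the JVM only sees the bytes that survived the caller's conversion.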
 
The executable files in the bin directory of JDK 8 and later need to be Unicode-enabled on Windows platforms. In addition, the NetBeans launcher needs to be Unicode-enabled on Windows platforms, and pass in options using Unicode whenever a Java SE 8 or later VM is launched through the NetBeans launcher.
 
The Java Native Interface APIs use Modified UTF-8 encoding instead of Standard UTF-8. There are several issues with having strings encoded as Modified UTF-8:
- These strings are often incorrectly treated as Standard UTF-8 strings or strings encoded in the default platform encoding.
- Many native APIs (with the exception of the Java Native Interface APIs) expect strings to be in the default platform encoding, standard UTF-8, or UTF-16.
- Many JNI native libraries have bugs because they incorrectly treat modified UTF-8 strings as standard UTF-8 strings or strings in the default platform encoding. Some of these libraries also incorrectly pass standard UTF-8-encoded strings or strings encoded in the default platform encoding to JNI methods without converting these strings into modified UTF-8.
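
The two encodings differ in exactly two cases: Modified UTF-8 writes U+0000 as the two bytes C0 80, and it writes each supplementary character as a surrogate pair of two three-byte sequences instead of one four-byte sequence. The sketch below shows both side by side. The function names are hypothetical, and the code assumes cp is a valid Unicode scalar value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes a value below 0x10000 in one to three bytes. */
static size_t encode_bmp(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: encodes one code point as Standard UTF-8
 * (modified == 0) or Modified UTF-8 (modified != 0). out must have room
 * for 6 bytes. Returns the number of bytes written. */
static size_t encode_utf8(uint32_t cp, int modified, unsigned char *out)
{
    assert(cp <= 0x10FFFF);
    if (modified && cp == 0) {
        out[0] = 0xC0;              /* U+0000 never appears as a zero byte */
        out[1] = 0x80;
        return 2;
    }
    if (cp < 0x10000) {
        return encode_bmp(cp, out); /* identical in both encodings */
    }
    if (modified) {
        /* Split into a UTF-16 surrogate pair; encode each half separately. */
        uint32_t v = cp - 0x10000;
        size_t n = encode_bmp(0xD800 | (v >> 10), out);
        return n + encode_bmp(0xDC00 | (v & 0x3FF), out + n);
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For U+1F600, the standard form is the four bytes F0 9F 98 80, while the modified form is the six bytes ED A0 BD ED B8 80. A strict Standard UTF-8 decoder rejects the latter, which is why mixing up the two encodings produces the bugs listed above.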
 
New versions of the following JNI methods need to be added into JNI in Java SE 8, with an additional argument to specify the character encoding used:
- DefineClass
- FindClass
- ThrowNew
- FatalError
- GetFieldID
- GetMethodID
- GetStaticFieldID
- GetStaticMethodID
- NewStringUTF
- GetStringUTFLength
- GetStringUTFChars
- ReleaseStringUTFChars
- GetStringUTFRegion
- RegisterNatives
- AttachCurrentThread
- AttachCurrentThreadAsDaemon

New versions of these APIs are needed to address correctness issues with JNI code. The semantics of the existing versions of these methods need to remain unchanged to avoid breaking backwards compatibility.


More information about the jdk8-dev mailing list