possible problem with JNI GetStringUTFChars
Stuart Marks
stuart.marks at oracle.com
Mon Jan 28 22:10:07 UTC 2019

On 1/26/19 3:19 PM, David Holmes wrote:
> On 27/01/2019 3:08 am, Martin Buchholz wrote:
>> It's a pet peeve that the name GetStringUTFChars is deeply misleading -
>> there are many "UTF"s, and this encoding is meant for use with the JVM
>> only. The documentation should make it clearer that this is NOT the UTF-8
>> you might expect.
>
> It does!
>
> GetStringUTFChars
>
> const char * GetStringUTFChars(JNIEnv *env, jstring string, jboolean *isCopy);
>
> Returns a pointer to an array of bytes representing the string in modified UTF-8
> encoding.

This is pretty easy to miss, especially if you're not aware that the JVM and the
JDK have this special concept of "modified UTF-8". Perhaps emphasis should be
added, or occurrences of "modified UTF-8" could be turned into links to the
section in chapter 3 of the JNI spec where "modified UTF-8" is defined.
(Making the occurrences links might be emphasis enough.)
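For reference, the "modified UTF-8" defined in the JNI spec differs from standard UTF-8 in two places: U+0000 is written as the two bytes C0 80 rather than a single 00 byte, and a supplementary character is written as its UTF-16 surrogate pair, each half encoded as its own 3-byte sequence (6 bytes instead of 4). Only the 1-, 2- and 3-byte forms ever appear. Below is a minimal C sketch of that encoding applied to UTF-16 code units. The name `modified_utf8_encode` is hypothetical, and unlike GetStringUTFChars it does not append a terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a JNI function): encodes UTF-16 code units as
 * modified UTF-8. Each code unit is encoded independently:
 *   U+0001..U+007F              -> 1 byte
 *   U+0000 and U+0080..U+07FF   -> 2 bytes (so NUL becomes C0 80)
 *   U+0800..U+FFFF              -> 3 bytes, including each surrogate half
 * `out` must have room for 3 * len bytes. Returns the number of bytes written.
 */
size_t modified_utf8_encode(const uint16_t *units, size_t len, unsigned char *out)
{
    size_t n = 0;
    assert(units != NULL || len == 0);
    assert(out != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        uint16_t c = units[i];
        if (c >= 0x0001 && c <= 0x007F) {
            out[n++] = (unsigned char)c;
        } else if (c <= 0x07FF) {              /* also catches c == 0 */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {                               /* BMP char or lone surrogate half */
            out[n++] = (unsigned char)(0xE0 | (c >> 12));
            out[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```

Encoding the code units 0041 0000 D83C DF7B this way gives nine bytes, 41 C0 80 ED A0 BC ED BD BB, where standard UTF-8 would give six (41 00 F0 9F 8D BB).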
I think it would be far too troublesome to try to migrate the JNI methods to
process real UTF-8 instead of modified UTF-8. That raises the question, though:
is there a use case for processing real UTF-8 within JNI, for example, to
interoperate with external components that expect real UTF-8? If so, perhaps
some conversion methods could be added.
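If such conversion methods were added, one plausible shape is a single pass over GetStringUTFChars output that rewrites only the two forms that differ from standard UTF-8. This is a sketch under the assumption that the input is well-formed modified UTF-8. `modified_to_standard_utf8` is a hypothetical name, not an existing JNI function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes the 16-bit value carried by a 3-byte (1110xxxx 10xxxxxx 10xxxxxx) sequence. */
static uint16_t decode3(const unsigned char *p)
{
    return (uint16_t)(((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F));
}

/*
 * Hypothetical helper (not a JNI function): converts well-formed modified
 * UTF-8 to standard UTF-8. Only two forms differ:
 *   - C0 80 (modified form of U+0000) becomes a single 00 byte;
 *   - a 3-byte high surrogate followed by a 3-byte low surrogate becomes
 *     one 4-byte sequence for the supplementary code point.
 * All other bytes are copied unchanged. Unpaired surrogates are copied
 * through as-is, which standard UTF-8 does not permit; a real converter
 * would need a policy for them. The output is never longer than the input,
 * so `out` needs at most `len` bytes. Returns the number of bytes written.
 */
size_t modified_to_standard_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, n = 0;
    assert((in != NULL && out != NULL) || len == 0);
    while (i < len) {
        unsigned char b = in[i];
        if (b == 0xC0 && i + 1 < len && in[i + 1] == 0x80) {
            out[n++] = 0x00;                  /* modified NUL -> real NUL */
            i += 2;
            continue;
        }
        if ((b & 0xF0) == 0xE0 && i + 6 <= len && (in[i + 3] & 0xF0) == 0xE0) {
            uint16_t hi = decode3(in + i);
            uint16_t lo = decode3(in + i + 3);
            if (hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF) {
                /* combine the surrogate pair into one code point */
                uint32_t cp = 0x10000u + (((uint32_t)hi - 0xD800u) << 10)
                                       + ((uint32_t)lo - 0xDC00u);
                out[n++] = (unsigned char)(0xF0 | (cp >> 18));
                out[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
                out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
                out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
                i += 6;
                continue;
            }
        }
        out[n++] = in[i++];                   /* copy all other bytes unchanged */
    }
    return n;
}
```

Because the output never grows, a caller could size the output buffer from GetStringUTFLength. Note that the result may contain embedded 00 bytes, so it has to be handled with an explicit length rather than as a C string.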
(From Java code, the Charset encoders/decoders handle real UTF-8, which seems to
cover most cases. Modified UTF-8 occurs only within serialization and
Data{Input,Output}Stream.)

Alan Snyder wrote:

> -16 -97 -115 -69

I'll drink to that!
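The numbers in Alan's message are signed byte values, as Java's byte type would show them. Read as unsigned, they are F0 9F 8D BB, a standard (not modified) 4-byte UTF-8 sequence. Here is a short C sketch that decodes them. `decode_utf8_4` is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: decodes one standard 4-byte UTF-8 sequence given as
 * Java-style signed bytes (Java's byte is signed, so 0xF0 shows up as -16).
 * Returns the code point, or 0xFFFFFFFF if the bytes are not a 4-byte lead
 * byte followed by three continuation bytes.
 */
uint32_t decode_utf8_4(const int8_t b[4])
{
    uint8_t u0 = (uint8_t)b[0], u1 = (uint8_t)b[1];
    uint8_t u2 = (uint8_t)b[2], u3 = (uint8_t)b[3];
    if ((u0 & 0xF8) != 0xF0 || (u1 & 0xC0) != 0x80 ||
        (u2 & 0xC0) != 0x80 || (u3 & 0xC0) != 0x80)
        return 0xFFFFFFFFu;
    return ((uint32_t)(u0 & 0x07) << 18) | ((uint32_t)(u1 & 0x3F) << 12)
         | ((uint32_t)(u2 & 0x3F) << 6)  |  (uint32_t)(u3 & 0x3F);
}
```

The result is U+1F37B (CLINKING BEER MUGS), which presumably explains the reply. Under modified UTF-8, the same character would instead be the six bytes ED A0 BC ED BD BB, because it is stored as the surrogate pair D83C DF7B.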
s'marks

More information about the core-libs-dev mailing list