RFC 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths

David Holmes david.holmes at oracle.com
Mon Aug 19 22:34:41 UTC 2024


Broadening the audience to hotspot-dev, as there has been no response on
hotspot-runtime-dev.

David

On 13/08/2024 4:12 pm, David Holmes wrote:
>
> Comment is sought on this proposed update to the JNI Specification
>
> https://bugs.openjdk.org/browse/JDK-8328877
>
> The modified UTF-8 format used by the VM can lead to UTF-8 sequences
> whose length exceeds the maximum value of an int, due to multi-byte
> encoding, but the JNI GetStringUTFLength returns a jsize, which is
> (perhaps incorrectly) a jint, i.e. an int. As a result the current
> implementation will return a truncated version of the length of the
> sequence. To address this we propose to do two things in the JNI spec:
>
> 1. We deprecate GetStringUTFLength
>
> +### GetStringUTFLength (Deprecated)
>   
>   `jsize GetStringUTFLength(JNIEnv *env, jstring string);`
>   
>   Returns the length in bytes of the modified UTF-8 representation of a string.
>   
> +As the capacity of a `jsize` variable is not sufficient to hold the length of
> +all possible modified UTF-8 string representations (due to multi-byte encodings)
> +this function is deprecated in favor of [`GetLargeStringUTFLength()`](#getlargestringutflength).
> +If the modified UTF-8 representation of `string` has a length that exceeds the capacity
> +of a `jsize` variable, then the length as of the last character that could be fully
> +encoded without exceeding that capacity, is returned.
>
> 2. We add a new function GetLargeStringUTFLength
>
> +### GetLargeStringUTFLength
> +
> +`jlong GetLargeStringUTFLength(JNIEnv *env, jstring string);`
> +
> +Returns the complete length in bytes of the modified UTF-8 representation of a string.
>
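
To make the overflow concrete, here is a minimal sketch in plain C (no jni.h) of how a modified UTF-8 length can be counted from UTF-16 code units. The helper names mutf8_unit_bytes, mutf8_length and mutf8_length_capped are hypothetical and are not part of JNI or HotSpot. The sketch also assumes that the "character" in the proposed truncation wording means one UTF-16 code unit, since modified UTF-8 encodes each surrogate half separately as three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one UTF-16 code unit in modified UTF-8. */
static int mutf8_unit_bytes(uint16_t c) {
    if (c >= 0x0001 && c <= 0x007F) return 1; /* ASCII except NUL */
    if (c <= 0x07FF) return 2;                /* includes U+0000 (0xC0 0x80) */
    return 3;                                 /* the rest, incl. each surrogate half */
}

/* Complete length in bytes, accumulated in 64 bits so it cannot wrap
 * for any input of up to INT32_MAX code units. */
static int64_t mutf8_length(const uint16_t *units, size_t n) {
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += mutf8_unit_bytes(units[i]);
    }
    return total;
}

/* Length as of the last code unit that can be fully encoded without
 * exceeding `cap` bytes (assumed reading of the deprecation wording). */
static int64_t mutf8_length_capped(const uint16_t *units, size_t n,
                                   int64_t cap) {
    assert(cap >= 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        int b = mutf8_unit_bytes(units[i]);
        if (total + b > cap) {
            break; /* this unit would not fit entirely */
        }
        total += b;
    }
    return total;
}
```

For the six code units { 'A', 0x0000, 0x00E9, 0x20AC, 0xD83D, 0xDE00 }, mutf8_length gives 1 + 2 + 2 + 3 + 3 + 3 = 14, and mutf8_length_capped with a cap of 6 gives 5, because the three-byte unit 0x20AC would cross the cap. With the cap set to INT32_MAX the second function models the proposed GetStringUTFLength wording. The worst case for a string of INT32_MAX code units is 3 * 2147483647 = 6442450941 bytes, which is why a 64-bit result is needed.
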
> In addition we tweak the wording of GetStringUTFChars so that it:
>
> a) refers to a byte sequence instead of a byte array (to avoid 
> suggesting the returned sequence is limited by the capacity of a Java 
> array); and
>
> b) references the new GetLargeStringUTFLength function instead of the
> deprecated GetStringUTFLength
>
> Note that GetStringUTFRegion is still using an int length so can't be 
> used to obtain a giant region, but we don't expect this to be a 
> practical concern.
>
> The JNI version will also be bumped for this API addition.
>
> Thanks,
> David
>
>