RFC 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths

Fri Aug 23 04:33:24 UTC 2024

I've had some internal feedback which has been incorporated in the CSR 
request:

https://bugs.openjdk.org/browse/JDK-8338709

Proposed name of the new function is now GetStringUTFLengthAsLong.
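
To make the overflow described in the quoted proposal below concrete, here is a minimal sketch in plain C that needs no JNI headers. The helper names `modified_utf8_length` and `worst_case_length` are hypothetical and are not part of JNI or HotSpot. They only apply the modified UTF-8 size rules: 1 byte for U+0001..U+007F, 2 bytes for U+0000 and U+0080..U+07FF, and 3 bytes for everything else, with each surrogate encoded separately. The count is kept in 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a JNI function): returns the number of bytes
 * needed to encode `count` UTF-16 code units in modified UTF-8.
 * The result is 64-bit so it cannot wrap for any Java-sized string.
 */
static int64_t modified_utf8_length(const uint16_t *chars, size_t count) {
    int64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t c = chars[i];
        if (c >= 0x0001 && c <= 0x007F) {
            total += 1;   /* plain ASCII, except NUL */
        } else if (c <= 0x07FF) {
            total += 2;   /* U+0000 and U+0080..U+07FF */
        } else {
            total += 3;   /* U+0800..U+FFFF, surrogates included */
        }
    }
    return total;
}

/*
 * Hypothetical helper: upper bound on the encoded length of a string
 * with `count` UTF-16 code units (every unit taking 3 bytes).
 */
static int64_t worst_case_length(int64_t count) {
    return count * 3;
}
```

For the six code units `'A', 0x0000, 0x00E9, 0x20AC, 0xD83D, 0xDE00` the helper returns 14. For a string of `INT32_MAX` code units, `worst_case_length` gives 6442450941, which does not fit in a 32-bit `jsize`.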

David

On 20/08/2024 8:34 am, David Holmes wrote:
>
> Broadening the audience to hotspot-dev as there was zero response on 
> hotspot-runtime-dev.
>
> David
>
> On 13/08/2024 4:12 pm, David Holmes wrote:
>>
>> Comment is sought on this proposed update to the JNI Specification:
>>
>> https://bugs.openjdk.org/browse/JDK-8328877
>>
>> The modified UTF-8 format used by the VM can lead to UTF-8 sequences 
>> whose length exceeds the maximum value of an int, due to multi-byte 
>> encoding, but the JNI GetStringUTFLength returns a jsize, which is 
>> (perhaps incorrectly) a jint, i.e. an int. As a result the current 
>> implementation will return a truncated value for the length of the 
>> sequence. To address this we propose to do two things in the JNI spec:
>>
>> 1. We deprecate GetStringUTFLength
>>
>> +### GetStringUTFLength (Deprecated)
>>   
>>   `jsize GetStringUTFLength(JNIEnv *env, jstring string);`
>>   
>>   Returns the length in bytes of the modified UTF-8 representation of a string.
>>   
>> +As the capacity of a `jsize` variable is not sufficient to hold the length of
>> +all possible modified UTF-8 string representations (due to multi-byte encodings)
>> +this function is deprecated in favor of [`GetLargeStringUTFLength()`](#getlargestringutflength).
>> +If the modified UTF-8 representation of `string` has a length that exceeds the capacity
>> +of a `jsize` variable, then the length as of the last character that could be fully
>> +encoded without exceeding that capacity, is returned.
>>
>> 2. We add a new function GetLargeStringUTFLength
>>
>> +### GetLargeStringUTFLength
>> +
>> +`jlong GetLargeStringUTFLength(JNIEnv *env, jstring string);`
>> +
>> +Returns the complete length in bytes of the modified UTF-8 representation of a string.
>>
>> In addition we tweak the wording of GetStringUTFChars so that it:
>>
>> a) refers to a byte sequence instead of a byte array (to avoid 
>> suggesting the returned sequence is limited by the capacity of a Java 
>> array); and
>>
>> b) references the new GetLargeStringUTFLength function instead of the 
>> deprecated GetStringUTFLength
>>
>> Note that GetStringUTFRegion is still using an int length so can't be 
>> used to obtain a giant region, but we don't expect this to be a 
>> practical concern.
>>
>> The JNI version will also be bumped for this API addition.
>>
>> Thanks,
>> David
>>
>>