RFC 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths
David Holmes
david.holmes at oracle.com
Fri Aug 23 07:54:32 UTC 2024
Hi Thomas,
On 23/08/2024 3:59 pm, Thomas Stüfe wrote:
> Hi David,
>
> I had a read through the CSR.
Thanks for taking a look.
> ---
>
> "In addition we tweak the wording of GetStringUTFChars so that it:
> ...
> b) references the new GetStringUTFLengthAsLong function instead of the
> Deprecated GetStringUTFLength"
>
> (b) refers to GetStringUTFRegion, right? GetStringUTFChars has no such
> wording, nor a len argument.
Oops, thanks - fixed (two different functions were tweaked; I misread the diff).
> ---
>
> I was initially surprised that we return a fake length from
> GetStringUTFLength upon overflow instead of a clear error indicator like
> -1. Now folks will work with potentially truncated strings. Typically
> those are documents stored in string form, and truncation errors are not
> obvious. But probably there is no better way:
>
> Returning 0 would be an option - it would cause clearer and more
> immediate data errors (missing document contents). But
> it can be confused with "have no data" which can be a valid state.
> Returning -1 is potentially dangerous and can lead to overflows.
> Returning MAX_INT is not much better than returning up to the last valid
> encoding, we just get a weird character at the end of the document.
Yes, all of these possibilities were evaluated when that change was made
(not in public, unfortunately, as it was considered a security issue), and
each has its pros and cons. We settled on what seemed the least terrible
option - truncation to the length of a valid UTF-8 sequence.
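
To make "truncation to the length of a valid sequence" concrete, here is a
minimal sketch in C. It is not the HotSpot code: the names
modified_utf8_size and capped_utf8_length are hypothetical, and the cap is
a parameter so the behaviour can be seen with small inputs. It assumes the
usual modified UTF-8 sizes per UTF-16 code unit (NUL as two bytes,
surrogates encoded individually as three bytes each) and stops before any
code unit whose encoding would push the total past the cap. Because it
treats each code unit on its own, it could stop between the two halves of a
surrogate pair; whether the real implementation does that is not stated
here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one UTF-16 code unit in modified UTF-8. */
size_t modified_utf8_size(uint16_t c) {
    if (c == 0) return 2;       /* NUL is encoded as 0xC0 0x80 */
    if (c <= 0x007F) return 1;
    if (c <= 0x07FF) return 2;
    return 3;                   /* includes surrogates, encoded one by one */
}

/*
 * Encoded length of chars[0..n), never exceeding cap. The count stops
 * before the first code unit that does not fit, so the result always
 * ends on a complete encoded code unit.
 */
size_t capped_utf8_length(const uint16_t *chars, size_t n, size_t cap) {
    assert(chars != NULL || n == 0);
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t sz = modified_utf8_size(chars[i]);
        if (sz > cap - len) break;  /* len <= cap holds, so no underflow */
        len += sz;
    }
    return len;
}
```

With cap set to INT_MAX this models a length that must fit in a 32-bit
jsize: the caller gets a shorter but well-formed prefix rather than a
negative or wrapped value.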
Thanks,
David
-----
> Cheers, Thomas