RFC 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths

Fri Aug 23 07:54:32 UTC 2024

Hi Thomas,

On 23/08/2024 3:59 pm, Thomas Stüfe wrote:
> Hi David,
> 
> had a read through the CSR.

Thanks for taking a look.

> ---
> 
> In addition we tweak the wording of GetStringUTFChars so that it:
> ...
> b) references the new GetStringUTFLengthAsLong function instead of the
> deprecated GetStringUTFLength
>
> Doesn't (b) refer to GetStringUTFRegion? GetStringUTFChars has no such
> wording, nor a len argument.

Oops, thanks - fixed (two different functions were tweaked; I misread the diff).

> ---
> 
> I was initially surprised that we return a fake length from 
> GetStringUTFLength upon overflow instead of a clear error indicator like 
> -1. Now folks will work with potentially truncated strings. Typically 
> those are documents stored in string form, and truncation errors are not 
> obvious. But probably there is no better way:
> 
> Returning 0 would be an option - it would cause clearer and more 
> immediate data errors (missing document contents). But 
> it can be confused with "have no data" which can be a valid state.
> Returning -1 is potentially dangerous and can lead to overflows.
> Returning MAX_INT is not much better than returning up to the last valid 
> encoding, we just get a weird character at the end of the document.

Yes, all of these possibilities were evaluated when that change was made
(unfortunately not in public, as it was considered a security issue), and
each has its pros and cons. We settled on what seemed the least terrible
option: truncation to the length of a valid UTF-8 sequence.
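
To illustrate what that truncation means, here is a minimal sketch. It is
not the HotSpot code: the helper names are made up for illustration, the
input is a plain array of UTF-16 code units rather than a jstring, and a
small limit stands in for INT_MAX. It assumes the modified UTF-8 sizing
rules (1, 2 or 3 bytes per UTF-16 code unit, with U+0000 taking 2 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one UTF-16 code unit in modified UTF-8. */
static int modified_utf8_size(uint16_t c) {
    if (c >= 0x0001 && c <= 0x007F) return 1;
    if (c <= 0x07FF) return 2;  /* includes U+0000, encoded as 0xC0 0x80 */
    return 3;                   /* includes each half of a surrogate pair */
}

/* Full encoded length as a 64-bit value (hypothetical helper). */
static int64_t utf8_length_as_long(const uint16_t *chars, size_t n) {
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += modified_utf8_size(chars[i]);
    }
    return total;
}

/* Largest length <= limit that still ends on a complete encoded
 * character (hypothetical helper; limit plays the role of INT_MAX). */
static int64_t utf8_length_truncated(const uint16_t *chars, size_t n,
                                     int64_t limit) {
    assert(limit >= 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        int sz = modified_utf8_size(chars[i]);
        if (total + sz > limit) {
            break;  /* the next character would not fit completely */
        }
        total += sz;
    }
    return total;
}
```

With the code units { 'a', 0x00E9, 0x20AC } and a limit of 5, the full
length is 6 but the truncated length is 3: the 3-byte character does not
fit, so the reported length stops after the last complete character
instead of cutting it in half.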

Thanks,
David
-----

> Cheers, Thomas