RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Wed Aug 28 03:49:21 UTC 2024

On Wed, 28 Aug 2024 01:24:43 GMT, David Holmes <dholmes at openjdk.org> wrote:

>>> If you try to accommodate arbitrary future use then every method in the VM would need to enforce every single precondition and invariant it expects "just in case" and that is not practical.
>> 
>> I'm basically arguing for Functional Testing here, or at least having some invariants the would allow functional testing.  It may seem impractical to retrofit existing code, but when we are changing the input from int to size_t, that seems like the perfect time to enforce the new invariants.  If we expect "len" to be <= INT_MAX instead of SIZE_MAX, something that is not obvious from its type, then why not check that with an assert or at least document it?
>
> Note that I do already document the assumptions here in the general comment in utf8.hpp:
> 
> There is an additional assumption/expectation that our UTF8 API's are never dealing with
> invalid UTF8, and more generally that all UTF8 sequences could form valid Strings.
> Consequently the Unicode length of a UTF8 sequence is assumed to always be representable
> by an int. 
> 
> the check_cast is then the assert that verifies that.

OK, that's good enough for me.  Thanks,

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733935851