RFR: 8338257: UTF8 lengths should be size_t not int [v5]
David Holmes
dholmes at openjdk.org
Wed Aug 28 01:27:28 UTC 2024
On Tue, 27 Aug 2024 23:54:08 GMT, Dean Long <dlong at openjdk.org> wrote:
>> If you try to accommodate arbitrary future use then every method in the VM would need to enforce every single precondition and invariant it expects "just in case" and that is not practical. Code can and does take advantage of the expected calling context, which here limits lengths to int (and typically < 64K). The checked_cast serves to catch such misuses in my opinion.
>
>> If you try to accommodate arbitrary future use then every method in the VM would need to enforce every single precondition and invariant it expects "just in case" and that is not practical.
>
> I'm basically arguing for Functional Testing here, or at least having some invariants the would allow functional testing. It may seem impractical to retrofit existing code, but when we are changing the input from int to size_t, that seems like the perfect time to enforce the new invariants. If we expect "len" to be <= INT_MAX instead of SIZE_MAX, something that is not obvious from its type, then why not check that with an assert or at least document it?
Note that I do already document the assumptions here in the general comment in utf8.hpp:
There is an additional assumption/expectation that our UTF8 API's are never dealing with
invalid UTF8, and more generally that all UTF8 sequences could form valid Strings.
Consequently the Unicode length of a UTF8 sequence is assumed to always be representable
by an int.
the check_cast is then the assert that verifies that.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733751739
More information about the hotspot-dev
mailing list