RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Tue Aug 27 13:09:04 UTC 2024

On Tue, 27 Aug 2024 12:20:04 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> I think the Java string would only need to be INT_MAX/3 in length, if all the characters require surrogate encoding.
>
> IIUC for compact strings, with non-latin-1 each pair of bytes would require at most 3 bytes to encode so you'd need 2/3 of INT_MAX. With latin-1 it would be 1/2 INT_MAX. But yes I suppose in theory you might be able to get an overflow on 32-bit. Need to think more about what could even be done for this case ... and whether it is worth trying ...

SymbolTable does check the length and truncates with a warning (see https://github.com/openjdk/jdk/blob/0c332e9de919184d8a4678bfd7c274fcef02b3e2/src/hotspot/share/classfile/symbolTable.cpp#L351-L360), though it does not seem to check for values < 0, which is what an overflowed int length would look like. Maybe we should add that check.
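
To make the concern concrete, here is a rough standalone sketch (not the actual HotSpot code). It assumes a worst case of 3 encoded bytes per UTF-16 code unit, and the names `kMaxSymbolLength`, `worst_case_utf8_length`, `fits_in_int`, `clamp_upper_only` and `is_valid_length` are made up for illustration. The limit of 65535 is an assumed stand-in for whatever Symbol::max_length() returns.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Assumed limit for illustration only; the real value comes from
// Symbol::max_length() in HotSpot.
constexpr int kMaxSymbolLength = 65535;

// Worst-case encoded length, assuming up to 3 bytes per UTF-16 code unit.
// Computed in 64 bits so the sketch itself cannot overflow.
constexpr uint64_t worst_case_utf8_length(uint32_t utf16_units) {
  return static_cast<uint64_t>(utf16_units) * 3;
}

// True if the length can be represented in an int without wrapping.
constexpr bool fits_in_int(uint64_t len) {
  return len <= static_cast<uint64_t>(INT_MAX);
}

// Upper-bound-only check: a negative length passes through untouched.
constexpr int clamp_upper_only(int len) {
  return len > kMaxSymbolLength ? kMaxSymbolLength : len;
}

// Check that also rejects negative lengths.
constexpr bool is_valid_length(int len) {
  return len >= 0 && len <= kMaxSymbolLength;
}

// INT_MAX / 3 code units still fit in an int; one more does not.
static_assert(fits_in_int(worst_case_utf8_length(INT_MAX / 3)), "fits");
static_assert(!fits_in_int(worst_case_utf8_length(INT_MAX / 3 + 1)), "overflows");
```

With an upper-bound-only test, `clamp_upper_only(-5)` returns -5 unchanged, whereas `is_valid_length(-5)` is false. That is the gap a `len < 0` check would close.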

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732816650