RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Tue Aug 27 07:23:04 UTC 2024

On Tue, 27 Aug 2024 03:13:59 GMT, Dean Long <dlong at openjdk.org> wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   more missing casts
>
> src/hotspot/share/classfile/javaClasses.cpp line 588:
> 
>> 586:     size_t utf8_len = static_cast<size_t>(length);
>> 587:     const char* base = UNICODE::as_utf8(position, utf8_len);
>> 588:     Symbol* sym = SymbolTable::new_symbol(base, checked_cast<int>(utf8_len));
> 
> With the current limitations of checked_cast(), we would also need to check if the result is negative on 32-bit platforms, because then size_t and int will be the same size, and checked_cast will never complain.

I'm trying to reason about whether, on 32-bit, we could even create a large enough string for this to be a problem. Once we have the giant string, `as_utf8` has to allocate an array that is just as large, if not larger. So for overflow to be an issue we need a string of length INT_MAX, which is limited to 2GB, and then we have to allocate a resource array of 2GB as well. That means we would need to have allocated 4GB, which is our entire address space on 32-bit. So I don't think we can ever hit a problem on 32-bit where the size_t utf8 length would convert to a negative int.
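
For reference, here is a minimal sketch of the case being discussed, assuming a cast check that only verifies the value survives a round trip. This is a simplified stand-in, not the actual HotSpot `checked_cast`; the names `round_trip_ok` and `fits_non_negative` are hypothetical, and the fixed-width types are used only to emulate a 32-bit `size_t` and `int` on any host.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a round-trip-only checked cast: reports whether
// the value is unchanged after converting to To and back to From.
// Note: an out-of-range unsigned-to-signed conversion wraps on two's
// complement targets (guaranteed only since C++20).
template <typename To, typename From>
bool round_trip_ok(From value) {
  To result = static_cast<To>(value);
  return static_cast<From>(result) == value;
}

// Hypothetical stricter check: the round trip must succeed and the
// converted value must not be negative.
template <typename To, typename From>
bool fits_non_negative(From value) {
  To result = static_cast<To>(value);
  return static_cast<From>(result) == value && result >= 0;
}

void demo() {
  // Emulated 32-bit size_t holding INT32_MAX + 1.
  const std::uint32_t too_big = 0x80000000u;

  // Same width: the value round-trips, so the overflow goes unnoticed.
  assert((round_trip_ok<std::int32_t>(too_big)));

  // Adding the sign check catches it.
  assert((!fits_non_negative<std::int32_t>(too_big)));

  // With a 64-bit source type the round trip alone already fails.
  assert((!round_trip_ok<std::int32_t>(std::uint64_t{too_big})));
}
```

This only shows why a same-width conversion can slip through a round-trip check; whether such a length is reachable in practice on 32-bit is the separate question argued above.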

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732281358