RFR: 8338257: UTF8 lengths should be size_t not int [v7]
David Holmes
dholmes at openjdk.org
Thu Aug 29 12:40:24 UTC 2024
On Thu, 29 Aug 2024 08:38:41 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
> > > Many of these translations seem awkward, since they convert to size_t only to then convert back to int.
> >
> > Can you be more specific here please?
>
> Certainly. For example, this construct:
>
> ```
> size_t utf8_len = static_cast<size_t>(length);
> const char* base = UNICODE::as_utf8(position, utf8_len);
> Symbol* sym = SymbolTable::new_symbol(base, checked_cast<int>(utf8_len));
> ```
>
> We introduce `utf8_len` as a `size_t` synonym for len, but since it originates from an `int`, its length must be <= INT_MIN. We also assume it is >=0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` and `SymbolTable::new_symbol`. The former takes a size_t, but since we rely on `length` >= 0, we could just as well give it `length`. For `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a check. The check feels superfluous since `utf8_len` came from an int.
>
> I assume this verbosity is for the benefit of the code reader, to make intent clear. Otherwise, we could have just continued to use length, and just cast it on the fly to unsigned or to size_t when calling `UNICODE::as_utf8`.
This is exactly the case I was referring to. The declaration here is:
template<typename T> static char* as_utf8(const T* base, size_t& length);
whether length is an IN/OUT parameter that is the int array length going in (hence >= 0 and <= INT_MAX), and the size_t utf8 sequence length coming out. The out coming utf8 length can theoretically by > INT_MAX but if that were the case in this code (which expects to be dealing with names that can be symbols hence < 64K) then that would be a programming error which the checked_cast would catch. And of course new_symbol checks for < 64K.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2317534674
More information about the serviceability-dev
mailing list