RFR: 8364317: Explicitly document some assumptions of StringUTF16 [v2]

Thu Jul 31 17:46:54 UTC 2025

On Wed, 30 Jul 2025 14:18:49 GMT, Chen Liang <liach at openjdk.org> wrote:

>> In #24773, people were concerned that the layout of a UTF16 byte array and a char array may be incompatible. In fact, they are compatible; this is asserted in a corner of `LibraryCallKit::inline_string_char_access` in `library_call.cpp`.
>> 
>> In addition, another frequent error I see is that contributors confuse the meaning of indices in StringUTF16: the indices are always char array indices, not byte offsets. I think we should make these assumptions explicit to help future maintenance.
>
> Chen Liang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add paragraph for endianness and layout

> > @liach, given the relatively big API surface of j.l.StringUTF16, are we certain about this?
> 
> I am. In fact, your update to use `char` made it less correct: there are a few APIs that take LATIN1 byte arrays, for which the number of chars equals the number of bytes.

Thanks for taking care of these details Chen, much appreciated. Changes LGTM.
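
For anyone reading along, here is a minimal sketch of the two assumptions under discussion: indices count chars rather than bytes, and each char is stored in platform byte order so the byte array lines up with a char array. The class `Utf16IndexSketch` and its methods are hypothetical names of mine, not the actual `java.lang.StringUTF16` source, which is package-private and relies on intrinsics.

```java
import java.nio.ByteOrder;

// Hypothetical illustration only; not the JDK implementation.
final class Utf16IndexSketch {
    // Assumption: chars are stored in native byte order, so on a
    // big-endian platform the high byte comes first.
    static final boolean BIG_ENDIAN = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;
    static final int HI_SHIFT = BIG_ENDIAN ? 8 : 0;
    static final int LO_SHIFT = BIG_ENDIAN ? 0 : 8;

    // Length in chars: half the number of bytes.
    static int length(byte[] val) {
        return val.length >> 1;
    }

    // 'index' is a char index; the byte offset is index * 2.
    static void putChar(byte[] val, int index, int c) {
        int off = index << 1;
        val[off] = (byte) (c >> HI_SHIFT);
        val[off + 1] = (byte) (c >> LO_SHIFT);
    }

    // 'index' is a char index; the byte offset is index * 2.
    static char getChar(byte[] val, int index) {
        int off = index << 1;
        return (char) (((val[off] & 0xff) << HI_SHIFT)
                     | ((val[off + 1] & 0xff) << LO_SHIFT));
    }

    // For a LATIN1 array, char index and byte index coincide.
    static char getLatin1Char(byte[] val, int index) {
        return (char) (val[index] & 0xff);
    }

    public static void main(String[] args) {
        byte[] val = new byte[2 * 3];          // room for three chars
        putChar(val, 0, 'a');
        putChar(val, 1, '\u20ac');
        putChar(val, 2, 'z');
        System.out.println(length(val));
        System.out.println(Integer.toHexString(getChar(val, 1)));
    }
}
```

Reading the same bytes through `ByteBuffer.wrap(val).order(ByteOrder.nativeOrder()).asCharBuffer()` should yield the same chars at the same indices, which is the layout compatibility the PR description refers to.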

-------------

Marked as reviewed by vyazici (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26541#pullrequestreview-3076373698