RFR: 8364317: Explicitly document some assumptions of StringUTF16 [v2]

Wed Jul 30 18:16:55 UTC 2025

On Wed, 30 Jul 2025 14:18:49 GMT, Chen Liang <liach at openjdk.org> wrote:

>> In #24773, people were concerned that the layout of a UTF16 byte array and a char array may be incompatible. In fact, they are compatible: this is asserted in a corner of `LibraryCallKit::inline_string_char_access` in `library_call.cpp`.
>> 
>> In addition, another frequent error I see is that contributors confuse the meaning of indices in StringUTF16: the indices are always char array indices, not byte indices. I think we should make these assumptions explicit to help future maintenance.
>
> Chen Liang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add paragraph for endianness and layout

> All indices and sizes for byte arrays carrying UTF-16 data are in number of
> `char`s instead of number of bytes.

@liach, given the relatively big API surface of `j.l.StringUTF16`, are we certain about this?

src/java.base/share/classes/java/lang/StringUTF16.java line 51:

> 49: ///
> 50: /// All indices and sizes for byte arrays carrying UTF16 data are in number of
> 51: /// chars instead of  number of bytes.

Nit on cosmetics:

Suggestion:

/// UTF-16 `String` operations.
///
/// UTF-16 byte arrays have the same layout as `char` arrays. They share the
/// same base offset and scale, and each two-byte unit interpreted as a `char`
/// has the same endianness as a `char`, which is the platform endianness.
/// This is ensured in the static initializer of [StringUTF16].
///
/// All indices and sizes for byte arrays carrying UTF-16 data are in number of
/// `char`s instead of number of bytes.
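
To make the two documented assumptions concrete, here is a minimal sketch of how a `char` index maps onto a UTF-16 byte array in platform endianness. The class `Utf16LayoutSketch` and its methods are hypothetical names for illustration only; this is not the code in `StringUTF16`, and it uses plain array access rather than the intrinsics.

```java
import java.nio.ByteOrder;

// Hypothetical illustration, not the JDK's StringUTF16.
final class Utf16LayoutSketch {
    // Assumption: each two-byte unit is stored in the platform byte order.
    static final boolean BIG_ENDIAN =
            ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;
    // Shift applied to obtain the byte stored at the lower address.
    static final int FIRST_SHIFT = BIG_ENDIAN ? 8 : 0;
    // Shift applied to obtain the byte stored at the higher address.
    static final int SECOND_SHIFT = BIG_ENDIAN ? 0 : 8;

    // Allocates a byte array able to hold charLength UTF-16 code units.
    static byte[] newBytesFor(int charLength) {
        return new byte[charLength << 1];
    }

    // Length in chars, not in bytes.
    static int length(byte[] value) {
        return value.length >> 1;
    }

    // index is a char index; the byte offset is index * 2.
    static void putChar(byte[] value, int index, char c) {
        int byteIndex = index << 1;
        value[byteIndex] = (byte) (c >> FIRST_SHIFT);
        value[byteIndex + 1] = (byte) (c >> SECOND_SHIFT);
    }

    // index is a char index; the byte offset is index * 2.
    static char getChar(byte[] value, int index) {
        int byteIndex = index << 1;
        return (char) (((value[byteIndex] & 0xff) << FIRST_SHIFT)
                | ((value[byteIndex + 1] & 0xff) << SECOND_SHIFT));
    }
}
```

With this sketch, `putChar(value, 2, c)` touches bytes 4 and 5 of `value`, which is the kind of char-index versus byte-index distinction the proposed paragraph spells out.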

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26541#issuecomment-3137375297
PR Review Comment: https://git.openjdk.org/jdk/pull/26541#discussion_r2243513617