RFR: 8364317: Explicitly document some assumptions of StringUTF16 [v2]

Thu Jul 31 17:21:54 UTC 2025

On Wed, 30 Jul 2025 18:12:03 GMT, Volkan Yazici <vyazici at openjdk.org> wrote:

>> Chen Liang has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add paragraph for endianness and layout
>
> src/java.base/share/classes/java/lang/StringUTF16.java line 51:
> 
>> 49: ///
>> 50: /// All indices and sizes for byte arrays carrying UTF16 data are in number of
>> 51: /// chars instead of  number of bytes.
> 
> Nit on cosmetics:
> 
> Suggestion:
> 
> /// UTF-16 `String` operations.
> ///
> /// UTF-16 byte arrays have the identical layout as `char` arrays. They share the
> /// same base offset and scale, and for each two-byte unit interpreted as a `char`,
> /// it has the same endianness as a `char`, which is the platform endianness.
> /// This is ensured in the static initializer of [StringUTF16].
> ///
> /// All indices and sizes for byte arrays carrying UTF-16 data are in number of
> /// `char`s instead of number of bytes.

Unfortunately, I don't think I will use your suggestion, except maybe for the whitespace fix.

1. UTF16 is derived from `String.UTF16`, so I don't think I will stylize that.
2. The number of chars is the number of characters. I double-checked, and it seems `length` is the only one that returns the byte array length instead of using the character count as the unit. There are some LATIN1-accepting APIs, and those APIs also use the number of characters, except that in the LATIN1 case that number is identical to the number of chars.
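
To make the unit distinction concrete, here is a minimal standalone sketch. It is not the `StringUTF16` code: the class `Utf16IndexSketch` and its helpers are hypothetical names, and `ByteBuffer` in native byte order is only assumed here as a stand-in for the platform-endian layout described in the doc comment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; none of these helpers exist in the JDK.
class Utf16IndexSketch {
    // Encode a String as two bytes per char in platform byte order.
    static byte[] toBytes(String s) {
        ByteBuffer buf = ByteBuffer.allocate(s.length() << 1)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < s.length(); i++) {
            buf.putChar(s.charAt(i));
        }
        return buf.array();
    }

    // Size expressed in chars: half the byte array length.
    static int lengthInChars(byte[] value) {
        return value.length >> 1;
    }

    // 'index' counts chars, so the byte offset is index * 2.
    static char charAt(byte[] value, int index) {
        return ByteBuffer.wrap(value)
                         .order(ByteOrder.nativeOrder())
                         .getChar(index << 1);
    }

    public static void main(String[] args) {
        byte[] value = toBytes("hello");
        System.out.println(value.length);         // 10 (bytes)
        System.out.println(lengthInChars(value)); // 5 (chars)
        System.out.println(charAt(value, 1));     // e
    }
}
```

The point is only that an index of 1 addresses the second char (bytes 2 and 3), not the second byte.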

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26541#discussion_r2245962980