RFR: 8355177: Speed up StringBuilder::append(char[]) via Unsafe::copyMemory [v8]

Roger Riggs rriggs at openjdk.org
Thu Jul 24 14:40:56 UTC 2025


On Thu, 24 Jul 2025 14:20:48 GMT, Chen Liang <liach at openjdk.org> wrote:

>> src/java.base/share/classes/java/lang/StringUTF16.java line 1490:
>> 
>>> 1488:                 val,
>>> 1489:                 Unsafe.ARRAY_BYTE_BASE_OFFSET + ((long) index << 1),
>>> 1490:                 (long) (end - off) << 1);
>> 
>> The documentation of `copyMemory()` is not super-clear about endianness.
>> But it seems to imply that in this case it behaves as if it were to copy `short`s, so endianness seems to be preserved.
>> 
>> The invocation of `copyMemory()` here implicitly assumes that `ARRAY_CHAR_INDEX_SCALE` and `ARRAY_BYTE_INDEX_SCALE` are 2 and 1, resp., which seems quite reasonable but is not set in stone.
>
> I recall that the runtime requires a UTF16 byte array and a char array to have exactly the same layout. It would be nice to keep this in the design notes for the string implementation classes, such as in the class header.
> 
> (Useful notes could include that indices are char-based, that a UTF16 byte[] and a char[] have identical layout, etc.)

The StringUTF16.getChar and putChar methods are carefully written to use the platform endianness when composing and decomposing char values from and to byte[], in terms of shifts of the lower and upper bytes.
The mapping of that onto other APIs that try to optimize copies between char[] and the compact string byte[] is less well documented.
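
As a rough illustration of the point above (a standalone sketch, not the actual StringUTF16 code), the following composes and decomposes chars with endianness-dependent shifts and checks that the result matches a native-order bulk copy of a char[]. The class name, the `FIRST_BYTE_SHIFT`/`SECOND_BYTE_SHIFT` constants, and the `bulkCopy`/`perCharCopy` helpers are hypothetical, and `ByteBuffer` in native order stands in for `Unsafe::copyMemory`, which is an assumption about equivalent behavior rather than something this sketch proves for the JDK internals.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class Utf16LayoutSketch {
    // Shift applied for the byte at the lower address and the byte at the
    // higher address of each char, picked once from the platform byte order.
    static final int FIRST_BYTE_SHIFT;
    static final int SECOND_BYTE_SHIFT;
    static {
        if (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN) {
            FIRST_BYTE_SHIFT = 8;   // upper byte is stored first
            SECOND_BYTE_SHIFT = 0;
        } else {
            FIRST_BYTE_SHIFT = 0;   // lower byte is stored first
            SECOND_BYTE_SHIFT = 8;
        }
    }

    // index is char-based; "<< 1" assumes 2 bytes per char and 1 byte per byte[] slot.
    static void putChar(byte[] val, int index, int c) {
        int i = index << 1;
        val[i] = (byte) (c >> FIRST_BYTE_SHIFT);
        val[i + 1] = (byte) (c >> SECOND_BYTE_SHIFT);
    }

    static char getChar(byte[] val, int index) {
        int i = index << 1;
        return (char) (((val[i] & 0xff) << FIRST_BYTE_SHIFT)
                     | ((val[i + 1] & 0xff) << SECOND_BYTE_SHIFT));
    }

    // Bulk path: writes the chars in native byte order, as a raw memory copy would.
    static byte[] bulkCopy(char[] src) {
        byte[] dst = new byte[src.length << 1];
        ByteBuffer.wrap(dst).order(ByteOrder.nativeOrder()).asCharBuffer().put(src);
        return dst;
    }

    // Reference path: one putChar call per char.
    static byte[] perCharCopy(char[] src) {
        byte[] dst = new byte[src.length << 1];
        for (int i = 0; i < src.length; i++) {
            putChar(dst, i, src[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        char[] src = "h\u00e9llo \u4e16\u754c".toCharArray();
        System.out.println(Arrays.equals(bulkCopy(src), perCharCopy(src)));
    }
}
```

If the two paths ever disagreed on some platform, a bulk copy from char[] into the compact string byte[] would not be interchangeable with the per-char path, which is the kind of invariant that would be worth stating in the class header.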

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24773#discussion_r2228721098

