RFR: 8355177: Speed up StringBuilder::append(char[]) via UTF16::compress & Unsafe::copyMemory [v4]
Shaojin Wen
swen at openjdk.org
Fri May 2 06:43:47 UTC 2025
On Fri, 2 May 2025 03:49:39 GMT, Shaojin Wen <swen at openjdk.org> wrote:
>> In BufferedReader.readLine and other similar scenarios, we need to use StringBuilder.append(char[]) to build the string.
>>
>> For these scenarios, we can use the intrinsic methods StringUTF16.compress and Unsafe.copyMemory instead of a char-by-char copy loop to improve speed.
>
> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
>
> - Merge remote-tracking branch 'upstream/master' into optim_sb_append_chars_202504
>
> # Conflicts:
> # src/java.base/share/classes/java/lang/AbstractStringBuilder.java
> - Merge remote-tracking branch 'upstream/master' into optim_sb_append_chars_202504
>
> # Conflicts:
> # src/java.base/share/classes/java/lang/StringUTF16.java
> - putCharsUnchecked
> - copyright
> - Using StringUTF16.compress to speed up LATIN1 StringBuilder append(char[])
> - Using Unsafe.copyMemory to speed up UTF16 StringBuilder append(char[])
> - add append(char[]) benchmark
> > This might be helpful combined with #21730.
>
> That implies creating a copy of the chars:
>
> ```java
> private final void appendChars(CharSequence s, int off, int end) {
>     if (isLatin1()) {
>         byte[] val = this.value;
>
>         // ----- Begin of Experimental Section -----
>         char[] ca = new char[end - off];
>         s.getChars(off, end, ca, 0);
>         int compressed = StringUTF16.compress(ca, 0, val, count, end - off);
>         count += compressed;
>         off += compressed;
>         // ----- End of Experimental Section -----
>
>         for (int i = off, j = count; i < end; i++) {
>             char c = s.charAt(i);
>             if (StringLatin1.canEncode(c)) {
>                 val[j++] = (byte)c;
>             } else {
>                 count = j;
>                 inflate();
>                 // Store c to make sure sb has a UTF16 char
>                 StringUTF16.putCharSB(this.value, j++, c);
>                 count = j;
>                 i++;
>                 StringUTF16.putCharsSB(this.value, j, s, i, end);
>                 count += end - i;
>                 return;
>             }
>         }
>     } else {
>         StringUTF16.putCharsSB(this.value, count, s, off, end);
>     }
>     count += end - off;
> }
> ```
>
> While I do _assume_ that it should be faster to let machine code perform the copy and compression than to let Java code do a char-by-char loop, there should be another benchmark to actually prove this claim.
> char[] ca = new char[end - off];
Your code here allocates a temporary char[] (and copies the chars into it) on every call, which may cause a slowdown.
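
For comparison, here is a rough sketch of the direction this PR takes for the char[] overload: compress directly from the caller-supplied char[] into the latin1 buffer, with no temporary copy. This is only an illustration, not the exact patch, and it assumes the same JDK-internal context as the snippet above (AbstractStringBuilder's count/value fields, inflate(), and the StringUTF16.compress / putCharsSB helpers with the signatures used there):

```java
// Illustrative sketch only, not the exact change in this PR.
// Assumes the JDK-internal helpers used in the snippet above.
private final void appendChars(char[] s, int off, int end) {
    if (isLatin1()) {
        byte[] val = this.value;
        // Compress directly from the caller's char[] into the latin1 buffer;
        // no temporary char[] is allocated.
        int compressed = StringUTF16.compress(s, off, val, count, end - off);
        count += compressed;
        off += compressed;
        if (off < end) {
            // Hit a char that cannot be encoded as latin1: inflate to UTF-16
            // and copy the remaining chars in bulk.
            inflate();
            StringUTF16.putCharsSB(this.value, count, s, off, end);
            count += end - off;
        }
    } else {
        // UTF16 coder: bulk copy of the chars
        // (the actual PR uses Unsafe.copyMemory on this path).
        StringUTF16.putCharsSB(this.value, count, s, off, end);
        count += end - off;
    }
}
```

The key difference is that the compress call reads straight from the caller's char[], so no temporary array is allocated and no extra copy is made.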
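
On the benchmark question: this PR already adds an append(char[]) benchmark. For reference, a minimal JMH shape for this kind of measurement looks roughly like the following (class, field, and parameter names here are illustrative, not the benchmark actually added in the PR):

```java
// Hypothetical JMH sketch for StringBuilder.append(char[]);
// names and sizes are illustrative only.
package org.sample;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class StringBuilderAppendChars {

    @Param({"16", "128", "1024"})
    int length;

    char[] latin1Chars;
    char[] utf16Chars;

    @Setup
    public void setup() {
        latin1Chars = new char[length];
        utf16Chars = new char[length];
        for (int i = 0; i < length; i++) {
            latin1Chars[i] = (char) ('a' + (i % 26));   // fits in latin1
            utf16Chars[i] = (char) (0x4E00 + (i % 26)); // forces UTF-16
        }
    }

    @Benchmark
    public StringBuilder appendLatin1() {
        return new StringBuilder().append(latin1Chars);
    }

    @Benchmark
    public StringBuilder appendUtf16() {
        return new StringBuilder().append(utf16Chars);
    }
}
```

Measuring latin1 and UTF-16 inputs separately matters here, since they exercise different paths (compress vs. bulk UTF-16 copy).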
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24773#issuecomment-2846483320