RFR: 8355177: Speed up StringBuilder::append(char[]) via UTF16::compress & Unsafe::copyMemory [v4]

Shaojin Wen swen at openjdk.org
Fri May 2 06:43:47 UTC 2025


On Fri, 2 May 2025 03:49:39 GMT, Shaojin Wen <swen at openjdk.org> wrote:

>> In BufferedReader.readLine and other similar scenarios, we need to use StringBuilder.append(char[]) to build the string.
>> 
>> For these scenarios, we can use the intrinsic methods StringUTF16.compress and Unsafe.copyMemory instead of a char-by-char copy loop to improve speed.
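
For context, StringUTF16.compress is a java.lang-internal intrinsic, so it cannot be called from ordinary code; its plain-Java contract, as it is used later in this thread, is roughly the following (a stand-alone sketch with an illustrative name, not the JDK source). The JIT replaces the real intrinsic with vectorized machine code, which is where the win over a char-at-a-time loop comes from.

```java
// Sketch only: illustrative stand-in for the JDK-internal compress step.
// Copies leading LATIN-1 chars (<= 0xFF) from src into the byte[] buffer
// and returns how many chars were copied; a caller can then fall back to
// a slower path for the remainder.
static int compressLatin1(char[] src, int srcOff, byte[] dst, int dstOff, int len) {
    int i = 0;
    for (; i < len; i++) {
        char c = src[srcOff + i];
        if (c > 0xFF) {
            break;                  // first char that does not fit in LATIN-1
        }
        dst[dstOff + i] = (byte) c;
    }
    return i;                       // number of chars compressed
}
```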
>
> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
> 
>  - Merge remote-tracking branch 'upstream/master' into optim_sb_append_chars_202504
>    
>    # Conflicts:
>    #	src/java.base/share/classes/java/lang/AbstractStringBuilder.java
>  - Merge remote-tracking branch 'upstream/master' into optim_sb_append_chars_202504
>    
>    # Conflicts:
>    #	src/java.base/share/classes/java/lang/StringUTF16.java
>  - putCharsUnchecked
>  - copyright
>  - Using StringUTF16.compress to speed up LATIN1 StringBuilder append(char[])
>  - Using Unsafe.copyMemory to speed up UTF16 StringBuilder append(char[])
>  - add append(char[]) benchmark

> > This might be helpful combined with #21730.
> 
> That implies creating a copy of the chars:
> 
> ```java
> private final void appendChars(CharSequence s, int off, int end) {
>     if (isLatin1()) {
>         byte[] val = this.value;
> 
>         // ----- Begin of Experimental Section -----
>         char[] ca = new char[end - off];
>         s.getChars(off, end, ca, 0);
>         int compressed = StringUTF16.compress(ca, 0, val, count, end - off);
>         count += compressed;
>         off += compressed;
>         // ----- End of Experimental Section -----
> 
>         for (int i = off, j = count; i < end; i++) {
>             char c = s.charAt(i);
>             if (StringLatin1.canEncode(c)) {
>                 val[j++] = (byte)c;
>             } else {
>                 count = j;
>                 inflate();
>                 // Store c to make sure sb has a UTF16 char
>                 StringUTF16.putCharSB(this.value, j++, c);
>                 count = j;
>                 i++;
>                 StringUTF16.putCharsSB(this.value, j, s, i, end);
>                 count += end - i;
>                 return;
>             }
>         }
>     } else {
>         StringUTF16.putCharsSB(this.value, count, s, off, end);
>     }
>     count += end - off;
> }
> ```
> 
> While I do _assume_ that it should be faster to let machine code perform the copy and compression than to let Java code take a char-by-char approach, to be sure there should be another benchmark to actually prove this claim.
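
For reference, a benchmark along those lines could look roughly like this (a minimal JMH sketch; the class name, parameters, and workload are illustrative, not the benchmark added in this PR):

```java
import java.util.Arrays;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class AppendCharsBench {
    @Param({"16", "128", "1024"})
    public int length;

    public char[] latin1;

    @Setup
    public void setup() {
        latin1 = new char[length];
        Arrays.fill(latin1, 'a');   // LATIN-1 only, so the compressed path is exercised
    }

    @Benchmark
    public String appendCharArray() {
        // Mirrors the BufferedReader.readLine pattern: append a char[] chunk
        // into a fresh StringBuilder and materialize the String.
        StringBuilder sb = new StringBuilder();
        sb.append(latin1);
        return sb.toString();
    }
}
```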


>         char[] ca = new char[end - off];

Your code here has a memory allocation, which may cause a slowdown.
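
To make that concrete: the experimental section adds a `new char[end - off]` plus a `getChars` copy on every append, while the existing char-by-char path allocates nothing. An allocation-free variant would essentially be the loop that is already in place, compressing straight from the CharSequence (hypothetical helper, sketch only); the question is whether the vectorized char[] compress wins back more than the allocation and extra copy cost.

```java
// Hypothetical helper (name made up for this sketch): compress directly from
// the CharSequence via charAt, so no temporary char[] is allocated. This is
// essentially what the existing char-by-char loop already does, and it cannot
// use the vectorized char[] intrinsic, which is the trade-off being discussed.
static int compressFrom(CharSequence s, int off, int end, byte[] dst, int dstOff) {
    int n = 0;
    for (; off + n < end; n++) {
        char c = s.charAt(off + n);
        if (c > 0xFF) {
            break;                  // stop at the first non-LATIN-1 char
        }
        dst[dstOff + n] = (byte) c;
    }
    return n;                       // number of chars written
}
```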

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24773#issuecomment-2846483320


More information about the core-libs-dev mailing list