RFR: 8333893: Optimization for StringBuilder append boolean & null

Mon Jun 10 14:35:13 UTC 2024

On Mon, 10 Jun 2024 13:04:08 GMT, Shaojin Wen <duke at openjdk.org> wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores.
>> 
>> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2.
>
> # 1. Compare with unsafe branch
> 
> 1. current (`5e815`) https://github.com/wenshao/jdk/tree/optim_str_builder_append_202406
> 2. unsafe (`adc220`) https://github.com/wenshao/jdk/tree/optim_str_builder_append_202406_unsafe
> 
> I think the performance of the Unsafe branch may be the best data for the C2 optimizer. @eme64 can help me see if C2 can do it?
> 
> # 2. Benchmark Commands
> 
> make test TEST="micro:java.lang.StringBuilders.toStringCharWithBool8"
> make test TEST="micro:java.lang.StringBuilders.toStringCharWithNull8"
> 
> 
> # 3. Implementation of Unsafe Branch
> 
> class AbstractStringBuilder {
>     static final Unsafe UNSAFE = Unsafe.getUnsafe();
> 
>     static final int NULL_LATIN1;
>     static final int TRUE_LATIN1;
>     static final int FALS_LATIN1;
> 
>     static final long NULL_UTF16;
>     static final long TRUE_UTF16;
>     static final long FALS_UTF16;
> 
>     static {
>         byte[] bytes4 = new byte[4];
>         byte[] bytes8 = new byte[8];
> 
>         bytes4[0] = 'n';
>         bytes4[1] = 'u';
>         bytes4[2] = 'l';
>         bytes4[3] = 'l';
>         NULL_LATIN1 = UNSAFE.getInt(bytes4, Unsafe.ARRAY_BYTE_BASE_OFFSET);
>         StringUTF16.inflate(bytes4, 0, bytes8, 0, 4);
>         NULL_UTF16 = UNSAFE.getLong(bytes8, Unsafe.ARRAY_BYTE_BASE_OFFSET);
> 
>         bytes4[0] = 't';
>         bytes4[1] = 'r';
>         bytes4[2] = 'u';
>         bytes4[3] = 'e';
>         TRUE_LATIN1 = UNSAFE.getInt(bytes4, Unsafe.ARRAY_BYTE_BASE_OFFSET);
>         StringUTF16.inflate(bytes4, 0, bytes8, 0, 4);
>         TRUE_UTF16 = UNSAFE.getLong(bytes8, Unsafe.ARRAY_BYTE_BASE_OFFSET);
> 
>         bytes4[0] = 'f';
>         bytes4[1] = 'a';
>         bytes4[2] = 'l';
>         bytes4[3] = 's';
>         FALS_LATIN1 = UNSAFE.getInt(bytes4, Unsafe.ARRAY_BYTE_BASE_OFFSET);
>         StringUTF16.inflate(bytes4, 0, bytes8, 0, 4);
>         FALS_UTF16 = UNSAFE.getLong(bytes8, Unsafe.ARRAY_BYTE_BASE_OFFSET);
>     }
> 
>     private AbstractStringBuilder appendNull() {
>         ensureCapacityInternal(count + 4);
>         int count = this.count;
>         byte[] val = this.value;
>         if (isLatin1()) {
>             UNSAFE.putInt(
>                     val,
>                     Unsafe.ARRAY_BYTE_BASE_OFFSET + count,
>                     NULL_LATIN1);
>         } else {
>             UNSAFE.putLong(
>                     val,
>                     Unsafe.ARRAY_BYTE_BASE_OFFSET + (count << 1),
>                     NULL_UTF16);
>         }
>         this.count = count + 4;
>         return this;
>     }
> 
>     public AbstractStringBuilder append(boolean b) {
>         int count = th...

@wenshao 
> I think the performance of the Unsafe branch may be the best data for the C2 optimizer. @eme64 can help me see if C2 can do it?

Have you tried to see if the optimization actually was done/taken? You can use the `TraceMergeStores,` flag. Can you present the generated assembly code of the benchmarks, and explain the difference based on the generated assembly code? You can run JMH penchmarks with `perf`. These two blogs may help you:

http://psy-lob-saw.blogspot.com/2015/07/jmh-perfasm.html
https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_meet_jmh_prof_perfasm

@liach I don't think it makes a difference if it is `int` or `byte` constants. Or what exactly is the code change you are proposing?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2158533469