RFR: 8333893: Optimization for StringBuilder append boolean & null
Emanuel Peter
epeter at openjdk.org
Mon Jun 10 14:35:13 UTC 2024
On Mon, 10 Jun 2024 13:04:08 GMT, Shaojin Wen <duke at openjdk.org> wrote:
>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores.
>>
>> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2.
>
> # 1. Compare with unsafe branch
>
> 1. current (`5e815`) https://github.com/wenshao/jdk/tree/optim_str_builder_append_202406
> 2. unsafe (`adc220`) https://github.com/wenshao/jdk/tree/optim_str_builder_append_202406_unsafe
>
> I think the performance of the Unsafe branch may be the best data for the C2 optimizer. @eme64 can help me see if C2 can do it?
>
> # 2. Benchmark Commands
>
> make test TEST="micro:java.lang.StringBuilders.toStringCharWithBool8"
> make test TEST="micro:java.lang.StringBuilders.toStringCharWithNull8"
>
>
> # 3. Implementation of Unsafe Branch
>
> class AbstractStringBuilder {
> static final Unsafe UNSAFE = Unsafe.getUnsafe();
>
> static final int NULL_LATIN1;
> static final int TRUE_LATIN1;
> static final int FALS_LATIN1;
>
> static final long NULL_UTF16;
> static final long TRUE_UTF16;
> static final long FALS_UTF16;
>
> static {
> byte[] bytes4 = new byte[4];
> byte[] bytes8 = new byte[8];
>
> bytes4[0] = 'n';
> bytes4[1] = 'u';
> bytes4[2] = 'l';
> bytes4[3] = 'l';
> NULL_LATIN1 = UNSAFE.getInt(bytes4, Unsafe.ARRAY_BYTE_BASE_OFFSET);
> StringUTF16.inflate(bytes4, 0, bytes8, 0, 4);
> NULL_UTF16 = UNSAFE.getLong(bytes8, Unsafe.ARRAY_BYTE_BASE_OFFSET);
>
> bytes4[0] = 't';
> bytes4[1] = 'r';
> bytes4[2] = 'u';
> bytes4[3] = 'e';
> TRUE_LATIN1 = UNSAFE.getInt(bytes4, Unsafe.ARRAY_BYTE_BASE_OFFSET);
> StringUTF16.inflate(bytes4, 0, bytes8, 0, 4);
> TRUE_UTF16 = UNSAFE.getLong(bytes8, Unsafe.ARRAY_BYTE_BASE_OFFSET);
>
> bytes4[0] = 'f';
> bytes4[1] = 'a';
> bytes4[2] = 'l';
> bytes4[3] = 's';
> FALS_LATIN1 = UNSAFE.getInt(bytes4, Unsafe.ARRAY_BYTE_BASE_OFFSET);
> StringUTF16.inflate(bytes4, 0, bytes8, 0, 4);
> FALS_UTF16 = UNSAFE.getLong(bytes8, Unsafe.ARRAY_BYTE_BASE_OFFSET);
> }
>
> private AbstractStringBuilder appendNull() {
> ensureCapacityInternal(count + 4);
> int count = this.count;
> byte[] val = this.value;
> if (isLatin1()) {
> UNSAFE.putInt(
> val,
> Unsafe.ARRAY_BYTE_BASE_OFFSET + count,
> NULL_LATIN1);
> } else {
> UNSAFE.putLong(
> val,
> Unsafe.ARRAY_BYTE_BASE_OFFSET + (count << 1),
> NULL_UTF16);
> }
> this.count = count + 4;
> return this;
> }
>
> public AbstractStringBuilder append(boolean b) {
> int count = th...
@wenshao
> I think the performance of the Unsafe branch may be the best data for the C2 optimizer. @eme64 can help me see if C2 can do it?
Have you tried to see if the optimization actually was done/taken? You can use the `TraceMergeStores,` flag. Can you present the generated assembly code of the benchmarks, and explain the difference based on the generated assembly code? You can run JMH penchmarks with `perf`. These two blogs may help you:
http://psy-lob-saw.blogspot.com/2015/07/jmh-perfasm.html
https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_meet_jmh_prof_perfasm
@liach I don't think it makes a difference if it is `int` or `byte` constants. Or what exactly is the code change you are proposing?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2158533469
More information about the core-libs-dev
mailing list