RFR: 8333893: Optimization for StringBuilder append boolean & null
Emanuel Peter
epeter at openjdk.org
Tue Jun 11 06:30:13 UTC 2024
On Mon, 10 Jun 2024 23:10:05 GMT, Shaojin Wen <duke at openjdk.org> wrote:
>> @wenshao
>>> I think the performance of the Unsafe branch may be the best data for the C2 optimizer. @eme64 can help me see if C2 can do it?
>>
>> Have you tried to see if the optimization actually was done/taken? You can use the `TraceMergeStores,` flag. Can you present the generated assembly code of the benchmarks, and explain the difference based on the generated assembly code? You can run JMH penchmarks with `perf`. These two blogs may help you:
>>
>> http://psy-lob-saw.blogspot.com/2015/07/jmh-perfasm.html
>> https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_meet_jmh_prof_perfasm
>>
>> @liach I don't think it makes a difference if it is `int` or `byte` constants. Or what exactly is the code change you are proposing?
>
> @eme64 It seems that when the following code uses StringUTF16.putChar, C2's optimization is not as good as the manual merging and storage effect.
>
> class AbstractStringBuilder {
> private AbstractStringBuilder appendNull() {
> // ...
> StringUTF16.putCharsAt(val, count, 'n', 'u', 'l', 'l');
> // ...
> }
>
> public AbstractStringBuilder append(boolean b) {
> // ...
> StringUTF16.putCharsAt(val, count, 't', 'r', 'u', 'e');
> // ...
> StringUTF16.putCharsAt(val, count, 'f', 'a', 'l', 's', 'e');
> // ...
> }
> }
>
> class StringUTF16 {
> public static void putCharsAt(byte[] value, int i, char c1, char c2, char c3, char c4) {
> putChar(value, i , c1);
> putChar(value, i + 1, c2);
> putChar(value, i + 2, c3);
> putChar(value, i + 3, c4);
> }
>
> @IntrinsicCandidate
> // intrinsic performs no bounds checks
> static void putChar(byte[] val, int index, int c) {
> assert index >= 0 && index < length(val) : "Trusted caller missed bounds check";
> index <<= 1;
> val[index++] = (byte)(c >> HI_BYTE_SHIFT);
> val[index] = (byte)(c >> LO_BYTE_SHIFT);
> }
> }
>
>
> The code for manually merging storage is as follows, without using Unsafe:
>
> class AbstractStringBuilder {
> static final long NULL_UTF16;
> static final long TRUE_UTF16;
> static final long FALS_UTF16;
>
> static {
> byte[] bytes = new byte[8];
>
> StringUTF16.putCharsAt(bytes, 0, 'n', 'u', 'l', 'l');
> NULL_UTF16 = getLong(bytes, 0);
>
> StringUTF16.putCharsAt(bytes, 0, 't', 'r', 'u', 'e');
> TRUE_UTF16 = getLong(bytes, 0);
>
> StringUTF16.putCharsAt(bytes, 0, 'f', 'a', 'l', 's');
> FALS_UTF16 = getLong(bytes, 0);
> }
>
> private static long getLong(byte[] bytes, int offset) {
> return (((long)bytes[offset ] & 0xff) ) |
> (((long)bytes[offset + 1] & 0xff) << 8) |
> (((long)bytes[offset + 2] & 0xff) << 16) |
> (((long)bytes[offset + 3] & 0xff) << 24) |
> (((long)bytes[offset + 4] & 0xff) << 32) |
> (((long)bytes[offset + 5] & 0xff) << 40) |
> (((long)bytes[offset + 6] & 0xff) << 48) |
> (((long)bytes[offset + 7] & 0xff) << 56);
> }
>
> private static void setLong(byte[] array, int offset, long value) {
> array[offset] = (byte) value;
> array[offset + 1] = (byte) (value >> 8);
> array[offset + 2] = (byte) (value >> 16);
> array[offset + 3] = (byte) (value >> 24);
> arra...
@wenshao
> @eme64 It seems that when the following code uses StringUTF16.putChar, C2's optimization is not as good as the manual merging and storage effect.
As I asked above, you will need to provide some evidence / generated assembly / perf data, and logs from `TraceMergeStores`. I currently do not have time to produce these myself, and I think they would be crucial to determine where the missing performance has gone. See my earlier comment:
https://github.com/openjdk/jdk/pull/19626#issuecomment-2158533469
And please also try @cl4es advide here:
https://github.com/openjdk/jdk/pull/19626#issuecomment-2159509806
And sure, maybe you need some public API for setting multiple bytes at once, which the `MergeStores` optimization can optimize. I'm a C2 engineer, so I leave that up to the library folks ;)
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2159895757
More information about the core-libs-dev
mailing list