RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]
Claes Redestad
redestad at openjdk.org
Thu Jun 20 10:12:22 UTC 2024
On Thu, 13 Jun 2024 07:36:34 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Thanks to @eme64 's patience and help, I found a way to use MergeStore without doing boundary checking.
>>
>> It would be even better if Unsafe.putChar could be used for MergeStore in the future.
>>
>> ## 1. JavaCode
>>
>> class StringUTF16 {
>> public static void putCharsAt(byte[] value, int i, char c1, char c2, char c3, char c4) {
>> // Don't use the putChar method, Its instrinsic will cause C2 unable to combining values into larger stores.
>> putChar1(value, i , c1);
>> putChar1(value, i + 1, c2);
>> putChar1(value, i + 2, c3);
>> putChar1(value, i + 3, c4);
>> }
>>
>> public static void putCharsAt(byte[] value, int i, char c1, char c2, char c3, char c4, char c5) {
>> // Don't use the putChar method, Its instrinsic will cause C2 unable to combining values into larger stores.
>> putChar1(value, i , c1);
>> putChar1(value, i + 1, c2);
>> putChar1(value, i + 2, c3);
>> putChar1(value, i + 3, c4);
>> putChar1(value, i + 4, c5);
>> }
>>
>> static void putChar1(byte[] value, int i, char c) {
>> int address = Unsafe.ARRAY_BYTE_BASE_OFFSET + (i << 1);
>> Unsafe.getUnsafe().putByte(value, address , (byte)(c >> HI_BYTE_SHIFT));
>> Unsafe.getUnsafe().putByte(value, address + 1, (byte)(c >> LO_BYTE_SHIFT));
>> }
>> }
>>
>>
>> ## 2. Benchmark Numbers
>> The performance numbers under MacBookPro M1 Max are as follows:
>>
>> -Benchmark Mode Cnt Score Error Units
>> -StringBuilders.toStringCharWithNull8Latin1 avgt 15 7.073 ? 0.010 ns/op
>> -StringBuilders.toStringCharWithNull8Utf16 avgt 15 9.298 ? 0.015 ns/op
>> -StringBuilders.toStringCharWithBool8Latin1 avgt 15 7.387 ? 0.021 ns/op
>> -StringBuilders.toStringCharWithBool8Utf16 avgt 15 9.615 ? 0.011 ns/op
>>
>> +Benchmark Mode Cnt Score Error Units
>> +StringBuilders.toStringCharWithNull8Latin1 avgt 15 5.628 ? 0.017 ns/op +25.67%
>> +StringBuilders.toStringCharWithNull8Utf16 avgt 15 6.873 ? 0.026 ns/op +35.28%
>> +StringBuilders.toStringCharWithBool8Latin1 avgt 15 6.480 ? 0.077 ns/op +13.99%
>> +StringBuilders.toStringCharWithBool8Utf16 avgt 15 7.456 ? 0.059 ns/op +28.95%
>>
>>
>> ## 3. TraceMergeStores
>>
>> [TraceMergeStores]: Replace
>> 65 StoreB === 54 7 64 12 [[ 87 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=4; unsafe Memory: @byte[int:>=0] (java/la...
>
> @wenshao I'm glad we figured it out a bit, and I hope you learned a thing or two :)
>
>> It would be even better if Unsafe.putChar could be used for MergeStore in the future.
>
> The `_putCharStringU` intrinsic uses `LibraryCallKit::inline_string_char_access`. It seems to generate IR nodes, I speculate it could be `StoreC`. We'd have to do more digging to see why that pattern is not accepted by `MergeStore`.
>
> @wenshao I'm not sure if everything is right with the bounds checking, but I leave this to the library folks to review. What you will definately need is benchmarking not just on `M1`, but also `x86-64` and other `aarch64` architectures.
I'm not opposed to accepting this patch as-is, but I think we should do so with an eye towards reverting if we figure out a way to improve the `putChar` intrinsic so that it doesn't block merge store optimization. What do you think @eme64?
As the need for the changes in [b5ad8e7](https://github.com/openjdk/jdk/pull/19626/commits/b5ad8e70928c547d134d0e4a532441cad9a7e4a2) showed added code complexity (like here) can be detrimental to performance, and if the `putChar` can be improved we might see benefits in more places.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2180308686
More information about the hotspot-compiler-dev
mailing list