RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]

Emanuel Peter epeter at openjdk.org
Thu Jun 13 07:39:13 UTC 2024


On Wed, 12 Jun 2024 15:58:59 GMT, Shaojin Wen <duke at openjdk.org> wrote:

>>> @eme64 It seems like MergeStore didn't happen, is there something I did wrong?
>> 
>> Yes ;)
>> 
>> @wenshao The issue is that the pattern matching is quite **limited**.
>> `putCharsAt(byte[] value, int i, char c1, char c2, char c3, char c4)`
>> This has 4 variables. This is not allowed for the optimization. You would have to pass in a `long` and split it via shifts.
>> A possible exception: if `putCharsAt` is inlined and outside it is a splitting of a `long` value with shift, then ... maybe ... this would be optimized with `MergeStores`.
>> 
>> There are only 2 **accepted** patterns:
>> - Multiple stores of constants -> constant is merged, which allows stores to be merged.
>> - Multiple stores which all store a "section" of a larger value:
>> 
>> You can see **examples** I listed at the beginning of the PR description in https://github.com/openjdk/jdk/pull/16245.
>
> Thanks to @eme64 's patience and help, I found a way to use MergeStore without doing boundary checking.
> 
> It would be even better if Unsafe.putChar could be used for MergeStore in the future.
> 
> ## 1. JavaCode
> 
> class StringUTF16 {
>     public static void putCharsAt(byte[] value, int i, char c1, char c2, char c3, char c4) {
>         // Don't use the putChar method, Its instrinsic will cause C2 unable to combining values into larger stores.
>         putChar1(value, i    , c1);
>         putChar1(value, i + 1, c2);
>         putChar1(value, i + 2, c3);
>         putChar1(value, i + 3, c4);
>     }
> 
>     public static void putCharsAt(byte[] value, int i, char c1, char c2, char c3, char c4, char c5) {
>         // Don't use the putChar method, Its instrinsic will cause C2 unable to combining values into larger stores.
>         putChar1(value, i    , c1);
>         putChar1(value, i + 1, c2);
>         putChar1(value, i + 2, c3);
>         putChar1(value, i + 3, c4);
>         putChar1(value, i + 4, c5);
>     }
> 
>     static void putChar1(byte[] value, int i, char c) {
>         int address = Unsafe.ARRAY_BYTE_BASE_OFFSET + (i << 1);
>         Unsafe.getUnsafe().putByte(value, address    , (byte)(c >> HI_BYTE_SHIFT));
>         Unsafe.getUnsafe().putByte(value, address + 1, (byte)(c >> LO_BYTE_SHIFT));
>     }
> }
> 
> 
> ## 2. Benchmark Numbers
> The performance numbers under MacBookPro M1 Max are as follows:
> 
> -Benchmark                                   Mode  Cnt  Score   Error  Units
> -StringBuilders.toStringCharWithNull8Latin1  avgt   15  7.073 ? 0.010  ns/op
> -StringBuilders.toStringCharWithNull8Utf16   avgt   15  9.298 ? 0.015  ns/op
> -StringBuilders.toStringCharWithBool8Latin1  avgt   15  7.387 ? 0.021  ns/op
> -StringBuilders.toStringCharWithBool8Utf16   avgt   15  9.615 ? 0.011  ns/op
> 
> +Benchmark                                   Mode  Cnt  Score   Error  Units
> +StringBuilders.toStringCharWithNull8Latin1  avgt   15  5.628 ? 0.017  ns/op +25.67%
> +StringBuilders.toStringCharWithNull8Utf16   avgt   15  6.873 ? 0.026  ns/op +35.28%
> +StringBuilders.toStringCharWithBool8Latin1  avgt   15  6.480 ? 0.077  ns/op +13.99%
> +StringBuilders.toStringCharWithBool8Utf16   avgt   15  7.456 ? 0.059  ns/op +28.95%
> 
> 
> ## 3.  TraceMergeStores
> 
> [TraceMergeStores]: Replace
>  65  StoreB  === 54 7 64 12  [[ 87 ]]  @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=4; unsafe  Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=4; !jvms: StringUTF16::putChar1 @ bci:20 (li...

@wenshao I'm glad we figured it out a bit, and I hope you learned a thing or two :)

> It would be even better if Unsafe.putChar could be used for MergeStore in the future.

The `_putCharStringU` intrinsic uses `LibraryCallKit::inline_string_char_access`. It seems to generate IR nodes, I speculate it could be `StoreC`. We'd have to do more digging to see why that pattern is not accepted by `MergeStore`.

@wenshao I'm not sure if everything is right with the bounds checking, but I leave this to the library folks to review. What you will definately need is benchmarking not just on `M1`, but also `x86-64` and other `aarch64` architectures.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2164851377


More information about the core-libs-dev mailing list