RFR: 8352316: More MergeStoreBench [v7]

Thu Apr 3 12:19:52 UTC 2025

On Sat, 29 Mar 2025 07:43:32 GMT, Shaojin Wen <swen at openjdk.org> wrote:

>> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   add StringBuilderUnsafePut
>
> I added a new scenario `StringBuilderUnsafePut`, using Unsafe to modify StringBuilder directly to implement append constants.
> 
> The performance numbers below show that ArraySetConst/StringBuilderUnsafePut/UnsafePut have better performance.
> 
> These numbers show that Stable Value's arraycopy has great performance optimization potential, which is worth more optimization for C2.
> 
> # 1. Scipt
> 
> git remote add wenshao git at github.com:wenshao/jdk.git
> git fetch wenshao
> git checkout cd1d8fb3b137a741446c894d1893e7180535ce8f
> make test TEST="micro:vm.compiler.MergeStoreBench.str"
> 
> 
> # 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)
> 
> Benchmark                                         Mode  Cnt      Score       Error  Units
> MergeStoreBench.str4ArraySetConst                 avgt    5   1338.414 ±     3.209  ns/op
> MergeStoreBench.str4Arraycopy                     avgt    5   7271.203 ±    19.400  ns/op
> MergeStoreBench.str4GetBytes                      avgt    5   6154.684 ±     9.910  ns/op
> MergeStoreBench.str4GetChars                      avgt    5  14078.790 ±    59.175  ns/op
> MergeStoreBench.str4StringBuilder                 avgt    5  15766.528 ±  4634.119  ns/op
> MergeStoreBench.str4StringBuilderAppendChar       avgt    5  41388.364 ±  9871.409  ns/op
> MergeStoreBench.str4StringBuilderUnsafePut        avgt    5   1575.792 ±     4.102  ns/op
> MergeStoreBench.str4UnsafePut                     avgt    5   1326.499 ±     2.400  ns/op
> MergeStoreBench.str4Utf16ArrayCopy                avgt    5  13949.307 ±  1045.255  ns/op
> MergeStoreBench.str4Utf16ArraySetConst            avgt    5   1511.967 ±     5.250  ns/op
> MergeStoreBench.str4Utf16StringBuilder            avgt    5  18030.261 ±  1656.463  ns/op
> MergeStoreBench.str4Utf16StringBuilderAppendChar  avgt    5  35047.855 ± 16674.635  ns/op
> MergeStoreBench.str4Utf16StringBuilderUnsafePut   avgt    5   2785.792 ±     5.571  ns/op
> MergeStoreBench.str4Utf16UnsafePut                avgt    5   1613.812 ±     1.249  ns/op
> MergeStoreBench.str5ArraySetConst                 avgt    5   2599.310 ±     8.667  ns/op
> MergeStoreBench.str5Arraycopy                     avgt    5   9487.926 ±    29.234  ns/op
> MergeStoreBench.str5GetBytes                      avgt    5   5972.453 ±    16.035  ns/op
> MergeStoreBench.str5GetChars                      avgt    5  13516.943 ±    10.978  ns/op
> MergeStoreBench.str5StringBuilder                 avgt    5  16539.070 ±  3097.339  ns/op
> MergeStoreBench.str5StringBuilderAppendChar       avgt    5  50506.770 ± 11536.41...

> @wenshao @iwanowww I have a few concerns about this PR.
> 
> Your current PR description says this:
> 
> > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works
> 
> First: a benchmark is not the best way `to verify whether MergeStore works`. An IR test would be more helpful, as it could check reliably what IR is generated, and hence if MergeStores actually optimized anything.
> 
> Second: A JMH benchmark could also be helpful, but only if you run it with and without MergeStores enabled. Otherwise how would you know if it was MergeStores or another optimization that is relevant here?
> 
> Third: `getBytes` / `arraycopy` is **NOT** a MergeStores pattern. These are **COPY** patterns. So they probably should go to a separate benchmark file. I don't want the MergeStores benchmark polluted with unrelated cases. I could be wrong here, and just not see how these cases are MergeStore cases, but you need to show the details here.
> 
> I put some time in understanding your PR and asking you a list of questions. You did not really respond to them, and that is frustrating to me and makes me feel like my time is not valued: [#24108 (comment)](https://github.com/openjdk/jdk/pull/24108#issuecomment-2762946069)
> 
> You say this:
> 
> > By default, in OpenJDK, COMPACT_STRINGS = true, and the String coder without UTF16 characters is LATIN1, which is implemented using System.arraycopy. However, since String is immutable and System.arraycopy is directly performed on byte[], C2 should have more opportunities for optimization.
> 
> Maybe the `System.arraycopy` can be optimized. But I don't think it is the MergeStores optimization that would do that. This is really a **Copy** pattern and not a `MergeStores` pattern. Please read the PRs on MergeStores to see what patterns are covered.
> 
> And like I asked in previously:
> 
> > Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the Unsafe benchmark?
> > I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions.
> 
> To me, benchmarks are only helpful and worth integrating if there is some clear and documented purpose. It would be really nice if you could invest some time into that :)

The C2 MergeStore you made is very good. I think you did a great job, so I submitted this PR, hoping that C2 can do more. But I am a Java programmer, not good at C++ and assembly. I don’t know how to investigate the details. Can you give me some suggestions?

I don’t know the details of the optimizer yet, and I can’t provide IR tests. This benchmark and the performance numbers of the results prove that there is a lot of room for performance improvement in the copy of constant String and byte[].

As you said, this does not look like MergeStore, but should be a constant copy optimization. I can separate this into a separate Benchmark. Can you give me some suggestions on the name of the Benchmark?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2775585230