RFR: 8352316: More MergeStoreBench [v7]
Shaojin Wen
swen at openjdk.org
Thu Apr 3 12:19:52 UTC 2025
On Sat, 29 Mar 2025 07:43:32 GMT, Shaojin Wen <swen at openjdk.org> wrote:
>> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision:
>>
>> add StringBuilderUnsafePut
>
> I added a new scenario `StringBuilderUnsafePut`, using Unsafe to modify StringBuilder directly to implement append constants.
>
> The performance numbers below show that ArraySetConst/StringBuilderUnsafePut/UnsafePut have better performance.
>
> These numbers show that Stable Value's arraycopy has great performance optimization potential, which is worth more optimization for C2.
>
> # 1. Scipt
>
> git remote add wenshao git at github.com:wenshao/jdk.git
> git fetch wenshao
> git checkout cd1d8fb3b137a741446c894d1893e7180535ce8f
> make test TEST="micro:vm.compiler.MergeStoreBench.str"
>
>
> # 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)
>
> Benchmark Mode Cnt Score Error Units
> MergeStoreBench.str4ArraySetConst avgt 5 1338.414 ± 3.209 ns/op
> MergeStoreBench.str4Arraycopy avgt 5 7271.203 ± 19.400 ns/op
> MergeStoreBench.str4GetBytes avgt 5 6154.684 ± 9.910 ns/op
> MergeStoreBench.str4GetChars avgt 5 14078.790 ± 59.175 ns/op
> MergeStoreBench.str4StringBuilder avgt 5 15766.528 ± 4634.119 ns/op
> MergeStoreBench.str4StringBuilderAppendChar avgt 5 41388.364 ± 9871.409 ns/op
> MergeStoreBench.str4StringBuilderUnsafePut avgt 5 1575.792 ± 4.102 ns/op
> MergeStoreBench.str4UnsafePut avgt 5 1326.499 ± 2.400 ns/op
> MergeStoreBench.str4Utf16ArrayCopy avgt 5 13949.307 ± 1045.255 ns/op
> MergeStoreBench.str4Utf16ArraySetConst avgt 5 1511.967 ± 5.250 ns/op
> MergeStoreBench.str4Utf16StringBuilder avgt 5 18030.261 ± 1656.463 ns/op
> MergeStoreBench.str4Utf16StringBuilderAppendChar avgt 5 35047.855 ± 16674.635 ns/op
> MergeStoreBench.str4Utf16StringBuilderUnsafePut avgt 5 2785.792 ± 5.571 ns/op
> MergeStoreBench.str4Utf16UnsafePut avgt 5 1613.812 ± 1.249 ns/op
> MergeStoreBench.str5ArraySetConst avgt 5 2599.310 ± 8.667 ns/op
> MergeStoreBench.str5Arraycopy avgt 5 9487.926 ± 29.234 ns/op
> MergeStoreBench.str5GetBytes avgt 5 5972.453 ± 16.035 ns/op
> MergeStoreBench.str5GetChars avgt 5 13516.943 ± 10.978 ns/op
> MergeStoreBench.str5StringBuilder avgt 5 16539.070 ± 3097.339 ns/op
> MergeStoreBench.str5StringBuilderAppendChar avgt 5 50506.770 ± 11536.41...
> @wenshao @iwanowww I have a few concerns about this PR.
>
> Your current PR description says this:
>
> > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works
>
> First: a benchmark is not the best way `to verify whether MergeStore works`. An IR test would be more helpful, as it could check reliably what IR is generated, and hence if MergeStores actually optimized anything.
>
> Second: A JMH benchmark could also be helpful, but only if you run it with and without MergeStores enabled. Otherwise how would you know if it was MergeStores or another optimization that is relevant here?
>
> Third: `getBytes` / `arraycopy` is **NOT** a MergeStores pattern. These are **COPY** patterns. So they probably should go to a separate benchmark file. I don't want the MergeStores benchmark polluted with unrelated cases. I could be wrong here, and just not see how these cases are MergeStore cases, but you need to show the details here.
>
> I put some time in understanding your PR and asking you a list of questions. You did not really respond to them, and that is frustrating to me and makes me feel like my time is not valued: [#24108 (comment)](https://github.com/openjdk/jdk/pull/24108#issuecomment-2762946069)
>
> You say this:
>
> > By default, in OpenJDK, COMPACT_STRINGS = true, and the String coder without UTF16 characters is LATIN1, which is implemented using System.arraycopy. However, since String is immutable and System.arraycopy is directly performed on byte[], C2 should have more opportunities for optimization.
>
> Maybe the `System.arraycopy` can be optimized. But I don't think it is the MergeStores optimization that would do that. This is really a **Copy** pattern and not a `MergeStores` pattern. Please read the PRs on MergeStores to see what patterns are covered.
>
> And like I asked in previously:
>
> > Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the Unsafe benchmark?
> > I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions.
>
> To me, benchmarks are only helpful and worth integrating if there is some clear and documented purpose. It would be really nice if you could invest some time into that :)
The C2 MergeStore you made is very good. I think you did a great job, so I submitted this PR, hoping that C2 can do more. But I am a Java programmer, not good at C++ and assembly. I don’t know how to investigate the details. Can you give me some suggestions?
I don’t know the details of the optimizer yet, and I can’t provide IR tests. This benchmark and the performance numbers of the results prove that there is a lot of room for performance improvement in the copy of constant String and byte[].
As you said, this does not look like MergeStore, but should be a constant copy optimization. I can separate this into a separate Benchmark. Can you give me some suggestions on the name of the Benchmark?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2775585230
More information about the hotspot-compiler-dev
mailing list