RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store
Roland Westrelin
roland at openjdk.org
Mon Jan 29 14:41:29 UTC 2024
On Fri, 19 Jan 2024 14:09:28 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This is a feature requiested by @RogerRiggs and @cl4es .
>>
>> **Idea**
>>
>> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code.
>>
>> This patch here supports a few simple use-cases, like these:
>>
>> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>>
>> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456
>>
>> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian):
>>
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472
>>
>> **Details**
>>
>> This draft currently implements the optimization in an additional special IGVN phase:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485
>>
>> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores:
>>
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806
>>
>> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot...
>
> Why not provide new internal API points and intrinsics? The benefits would be:
> - less complexity on the c2 side (and less bugs)
> - much easier for someone writing java code to check that the optimization triggers (check the PrintInlining output that the intrinsic shows up vs check the final assembly code)
> - clear contract between the java libraries and the VM as to what optimizes under what conditions
>
> If I was the user for this I would be worried, that:
> - it's hard for me to check it's doing what I expect
> - even if it does initially, changes to the java code (maybe by other people less familiar with this transformation) could break the optimization. If there's a call to some specific API, at least people changing the code know special attention is necessary and that as long as the new API points are used, the optimization is guaranteed.
> @rwestrel
>
> > Why not simply enqueue stores for post loop opts processing by igvn?
>
> That does not work. I need to do the processing post-post-loop-opts. And not during post-loop-opts. Because during post-loop-opts, some CastII nodes fold away, and that simplifies the pointers, and makes it easier to find adjacent memops.
Range check castIIs? With JDK-8324517, I will possibly propose they stay in until the end of compilation (it used to cause performance to regress so I have to take a close look).
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1914827679
More information about the hotspot-compiler-dev
mailing list