RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores
Emanuel Peter
epeter at openjdk.org
Tue Jan 16 15:13:33 UTC 2024
On Wed, 25 Oct 2023 03:11:12 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> This is a feature requested by @RogerRiggs and @cl4es.
>>
>> **Idea**
>>
>> Merging multiple consecutive small stores (e.g. eight byte stores) into larger stores (e.g. one long store) can lead to speedup.
>> Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using
>> Unsafe (e.g. `Unsafe.putLongUnaligned`), or
>> ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`).
>> They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code.
>>
>> This patch here supports a few simple use-cases, like these:
>>
>> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>>
>> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the
>> splitting (i.e. shifting and truncation), and directly store the variable:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456
>>
>> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian):
>>
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472
>>
>> **Details**
>>
>> This draft currently implements the optimization in an additional special IGVN phase:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485
>>
>> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`).
>> During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`.
>> We essentially try to establish a chain of mergeable stores:
>>
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806
>>
>> Mergeable stores must have the same Opcode (which implies they have the same element type and hence size).
>> Further, mergeable stores must have the same control (or be separated by only a RangeCheck).
>> Further,...
>
> I imagine it would be beneficial if we could merge stores to fields and stores from loads, which are common in object constructions.
>
> Thanks.
@merykitty do you have examples for both? Maybe stores to fields already work. Merging loads and stores may be out of scope: that sounds a little too much like SLP. We can still try to do that in a future RFE. We could even try to use (masked) vector instructions.
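
For reference, here is a minimal sketch of the two store patterns described in the quoted text (constants, and a variable split by shifts). The class and method names are hypothetical and are not taken from `TestMergeStores.java`. The sketch only shows that the byte-wise stores are semantically equivalent to one little-endian long store; whether C2 actually merges them depends on the patch and the platform.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class MergeStoresSketch {
    // Eight consecutive byte stores of constants. Together they write the
    // same bytes as one little-endian store of 0x0123456789abcdefL.
    static void storeConstants(byte[] a, int offset) {
        a[offset + 0] = (byte) 0xef;
        a[offset + 1] = (byte) 0xcd;
        a[offset + 2] = (byte) 0xab;
        a[offset + 3] = (byte) 0x89;
        a[offset + 4] = (byte) 0x67;
        a[offset + 5] = (byte) 0x45;
        a[offset + 6] = (byte) 0x23;
        a[offset + 7] = (byte) 0x01;
    }

    // Eight consecutive byte stores of a variable that was split by
    // shifting and truncation. Undoing the split gives one long store.
    static void storeLongLE(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >> 0);
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
        a[offset + 4] = (byte) (v >> 32);
        a[offset + 5] = (byte) (v >> 40);
        a[offset + 6] = (byte) (v >> 48);
        a[offset + 7] = (byte) (v >> 56);
    }

    // Reference: a single little-endian long store through ByteBuffer.
    static void storeLongReference(byte[] a, int offset, long v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putLong(offset, v);
    }

    public static void main(String[] args) {
        byte[] split = new byte[16];
        byte[] merged = new byte[16];
        storeLongLE(split, 3, 0x1122334455667788L);
        storeLongReference(merged, 3, 0x1122334455667788L);
        System.out.println(Arrays.equals(split, merged));
    }
}
```

The offset is deliberately not 8-byte aligned in `main`, since the merged store would in general be an unaligned access.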
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1778600064