RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v17]

Roland Westrelin roland at openjdk.org
Wed Mar 6 09:00:57 UTC 2024


On Tue, 5 Mar 2024 15:55:12 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> This is a feature requiested by @RogerRiggs and @cl4es .
>> 
>> **Idea**
>> 
>> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code.
>> 
>> This patch here supports a few simple use-cases, like these:
>> 
>> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>> 
>> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456
>> 
>> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian):
>> 
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472
>> 
>> **Details**
>> 
>> This draft currently implements the optimization in an additional special IGVN phase:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485
>> 
>> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores:
>> 
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806
>> 
>> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   a little bit of casting for debug printing code

Do you intend to add an IR test case?

src/hotspot/share/opto/compile.cpp line 2927:

> 2925: }
> 2926: 
> 2927: void Compile::gather_nodes_for_merge_stores(PhaseIterGVN &igvn) {

This is going away, right?

src/hotspot/share/opto/memnode.cpp line 2802:

> 2800:   StoreNode* use = can_merge_primitive_array_store_with_use(phase, true);
> 2801:   if (use != nullptr) {
> 2802:     return nullptr;

Do you want to assert that the use is in the igvn worklist?

src/hotspot/share/opto/memnode.cpp line 2971:

> 2969: // The goal is to check if two such ArrayPointers are adjacent for a load or store.
> 2970: //
> 2971: // Note: we accumulate all constant offsets into constant_offset, even the int constant behind

Is this really needed? For the patterns of interest, aren't the constant pushed down the chain of `AddP` nodes so the address is `(AddP base (AddP ...) constant)`?

src/hotspot/share/opto/memnode.cpp line 3146:

> 3144:   Node* ctrl_s1 = s1->in(MemNode::Control);
> 3145:   Node* ctrl_s2 = s2->in(MemNode::Control);
> 3146:   if (ctrl_s1 != ctrl_s2) {

Do you need to check that `ctrl_s1` and `ctrl_s2` are not null? I suppose this could be called on a dying part of the graph during igvn.

src/hotspot/share/opto/memnode.cpp line 3154:

> 3152:     }
> 3153:     ProjNode* other_proj = ctrl_s1->as_IfProj()->other_if_proj();
> 3154:     if (other_proj->is_uncommon_trap_proj(Deoptimization::Reason_range_check) == nullptr ||

This could be a range check for an unrelated array I suppose. Does it matter?

src/hotspot/share/opto/memnode.hpp line 578:

> 576: 
> 577:   Node* Ideal_merge_primitive_array_stores(PhaseGVN* phase);
> 578:   StoreNode* can_merge_primitive_array_store_with_use(PhaseGVN* phase, bool check_def);

If I understand correctly you need the `check_def ` parameter to avoid having `can_merge_primitive_array_store_with_use` and `can_merge_primitive_array_store_with_def` call each other indefinitely. But if I was to write new code that takes advantage of one of the two methods, I think I would be puzzled that there's a `check_def` parameter. Passing `false` would be wrong then but maybe not immediately obvious. Maybe it would be better to have `can_merge_primitive_array_store_with_def` with no `check_def` parameter and have all the work done in a utility method that takes a `check_def` parameter (always `true` when called from `can_merge_primitive_array_store_with_def`)

-------------

PR Review: https://git.openjdk.org/jdk/pull/16245#pullrequestreview-1919069208
PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1514032470
PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1514033069
PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1514071393
PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1514057889
PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1514062586
PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1514051769


More information about the hotspot-compiler-dev mailing list