RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v4]

Tobias Hartmann thartmann at openjdk.org
Mon Jan 29 12:03:40 UTC 2024


On Fri, 26 Jan 2024 09:35:50 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> This is a feature requiested by @RogerRiggs and @cl4es .
>> 
>> **Idea**
>> 
>> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code.
>> 
>> This patch here supports a few simple use-cases, like these:
>> 
>> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>> 
>> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456
>> 
>> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian):
>> 
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472
>> 
>> **Details**
>> 
>> This draft currently implements the optimization in an additional special IGVN phase:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485
>> 
>> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores:
>> 
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806
>> 
>> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add diagnostic flag MergeStores

Great work, Emanuel.

I think this is a well encapsulated optimization for a supposedly common code pattern requested by core libraries folks. I agree with Vladimir, that it would be nice to support this as part of the autovectorizer but that is probably not going to happen anytime soon. Until then, going with this separate phase would allow us to add support (and tests) for additional code patterns if requests come in and potentially move this to the autovectorizer later.

src/hotspot/share/opto/memnode.cpp line 2766:

> 2764:   if (phase->C->merge_stores_phase()) {
> 2765:     Node* progress = Ideal_merge_stores(phase);
> 2766:     if(progress != nullptr) { return progress; }

Suggestion:

    if (progress != nullptr) { return progress; }

src/hotspot/share/opto/memnode.cpp line 2772:

> 2770: }
> 2771: 
> 2772: // Link together multiple stores (B/S/C/I) into alonger one.

Suggestion:

// Link together multiple stores (B/S/C/I) into a longer one.

src/hotspot/share/opto/memnode.cpp line 3014:

> 3012:   //   AddP(AddP(AddP(AddP(base, o2), o2), o1), con)
> 3013:   //
> 3014:   // Two adresses are adjacent, if they share a base and all offset (o1, o2, ...)

Suggestion:

  // Two addresses are adjacent, if they share a base and all offsets (o1, o2, ...)

src/hotspot/share/opto/phaseX.cpp line 2282:

> 2280:     tty->print_cr("Set at i = %d", i);
> 2281:     n->dump();
> 2282:     assert( igvn->hash_find(this) != this, "Need to remove from hash before changing edges" );

Suggestion:

    assert(false, "Need to remove from hash before changing edges");

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16245#pullrequestreview-1848441479
PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1469450595
PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1469466833
PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1469472461
PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1469458332


More information about the hotspot-compiler-dev mailing list