RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v17]
Emanuel Peter
epeter at openjdk.org
Thu Mar 7 15:56:04 UTC 2024
On Tue, 5 Mar 2024 15:55:12 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> This is a feature requiested by @RogerRiggs and @cl4es .
>>
>> **Idea**
>>
>> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code.
>>
>> This patch here supports a few simple use-cases, like these:
>>
>> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>>
>> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456
>>
>> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian):
>>
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472
>>
>> **Details**
>>
>> This draft currently implements the optimization in an additional special IGVN phase:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485
>>
>> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores:
>>
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806
>>
>> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> a little bit of casting for debug printing code
> > Is there a chance then that we store to the same element twice ... Would it be legal wrt the java specs?
>
> AFAIU, introducing writes that do not exist in original program is an easy way to break JMM conformance. If we merge the writes, we have to make sure the old writes are not done. You _need_ to run jcstress on this change, at very least.
Ok. I have never heard of jcstress. But will look into it. Maybe I need to do some more careful checks to ensure that on the merged path there is only the merged store, and the other stores sink into the other paths. More complicated than I thought... 🙈
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1983815904
More information about the hotspot-compiler-dev
mailing list