RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store

Emanuel Peter epeter at openjdk.org
Fri Jan 19 14:58:29 UTC 2024


On Fri, 19 Jan 2024 14:09:28 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> This is a feature requiested by @RogerRiggs and @cl4es .
>> 
>> **Idea**
>> 
>> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code.
>> 
>> This patch here supports a few simple use-cases, like these:
>> 
>> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>> 
>> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456
>> 
>> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian):
>> 
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472
>> 
>> **Details**
>> 
>> This draft currently implements the optimization in an additional special IGVN phase:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485
>> 
>> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores:
>> 
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806
>> 
>> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot...
>
> Why not provide new internal API points and intrinsics? The benefits would be:
> - less complexity on the c2 side (and less bugs)
> - much easier for someone writing java code to check that the optimization triggers (check the PrintInlining output that the intrinsic shows up vs check the final assembly code)
> - clear contract between the java libraries and the VM as to what optimizes under what conditions
> 
> If I was the user for this I would be worried, that:
> - it's hard for me to check it's doing what I expect
> - even if it does initially, changes to the java code (maybe by other people less familiar with this transformation) could break the optimization. If there's a call to some specific API, at least people changing the code know special attention is necessary and that as long as the new API points are used, the optimization is guaranteed.

@rwestrel Well, that sounds like a good idea too. Especially for the better guarantees.
I guess the benefit of this optimization here is that it optimizes lots of existing code, which use adjacent array stores, for example with constants.

@cl4es @RogerRiggs what do you think, would you prefer a new internal API points and intrinsics?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1900569327


More information about the hotspot-compiler-dev mailing list