RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v6]
Emanuel Peter
epeter at openjdk.org
Tue Jan 30 16:17:44 UTC 2024
On Mon, 29 Jan 2024 14:21:01 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> made trace flag develop
>
> I only skimmed through the code, so maybe it is already handled.
>
> Let me ask for clarity anyway: does it combine the series of naturally aligned stores into the bulk -- possibly misaligned! -- store? Because it would have performance implications, and depending on how hardware treats the misaligned stores, the correctness problem too. E.g., if we allow transforming `storeB(&(A+1), V1); storeB(&(A+2), V2);` -> `storeC(&(A+1), combine(V1, V2)`, then `storeC` might not be aligned. The platforms are allowed to throw the VM under `SIGBUS` when that store is executed. `Unsafe.putXUnaligned` was done to avoid this trouble, which only intrinsifies when `UseUnalignedAccesses` is `true`, and maybe have more safeguards that I don't remember off-hand.
>
> Note that C2 already does some of the similar store tiling in `InitializeNode::coalesce_subword_stores` for initializing stores -- should these two coalescing phases share some code?
@shipilev
You are right, I need to guard the optimization with `UseUnalignedAccesses`. Just added it. Thanks you 😊
Probably my tests would have run into the `SIGBUS` you mentioned.
About `InitializeNode::coalesce_subword_stores`:
It only works on raw-stores, which write fields before the initialization of an object. It only works with constants.
Hence, the pattern is quite different.
Merging the two would be a lot of work. Too much for me for now.
But maybe one day we can cover all these cases in a single optimization, that merges/coalesces all sorts of loads and stores, and essencially vectorizes any straingt-line code, at least for loads and stores.
For now, I just wanted to add the feature that @cl4es and @RogerRiggs were specifically asking for, which is merging array stores for constants and variables (using shift to split).
@rwestrel
Ok. Well in that case I might have to make a more intelligent pointer-analysis, and parse past `ConvI2L` and `CastII` nodes.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1917040449
More information about the hotspot-compiler-dev
mailing list