RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store

Mon Apr 15 17:09:04 UTC 2024

On Mon, 15 Apr 2024 06:49:43 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> > ```
>>> > * No RangeCheck smearing, or other CFG between the stores: `RC[0], store[0], RC[1], store[1], RC[2], store[2], RC[3], store[3]`. Not so simple. We can merge the 4 stores on the normal path, where all RC's pass. But we have to remove all old stores from that path. But the `RC[1], RC[2], RC[3]` false paths need some of those stores. So the only way I see is to duplicate all stores for the branches, so that we are sure that they sink out into the trap-paths.
>>> > ```
>>> 
>>> I also think you need to duplicate stores. My opinion is that we want to stick with the simpler cases (your first and second bullets) unless it's obvious it doesn't cover all use cases. It's always possible to revisit the optimization down the road if it's observed that there are cases that are not covered.
>> 
>> I completely agree with Roland.
>
> @vnkozlov 
>> Can we detect presence of RangeCheck which may cause us to move some stores on fail path and bailout the optimization. I don't think it is frequent case. I assume you will get RC on each store or not at all ("main" part of counted loop). Am I wrong here? I don't remember, does C2 optimize RangeCheck nodes in linear code (it does in loops)?
> 
> I know about 2 relevant optimizations that remove / move RangeChecks:
> - RCE (RangeCheck Elimination from loops): hoist all RangeCheck before the loop. That way, there are no RangeChecks left in the loop, and there would be no RangeChecks between the stores we are merging.
> - RangeCheck Smearing: this also applies in straight-line code, outside of loops. See `RangeCheckNode::Ideal`. Example:
> 
> RangeCheck[i+0]
> Store[i+0]
> RangeCheck[i+1]  <--- replaced with i+3 ("smearing" to cover all RC below)
> Store[i+1]
> RangeCheck[i+2]  <--- removed
> Store[i+2]
> RangeCheck[i+3]  <--- removed
> Store[i+3]
> 
> becomes:
> 
> RangeCheck[i+0]
> Store[i+0]
> RangeCheck[i+3]  <--- the RangeCheck that remains between the first and the rest of the consecutive (and adjacent) stores.
> Store[i+1]
> Store[i+2]
> Store[i+3]
> 
> I think the use-cases from @cl4es are often in straight-line code. Therefore we should cover the "smearing" case where exactly 1 RC remains in the sequence.
> 
> What you can also see in `RangeCheckNode::Ideal`: if we ever trap (or often enough, I don't remember) in one of the RangeChecks, then we disable `phase->C->allow_range_check_smearing()`. Then we don't do the smearing, and all the RC remain in the sequence. At that point, my optimization would fail since it sees more than 1 RC in the sequence.
> 
> Does that make sense? I should probably add this information in the comments, so  that it is clear why we worry about a single RC at all. People are probably going to wonder like you: "I assume you will get RC on each store or not at all".

@eme64 thank you for looking on C2 RC optimizations. Now it is clear why you need to check for RC.
I would only suggest to adjust your new comment  about TC optimization to avoid confusion.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2057411419