RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store
Emanuel Peter
epeter at openjdk.org
Thu Mar 21 14:04:27 UTC 2024
On Wed, 7 Feb 2024 18:11:52 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> There already are internal APIs and `VarHandles` to enable similar optimizations, see e.g. `jdk.internal.util.ByteArray/-LittleEndian::putInt/-Long`.
>>
>> The very point of this RFE was to opportunistically enable similar optimizations more automatically in idiomatic java code without the need to bring out the big guns. Of course such automatic transformations will have some level of fragility and you might accidentally disable the optimization in a variety of ways (since C2 needs to draw the line somewhere) - but that's the case for many other heuristic and opportunistic optimizations. Should we not optimize counted loops or do loop unrolling because it's easy to add something that makes C2 bail out on you?
>>
>> Having this optimization in C2 also allows us to avoid dependencies on `VarHandles` in bootstrap sensitive code and still enable the optimization. It might also have a benefit on startup/warmup characteristics.
>
>> @cl4es Ok, I guess there is good motivation to keep working on this, looks like this patch here even outperforms #15990
>
> Yes! A bit surprising but a great proof of concept for this optimization. I think it might be useful to analyze what the JIT is doing w.r.t inlining etc in the three variants.
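For reference, here is a minimal sketch of the kind of idiomatic code the RFE is about, next to the explicit alternative mentioned in the quote. It uses the public `MethodHandles.byteArrayViewVarHandle` instead of the internal `jdk.internal.util.ByteArrayLittleEndian`, so that it compiles without `--add-exports`. The class and method names are hypothetical. Both variants produce the same bytes. The hope is that C2 can turn the first one into a single 4-byte store by itself.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

public class MergeStoresIdiom {
    // Little-endian int view over a byte[] (public JDK API; plain set allows unaligned offsets).
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Idiomatic Java: four separate byte stores, the shape C2 would try to merge.
    static void putIntLEBytes(byte[] a, int off, int v) {
        a[off]     = (byte) v;
        a[off + 1] = (byte) (v >>> 8);
        a[off + 2] = (byte) (v >>> 16);
        a[off + 3] = (byte) (v >>> 24);
    }

    // Explicit alternative: one int-sized store through the VarHandle.
    static void putIntLEVarHandle(byte[] a, int off, int v) {
        INT_LE.set(a, off, v);
    }

    public static void main(String[] args) {
        byte[] x = new byte[8];
        byte[] y = new byte[8];
        putIntLEBytes(x, 2, 0x11223344);
        putIntLEVarHandle(y, 2, 0x11223344);
        System.out.println(Arrays.toString(x)); // [0, 0, 68, 51, 34, 17, 0, 0]
        System.out.println(Arrays.equals(x, y)); // true
    }
}
```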
@cl4es @rwestrel
The question is what we should go for now. I did a little thinking, and here are my thoughts:
We have to avoid duplication of stores. There are a few scenarios:
- No control flow between the stores: simple, we just replace the old stores with the new merged one.
- Smeared RangeChecks, of the form `RC[0], store[0], RC[3], store[1], store[2], store[3]`: also simple, we can replace the last store with the merged one, and let IGVN remove `store[1], store[2]`, and `store[0]` will sink into the false-path of `RC[3]`.
- No RangeCheck smearing, or other CFG between the stores: `RC[0], store[0], RC[1], store[1], RC[2], store[2], RC[3], store[3]`. Not so simple. We can merge the 4 stores on the normal path, where all RCs pass, but then we have to remove all old stores from that path. However, the false paths of `RC[1], RC[2], RC[3]` need some of those stores. So the only way I see is to duplicate all stores onto the branches, so that we are sure they sink out into the trap paths.
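Why the false paths need the earlier stores can be seen at the Java level: the stores that precede a failing range check are observable after the exception is thrown. A minimal demonstration (hypothetical class name), where `RC[3]` fails on a 3-element array after `store[0..2]` have already happened:

```java
import java.util.Arrays;

public class PartialStores {
    // Each array store carries its own implicit range check.
    static void putIntLEBytes(byte[] a, int off, int v) {
        a[off]     = (byte) v;          // RC[0], store[0]
        a[off + 1] = (byte) (v >>> 8);  // RC[1], store[1]
        a[off + 2] = (byte) (v >>> 16); // RC[2], store[2]
        a[off + 3] = (byte) (v >>> 24); // RC[3], store[3]
    }

    public static void main(String[] args) {
        byte[] a = new byte[3]; // too short, so RC[3] fails
        try {
            putIntLEBytes(a, 0, 0x11223344);
        } catch (ArrayIndexOutOfBoundsException e) {
            // store[0], store[1], store[2] must be visible here
            System.out.println(Arrays.toString(a)); // [68, 51, 34]
        }
    }
}
```

A merged store that simply replaced all four stores would lose these partial writes on the exception path.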
So: do we care about the non-RC-smearing case? More precisely: do we expect that there will ever be out-of-bounds exceptions in the relevant code patterns for which we are trying to merge the stores? If we hit out-of-bounds, we trap and disable RC smearing, and then it gets complicated, though still doable, I think.
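One way to picture the "duplicate the stores onto the branches" idea is a source-level analogy. This is only a sketch under the assumption that a Java rewrite is a fair stand-in; C2 performs the transformation on its IR, where the false path would be a trap rather than a Java `else` branch. The class and method names are hypothetical. If every range check passes, a single merged store is used. Otherwise the original stores are replayed, so the stores before the failing check still happen before the exception.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

public class MergedWithSlowPath {
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static void putIntLE(byte[] a, int off, int v) {
        if (off >= 0 && off <= a.length - 4) {
            // Normal path: RC[0..3] would all pass, so one merged 4-byte store suffices.
            INT_LE.set(a, off, v);
        } else {
            // Some RC fails: the original stores are duplicated here so that
            // the ones before the failing check still take effect, then the AIOOBE is thrown.
            a[off]     = (byte) v;
            a[off + 1] = (byte) (v >>> 8);
            a[off + 2] = (byte) (v >>> 16);
            a[off + 3] = (byte) (v >>> 24);
        }
    }

    public static void main(String[] args) {
        byte[] ok = new byte[8];
        putIntLE(ok, 1, 0x11223344);
        System.out.println(Arrays.toString(ok)); // [0, 68, 51, 34, 17, 0, 0, 0]

        byte[] shortArr = new byte[3];
        try {
            putIntLE(shortArr, 0, 0x11223344);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println(Arrays.toString(shortArr)); // [68, 51, 34]
        }
    }
}
```

The cost of this shape is code size: every store exists twice, once merged on the normal path and once individually on the failing path.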
What are the opinions on this? Any other suggestions?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2012383079
More information about the hotspot-compiler-dev mailing list