RFR: 8341697: C2: Register allocation inefficiency in tight loop [v4]
Quan Anh Mai
qamai at openjdk.org
Sat Oct 12 10:55:10 UTC 2024
On Sat, 12 Oct 2024 10:30:51 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> Hi,
>>
>> This patch improves spill placement in the presence of loops. Currently, when we spill a live range, we create a `Phi` at the loop head. This `Phi` is then spilt inside the loop body, and because the `Phi` is `UP` (lives in a register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, which greatly reduces loop throughput.
>>
>> My proposal is to be aware of loop heads and to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, we should spill it at the loop entries and reload it at its use sites. This may increase the number of loads, but it eliminates loop-carried dependencies, making the loads effectively latency-free. On the other hand, if a live range is spilt only in the uncommon path but is used in the common path, we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it, since this eliminates loop-carried dependencies.
>>
>> A downside of this algorithm is that we may overspill: after some live ranges have been spilt, the others may no longer need spilling but are spilt anyway.
>>
>> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive.
>> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
>> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution.
>>
>> Please take a look and leave your reviews, thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
> add LoopAwaredSpilling flag, refine implementation
New benchmark results:
                                                       Before                 After
Benchmark                           (prob)  Mode  Cnt       Score      Error       Score      Error  Units
LoopCounterBench.field_ret             N/A  avgt    5     425.678 ±    5.086     419.819 ±    1.965  ns/op
LoopCounterBench.localVar_ret          N/A  avgt    5    1126.937 ±    1.078     325.651 ±    5.309  ns/op
LoopCounterBench.reloadAtEntry_ret     N/A  avgt    5     582.465 ±    2.649     491.421 ±    0.909  ns/op
LoopCounterBench.spillUncommon_ret     0.0  avgt    5     490.901 ±    5.505     490.981 ±    2.118  ns/op
LoopCounterBench.spillUncommon_ret    0.01  avgt    5    2491.557 ±    4.837    1912.170 ±   19.208  ns/op
LoopCounterBench.spillUncommon_ret     0.1  avgt    5   21316.571 ±   88.198   10518.618 ±  183.380  ns/op
LoopCounterBench.spillUncommon_ret     0.2  avgt    5   42095.064 ±  210.995   19908.240 ±  313.108  ns/op
LoopCounterBench.spillUncommon_ret     0.5  avgt    5  113825.492 ± 1637.428   48194.341 ±  719.049  ns/op
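
For readers skimming the thread, the loop-entry rule from the description quoted above can be sketched roughly as follows. This is only an illustration of the decision as stated in prose, not the code in the patch: the class, enum, method, and parameter names are all hypothetical, and the `LEAVE_AS_IS` fallback is my assumption for cases the description does not cover.

```java
// Illustrative sketch only; every name here is hypothetical and none of it
// is taken from the C2 sources or from the patch itself.
class LoopEntryPlacement {
    /** What to do with a live range at the loop entries. */
    enum EntryAction { SPILL_AT_ENTRY, RELOAD_AT_ENTRY, LEAVE_AS_IS }

    /**
     * Decide the action for one live range, given where it is spilt and used
     * relative to the loop's common (hot) and uncommon (cold) paths.
     */
    static EntryAction decide(boolean spiltInCommonPath,
                              boolean spiltInUncommonPath,
                              boolean usedInCommonPath) {
        // Bias towards spilling: a spill on the common path wins even if the
        // live range is also used (reloaded) there. It is then spilt at the
        // loop entries and reloaded at its use sites.
        if (spiltInCommonPath) {
            return EntryAction.SPILL_AT_ENTRY;
        }
        // Spilt only on the uncommon path but used on the common path:
        // reload eagerly so the common path sees the value in a register.
        if (spiltInUncommonPath && usedInCommonPath) {
            return EntryAction.RELOAD_AT_ENTRY;
        }
        // Assumption: anything else is left to the existing heuristics.
        return EntryAction.LEAVE_AS_IS;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false, true));   // spilt and used on the common path
        System.out.println(decide(false, true, true));   // spilt only on the uncommon path
        System.out.println(decide(false, false, true));  // never spilt
    }
}
```

The ordering of the two checks is what encodes the bias towards spilling: the common-path spill test comes first, so a live range that is both spilt and used on the common path is never classified as an eager reload.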
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2408520138