RFR: 8341697: C2: Register allocation inefficiency in tight loop [v4]

Quan Anh Mai qamai at openjdk.org
Sat Oct 12 10:55:10 UTC 2024


On Sat, 12 Oct 2024 10:30:51 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hi,
>> 
>> This patch improves spill placement in the presence of loops. Currently, when trying to spill a live range, we create a `Phi` at the loop head. This `Phi` is then spilt inside the loop body, and because the `Phi` is `UP` (lives in a register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, which greatly reduces loop throughput.
>> 
>> My proposal is to be aware of loop heads and to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it at the loop entries and reload it at its use sites. This may increase the number of loads, but it eliminates loop-carried dependencies, so the loads are off the critical path and effectively latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>> 
>> A downside of this algorithm is that we may overspill: after some live ranges have been spilt, others may no longer need spilling but are spilt anyway.
>> 
>> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive.
>> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
>> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that.
>> 
>> Please take a look and leave your reviews, thanks a lot.
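
The placement rule in the quoted description can be sketched as a small decision function. This is only an illustration in Java, not the C2 implementation: the names `LoopEntryPlacement`, `LiveRangeInLoop`, `Action` and `decide` are hypothetical, and the three booleans are an assumed summary of what the allocator would know about one live range in one loop.

```java
// Illustrative sketch only; all names here are hypothetical and do not
// correspond to identifiers in the HotSpot sources.
class LoopEntryPlacement {
    enum Action { SPILL_AT_ENTRY, RELOAD_AT_ENTRY, LEAVE_AS_IS }

    // Assumed per-loop summary of a single live range.
    record LiveRangeInLoop(boolean spiltInCommonPath,
                           boolean spiltInUncommonPath,
                           boolean usedInCommonPath) {}

    static Action decide(LiveRangeInLoop lr) {
        // Bias towards spilling: a spill anywhere in the common path wins,
        // even if the live range is also reloaded there.
        if (lr.spiltInCommonPath()) {
            return Action.SPILL_AT_ENTRY;
        }
        // Spilt only on the uncommon path but used on the common path:
        // reload eagerly so the common path sees the value in a register.
        if (lr.spiltInUncommonPath() && lr.usedInCommonPath()) {
            return Action.RELOAD_AT_ENTRY;
        }
        // Cases the description does not cover are left untouched here.
        return Action.LEAVE_AS_IS;
    }

    public static void main(String[] args) {
        System.out.println(decide(new LiveRangeInLoop(true, true, true)));
        System.out.println(decide(new LiveRangeInLoop(false, true, true)));
        System.out.println(decide(new LiveRangeInLoop(false, false, true)));
    }
}
```

Checking the common-path spill first is what encodes the stated bias towards spilling; swapping the two tests would instead prefer eager reloads.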
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add LoopAwaredSpilling flag, refine implementation

New benchmark results:

                                                                   Before                 After
    Benchmark                           (prob)  Mode  Cnt       Score      Error      Score     Error  Units
    LoopCounterBench.field_ret             N/A  avgt    5     425.678 ±    5.086    419.819 ±   1.965  ns/op
    LoopCounterBench.localVar_ret          N/A  avgt    5    1126.937 ±    1.078    325.651 ±   5.309  ns/op
    LoopCounterBench.reloadAtEntry_ret     N/A  avgt    5     582.465 ±    2.649    491.421 ±   0.909  ns/op
    LoopCounterBench.spillUncommon_ret     0.0  avgt    5     490.901 ±    5.505    490.981 ±   2.118  ns/op
    LoopCounterBench.spillUncommon_ret    0.01  avgt    5    2491.557 ±    4.837   1912.170 ±  19.208  ns/op
    LoopCounterBench.spillUncommon_ret     0.1  avgt    5   21316.571 ±   88.198  10518.618 ± 183.380  ns/op
    LoopCounterBench.spillUncommon_ret     0.2  avgt    5   42095.064 ±  210.995  19908.240 ± 313.108  ns/op
    LoopCounterBench.spillUncommon_ret     0.5  avgt    5  113825.492 ± 1637.428  48194.341 ± 719.049  ns/op
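
To read the table as ratios, the Before score can be divided by the After score for each row. The helper below is a minimal sketch for doing that; the class name `SpeedupTable` and the method `speedup` are made up for illustration, and the numbers are copied from the Score columns above (error bars are ignored).

```java
// Illustrative helper; names are hypothetical. Scores are in ns/op,
// so a ratio above 1.0 means the "After" build is faster.
class SpeedupTable {
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        String[] rows = {
            "field_ret", "localVar_ret", "reloadAtEntry_ret",
            "spillUncommon_ret (0.0)", "spillUncommon_ret (0.01)",
            "spillUncommon_ret (0.1)", "spillUncommon_ret (0.2)",
            "spillUncommon_ret (0.5)"
        };
        double[][] scores = {   // {before, after}
            {425.678, 419.819},
            {1126.937, 325.651},
            {582.465, 491.421},
            {490.901, 490.981},
            {2491.557, 1912.170},
            {21316.571, 10518.618},
            {42095.064, 19908.240},
            {113825.492, 48194.341}
        };
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%-26s %.2fx%n", rows[i],
                              speedup(scores[i][0], scores[i][1]));
        }
    }
}
```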

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2408520138


More information about the hotspot-compiler-dev mailing list