RFR: 8319451: PhaseIdealLoop::conditional_move is too conservative
Quan Anh Mai
qamai at openjdk.org
Thu Dec 7 16:58:30 UTC 2023
On Mon, 13 Nov 2023 19:53:30 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> Hi,
>>
>> When transforming a Phi into a CMove, the threshold is set to be approximately BlockLayoutMinDiamondPercentage. The reason given is:
>>
>> // BlockLayoutByFrequency optimization moves infrequent branch
>> // from hot path. No point in CMOV'ing in such case
>>
>> This sets the default value of the threshold to be around 18%, which is too conservative. The reason also does not make a lot of sense since the important property which makes jumping expensive is not code layout. We should remove this.
>>
>> Please kindly review, thank you very much.
>
> Looks fine to me.
>
> Looking at the history of this code, I added it to address [JDK-7097546](https://bugs.openjdk.org/browse/JDK-7097546).
>
> But it was later found to be incorrect for some cases, and I even had a prototype of a similar fix: [JDK-8034833](https://bugs.openjdk.org/browse/JDK-8034833). Additional changes were proposed there, in `block.hpp` and the `.ad` file.
>
> Please look at the test attached to that report and the additional code changes there. Maybe we can improve `cmove` further. It could be done separately from this fix if you want to spend more time on it.
@vnkozlov I have investigated a little bit.
For loops of this kind:

    public static int test(int result, int limit, int mask) { // mask = 15
        for (int i = 0; i < limit; i++) {
            if ((i & mask) == 0) result++; // Infrequent
        }
        return result;
    }

Since the branch in this loop is perfectly predictable, no choice of threshold for the `CMove` transformation can offer a performance advantage here. I don't think such predictable branches are common, though.
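
To make the comparison concrete, here is a minimal sketch of the same loop in its branchy form and in a select form, which is roughly the shape a Phi-to-`CMove` transformation computes. The names `CMoveSketch`, `testBranch`, and `testSelect` are hypothetical and not part of the patch, and writing a ternary in Java source does not by itself guarantee that C2 emits a `CMove`. The sketch only shows that both forms compute the same value; with mask = 15 the increment happens once every 16 iterations.

```java
// Hypothetical illustration only; not code from the PR.
class CMoveSketch {
    // Branchy form, same logic as the loop above.
    static int testBranch(int result, int limit, int mask) {
        for (int i = 0; i < limit; i++) {
            if ((i & mask) == 0) result++; // taken when the masked bits of i are all zero
        }
        return result;
    }

    // Select form: both candidate values are computed on every iteration
    // and one of them is chosen, which is what a conditional move does.
    static int testSelect(int result, int limit, int mask) {
        for (int i = 0; i < limit; i++) {
            int incremented = result + 1;
            result = ((i & mask) == 0) ? incremented : result;
        }
        return result;
    }

    public static void main(String[] args) {
        int limit = 1_000_000;
        int mask = 15;
        // Both calls are expected to print the same count.
        System.out.println(testBranch(0, limit, mask));
        System.out.println(testSelect(0, limit, mask));
    }
}
```

In the select form the increment sits on the loop-carried dependency chain in every iteration, while a well-predicted branch lets the hardware skip it most of the time. That is one way to see why a predictable branch like this one gains nothing from `CMove`, regardless of the threshold.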
Regarding the register pressure related to a `CMove`, the main issue is that our local code motion (LCM) does not do much: it applies some heuristics around calls, and the only other factor is block-wise latency, which is not very useful in the LCM context. I have tried some heuristics, but it is easy to find cases where they are insufficient. I think it is probably a good idea to reimplement LCM using a better algorithm.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16524#issuecomment-1845703278