RFR: 8283699: Improve the peephole mechanism of hotspot [v5]

Vladimir Ivanov vlivanov at openjdk.org
Mon Oct 10 23:10:11 UTC 2022


On Sat, 8 Oct 2022 15:42:31 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hi,
>> 
>> The current peephole mechanism has several drawbacks:
>> - Can only match and remove adjacent instructions.
>> - Cannot match machine ideal nodes (e.g. MachSpillCopyNode).
>> - Can only replace one instruction, and the insertion position is limited to where the matched nodes reside.
>> - Is actually broken since the nodes are not connected properly and OptoScheduling requires true dependencies between nodes.
>> 
>> The patch proposes to enhance the peephole mechanism by allowing a peep rule to call into a dedicated function, which takes responsibility for performing all required transformations on the basic block. This allows the peephole mechanism to perform several transformations effectively and in a more fine-grained manner.
>> 
>> The patch uses the peephole optimisation to perform some classic peepholes, transforming the following sequences on x86:
>> 
>>     mov r1, r2    ->    lea r1, [r2 + r3/i]
>>     add r1, r3/i
>> 
>> and
>> 
>>     mov r1, r2    ->    lea r1, [r2 << i], with i = 1, 2, 3
>>     shl r1, i
>> 
>> On the added benchmarks, the transformations show positive results (first table before the patch, second table after; lower is better):
>> 
>>     Benchmark             Mode  Cnt     Score     Error  Units
>>     LeaPeephole.B_D_int   avgt    5  1200.490 ± 104.662  ns/op
>>     LeaPeephole.B_D_long  avgt    5  1211.439 ±  30.196  ns/op
>>     LeaPeephole.B_I_int   avgt    5  1118.831 ±   7.995  ns/op
>>     LeaPeephole.B_I_long  avgt    5  1112.389 ±  15.838  ns/op
>>     LeaPeephole.I_S_int   avgt    5  1262.528 ±   7.293  ns/op
>>     LeaPeephole.I_S_long  avgt    5  1223.820 ±  17.777  ns/op
>> 
>>     Benchmark             Mode  Cnt    Score    Error  Units
>>     LeaPeephole.B_D_int   avgt    5  860.889 ±  6.089  ns/op
>>     LeaPeephole.B_D_long  avgt    5  945.455 ± 21.603  ns/op
>>     LeaPeephole.B_I_int   avgt    5  849.109 ±  9.809  ns/op
>>     LeaPeephole.B_I_long  avgt    5  851.283 ± 16.921  ns/op
>>     LeaPeephole.I_S_int   avgt    5  976.594 ± 23.004  ns/op
>>     LeaPeephole.I_S_long  avgt    5  936.984 ±  9.601  ns/op
>> 
>> A follow-up patch will add IR tests for these transformations, since the IR framework cannot yet parse the ideal scheduling, although printing the scheduling itself has recently become possible.
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   refactor includes

src/hotspot/cpu/x86/x86_64.ad line 324:

> 322: 
> 323: source_hpp %{
> 324:   #include CPU_HEADER(peephole)

Why don't you simply include `peephole_x86_64.hpp` here?
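
For context, here is a minimal sketch of how a token-pasting include helper of this kind can work. The DEMO_* names and the `_x86` suffix are assumptions made for illustration, not HotSpot's actual definitions; the thread does not state what `CPU_HEADER(peephole)` expands to in the real build.

```cpp
#include <cstring>

// Hypothetical stand-ins for an include-helper macro family.
// Two-level macros force argument expansion before # and ## are applied.
#define DEMO_STR(a) #a
#define DEMO_XSTR(a) DEMO_STR(a)
#define DEMO_PASTE_AUX(a, b) a##b
#define DEMO_PASTE(a, b) DEMO_PASTE_AUX(a, b)

// Assumption: a real build would pass the suffix on the compiler command line.
#define DEMO_CPU_SUFFIX _x86

// Builds the string "<basename><suffix>.hpp" at preprocessing time.
#define DEMO_CPU_HEADER(basename) \
  DEMO_XSTR(DEMO_PASTE(basename, DEMO_CPU_SUFFIX).hpp)

// Exposes the expansion as a string so it can be inspected at run time;
// in real use the macro would appear directly after #include.
inline const char* demo_header_name() {
  return DEMO_CPU_HEADER(peephole);
}
```

With a helper like this, the same source line can name a different header for each CPU. In a file that is already specific to one platform, a literal include is the more direct spelling.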

-------------

PR: https://git.openjdk.org/jdk/pull/8025

