RFR: 8283699: Improve the peephole mechanism of hotspot [v4]
Dean Long
dlong at openjdk.org
Thu Oct 6 08:45:12 UTC 2022
On Sat, 1 Oct 2022 10:22:42 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> Hi,
>>
>> The current peephole mechanism has several drawbacks:
>> - Can only match and remove adjacent instructions.
>> - Cannot match machine ideal nodes (e.g., MachSpillCopyNode).
>> - Can only replace one instruction, and the insertion position is limited to where the matched nodes reside.
>> - Is actually broken since the nodes are not connected properly and OptoScheduling requires true dependencies between nodes.
>>
>> The patch proposes to enhance the peephole mechanism by allowing a peep rule to call into a dedicated function, which takes responsibility for performing all required transformations on the basic block. This lets the peephole mechanism perform several transformations effectively and in a more fine-grained manner.
>>
>> The patch uses the peephole mechanism to implement some classic peepholes on x86, transforming the sequences:
>>
>>     mov r1, r2      ->    lea r1, [r2 + r3/i]
>>     add r1, r3/i
>>
>> and
>>
>>     mov r1, r2      ->    lea r1, [r2 << i], with i = 1, 2, 3
>>     shl r1, i
>>
>> On the added benchmarks, the transformations show positive results:
>>
>> Benchmark              Mode  Cnt     Score     Error  Units
>> LeaPeephole.B_D_int    avgt    5  1200.490 ± 104.662  ns/op
>> LeaPeephole.B_D_long   avgt    5  1211.439 ±  30.196  ns/op
>> LeaPeephole.B_I_int    avgt    5  1118.831 ±   7.995  ns/op
>> LeaPeephole.B_I_long   avgt    5  1112.389 ±  15.838  ns/op
>> LeaPeephole.I_S_int    avgt    5  1262.528 ±   7.293  ns/op
>> LeaPeephole.I_S_long   avgt    5  1223.820 ±  17.777  ns/op
>>
>> Benchmark              Mode  Cnt    Score    Error  Units
>> LeaPeephole.B_D_int    avgt    5  860.889 ±  6.089  ns/op
>> LeaPeephole.B_D_long   avgt    5  945.455 ± 21.603  ns/op
>> LeaPeephole.B_I_int    avgt    5  849.109 ±  9.809  ns/op
>> LeaPeephole.B_I_long   avgt    5  851.283 ± 16.921  ns/op
>> LeaPeephole.I_S_int    avgt    5  976.594 ± 23.004  ns/op
>> LeaPeephole.I_S_long   avgt    5  936.984 ±  9.601  ns/op
>>
>> A follow-up patch will add IR tests for these transformations, since the IR framework cannot yet parse the ideal scheduling, although printing the scheduling itself was made possible recently.
>>
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
> check index

src/hotspot/cpu/x86/peephole_x86.cpp line 28:
> 26: #ifdef COMPILER2
> 27:
> 28: #include "opto/peephole.hpp"

I don't see why opto/peephole.hpp is useful. Why not just include peephole_x86.hpp directly? Then the empty peephole_<cpu>.hpp files for the other platforms would no longer be needed.

src/hotspot/cpu/x86/peephole_x86.cpp line 50:
> 48: inst1 = inst0->in(1)->as_Mach();
> 49: src1 = in;
> 50: }

I don't understand why this optimization requires MachSpillCopy. Is that the only case in which we would see mov+add or mov+shift?

src/hotspot/cpu/x86/peephole_x86.cpp line 132:
> 130: cfg_->map_node_to_block(proj, nullptr);
> 131: cfg_->map_node_to_block(root, block);
> 132:

A lot of this seems like boilerplate that could be refactored to make writing new peephole helpers simpler and less error-prone.
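
As a rough illustration of the kind of factoring meant here, the following is a toy sketch in standalone C++, not HotSpot code. The types `Op` and `Inst` and the functions `fuse_adjacent`, `mov_add_to_lea`, and `mov_shl_to_lea` are all hypothetical names made up for this example. It only matches adjacent instructions in a flat list and models neither the node graph, nor the CFG mapping, nor flag usage, so it should be read as a sketch of the shape of the idea only: the walk-and-replace bookkeeping lives in one shared helper, and each peephole reduces to a small matching rule.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction model. None of these types exist in HotSpot.
enum class Op { Mov, AddImm, ShlImm, Lea, Other };

struct Inst {
  Op op;
  std::string dst;  // destination register
  std::string src;  // Mov: source register; Lea: base register; else unused
  int imm;          // AddImm: addend; ShlImm: shift count; Lea: displacement
  int shift;        // Lea only: left shift applied to the base register
};

// Shared bookkeeping (hypothetical): walk adjacent pairs and, when the rule
// matches, replace the pair with the single instruction the rule produced.
template <typename Rule>
bool fuse_adjacent(std::vector<Inst>& block, Rule rule) {
  bool changed = false;
  for (std::size_t i = 0; i + 1 < block.size(); ++i) {
    Inst fused{Op::Other, "", "", 0, 0};
    if (rule(block[i], block[i + 1], fused)) {
      block[i] = fused;
      block.erase(block.begin() + static_cast<std::ptrdiff_t>(i + 1));
      changed = true;
    }
  }
  return changed;
}

// mov r1, r2 ; add r1, i  ->  lea r1, [r2 + i]
// A real rule would also have to prove that the flags set by add are unused.
bool mov_add_to_lea(const Inst& a, const Inst& b, Inst& out) {
  if (a.op != Op::Mov || b.op != Op::AddImm || a.dst != b.dst) return false;
  out = Inst{Op::Lea, a.dst, a.src, b.imm, 0};
  return true;
}

// mov r1, r2 ; shl r1, i  ->  lea r1, [r2 << i], only for i = 1, 2, 3
bool mov_shl_to_lea(const Inst& a, const Inst& b, Inst& out) {
  if (a.op != Op::Mov || b.op != Op::ShlImm || a.dst != b.dst) return false;
  if (b.imm < 1 || b.imm > 3) return false;
  out = Inst{Op::Lea, a.dst, a.src, 0, b.shift == 0 ? b.imm : b.imm};
  return true;
}
```

With this split, adding another peephole in the toy model means writing one more rule function and passing it to `fuse_adjacent`. The node replacement and block mapping updates in the actual patch are more involved, but a similar shared helper might absorb most of them.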
-------------
PR: https://git.openjdk.org/jdk/pull/8025