RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set
Jorn Vernee
jvernee at openjdk.org
Mon Jul 24 13:42:00 UTC 2023
On Thu, 13 Jul 2023 15:17:25 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:
>> This patch adds peephole rules to remove TEST instructions that operate on the result and right after a AND, XOR or OR instruction.
>> This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also set the flag register to the same value.
>> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to same values for TEST, AND, XOR and OR, so this should be safe.
>> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortend and a small speedup can be observed:
>> Results on Intel Core i5-8250U CPU
>> Before this patch:
>>
>> Benchmark Mode Cnt Score Error Units
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>>
>> After this patch:
>>
>> Benchmark Mode Cnt Score Error Units Change
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged)
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged)
>> TestRemovalPeephole.bench...
>
> src/hotspot/cpu/x86/x86_64.ad line 13796:
>
>> 13794: ins_pipe(ialu_reg);
>> 13795: %}
>> 13796:
>
> In `test_may_remove_5` the `new_root` function points to the constructor of one of these instructions (what's passed to `peepreplace`). Since the function pointer is unused, these are really not needed. (for `peepreplace` you can just pass `testI_reg` or `testL_reg`, it doesn't really matter).
Potentially we could modify ADLC to make `peepreplace` optional, and then pass `nullptr` to the peepprocedure if there's no `peepreplace`:
diff --git a/src/hotspot/share/adlc/output_c.cpp b/src/hotspot/share/adlc/output_c.cpp
index 5276987eec4..05328453f73 100644
--- a/src/hotspot/share/adlc/output_c.cpp
+++ b/src/hotspot/share/adlc/output_c.cpp
@@ -1469,10 +1469,14 @@ void ArchDesc::definePeephole(FILE *fp, InstructForm *node) {
// End of scope for this peephole's constraints
fprintf(fp, " }\n");
} else {
- const char* replace_inst = nullptr;
- preplace->next_instruction(replace_inst);
- // Generate the target instruction
- fprintf(fp, " auto replacing = [](){ return static_cast<MachNode*>(new %sNode()); };\n", replace_inst);
+ if (preplace != nullptr) {
+ const char* replace_inst = nullptr;
+ preplace->next_instruction(replace_inst);
+ // Generate the target instruction
+ fprintf(fp, " auto replacing = [](){ return static_cast<MachNode*>(new %sNode()); };\n", replace_inst);
+ } else {
+ fprintf(fp, " auto replacing = nullptr;\n");
+ }
// Call the precedure
fprintf(fp, " bool replacement = Peephole::%s(block, block_index, cfg_, ra_, replacing", pprocedure->name());
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1262719608
More information about the hotspot-compiler-dev
mailing list