RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v4]
Jorn Vernee
jvernee at openjdk.org
Mon Aug 7 06:50:37 UTC 2023
On Fri, 4 Aug 2023 15:58:02 GMT, Tobias Hotz <duke at openjdk.org> wrote:
>> This patch adds peephole rules to remove TEST instructions that operate on the result and right after a AND, XOR or OR instruction.
>> This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also set the flag register to the same value.
>> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to same values for TEST, AND, XOR and OR, so this should be safe.
>> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortend and a small speedup can be observed:
>> Results on Intel Core i5-8250U CPU
>> Before this patch:
>>
>> Benchmark Mode Cnt Score Error Units
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>>
>> After this patch:
>>
>> Benchmark Mode Cnt Score Error Units Change
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged)
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged)
>> TestRemovalPeephole.bench...
>
> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision:
>
> Exchange and for or in the tests
>
> and will get matched to a test_reg_reg, so it was pointless
Thanks for the updated version. (Sorry that it took me a while to get back to this)
Left a few small inline comments. What testing did you run on this? (I'll also run tier 1-4 in our CI)
src/hotspot/cpu/x86/peephole_x86_64.cpp line 197:
> 195: juint required_flags = 0;
> 196: // Search for the uses of the node and compute which flags are required
> 197: for (DUIterator i = test_to_check->outs(); test_to_check->has_out(i); i++) {
AFAICS `fast_outs` can be used here, since you don't modify the `_opnds`.
src/hotspot/cpu/x86/x86.ad line 1268:
> 1266: Flag_clears_overflow_flag = Node::_last_flag << 10,
> 1267: Flag_clears_sign_flag = Node::_last_flag << 11,
> 1268: _last_flag = Flag_clears_sign_flag
I think adding the flags here is good. If the number of flags becomes a problem, we could instead generate virtual methods on all the nodes to return the flag mask.
src/hotspot/cpu/x86/x86_64.ad line 7679:
> 7677: effect(KILL cr);
> 7678: flag(PD::Flag_sets_overflow_flag, PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_carry_flag, PD::Flag_sets_parity_flag);
> 7679:
Please remove these extra blank lines. On `addI_rReg_mem`, `addI_mem_rReg`, and `blsrL_rReg_rReg`.
src/hotspot/share/adlc/adlparse.cpp line 232:
> 230: else if (!strcmp(ident, "size")) instr->_size = size_parse(instr);
> 231: else if (!strcmp(ident, "effect")) effect_parse(instr);
> 232: else if (!strcmp(ident, "flag")) instr->_flag = flag_parse(instr);
Suggestion:
else if (!strcmp(ident, "size")) instr->_size = size_parse(instr);
else if (!strcmp(ident, "effect")) effect_parse(instr);
else if (!strcmp(ident, "flag")) instr->_flag = flag_parse(instr);
src/hotspot/share/adlc/output_c.cpp line 4019:
> 4017: if (inst->_flag != nullptr) {
> 4018: Flag* node = inst->_flag;
> 4019: const char* prefix = "Node::";
You could potentially make the prefix here `Node::PD::`, then the extra `PD::` could be removed from the .ad file (I don't think it really adds much?).
src/hotspot/share/adlc/output_c.cpp line 4023:
> 4021: do {
> 4022: if (!node_flags_set) {
> 4023: fprintf(fp_cpp, "%s node->add_flag(%s%s", indent, strncmp(node->_name, prefix, strlen(prefix)) != 0 ? prefix : "", node->_name);
This seems to be guarding against a case where the flag is declared with the prefix already in the .ad file. Is this required for something?
(Otherwise I suggest just using `node->_name` here, as it forces the flag declarations in the .ad file to be consistent).
test/hotspot/jtreg/compiler/c2/irTests/TestTestRemovalPeephole.java line 33:
> 31: /*
> 32: * @test
> 33: * @summary Test that patterns leading to Conv2B are correctly expanded.
Summary seems to be incorrect.
-------------
PR Review: https://git.openjdk.org/jdk/pull/14172#pullrequestreview-1564605775
PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285432374
PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285420960
PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285421697
PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285413991
PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285419421
PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285420190
PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285438292
More information about the hotspot-compiler-dev
mailing list