RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set
Tobias Hotz
duke at openjdk.org
Mon Jul 24 13:44:44 UTC 2023
On Fri, 14 Jul 2023 16:58:58 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:
>> @ichttt Yes, I don't think tracking all the nodes that use flags is elegant, you can get the use from the test node, then iterate the operands of the use to find the `cmpOp` operand which you can know which flag it is using. For most nodes, this will be enough, and for `add`/`sub` the transformation can be applied if `cmpOp->ccode() == Assembler::zero || cmpOp->ccode() == Assembler::notZero`. I think try to do this for `add/sub` would be beneficial since they are more common and they can be macro-fuse with `jcc`.
>
>> I don't think tracking all the nodes that use flags is elegant, you can get the use from the test node, then iterate the operands of the use to find the cmpOp operand which you can know which flag it is using.
>
> Yeah, that sounds better. I think in that case we only need 2 peephole rules, one for `testI_reg` and one for `testL_reg`. In the peep procedure, the rule of the prior instruction can be mapped to the flags that it sets (e.g. a simple `switch` from rule -> mask of flags). Then, if the prior instruction sets all the flags needed by the user of the `test`, we can eliminate the `test` instruction.
>
> In the future, ADL could be extended to allow setting which flags are set or used by an instruction, and then we don't need the ad-hoc mapping of rules -> flags.
I've updated the PR to follow the approach suggested by @JornVernee . The AD files now contains the information regarding which flag from the EFLAGS register is actually set or cleared.
The only downside of this I currently see is that it uses a lot of space from the flags register, which now only has space for three more flags in the future. Not all the flags that are set in the AD file are currently used, but I thoght it might be cleaner to list all of them instead of only those needed for the test removal.
I've tested this locally and it seems to work as intended. We also now have less overhead in the peephole phase as there are only two peepholes remaing (one of testI_reg and one for testL_reg) instead of 6.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1647940525
More information about the hotspot-compiler-dev
mailing list