RFR: 8281453: New optimization: convert "c-(~x)" into "x+(c+1)" and "~(c-x)" into "x+(-c-1)" [v7]
Vladimir Kozlov
kvn at openjdk.java.net
Wed Apr 13 18:00:20 UTC 2022
On Wed, 13 Apr 2022 16:56:40 GMT, Zhiqiang Zang <duke at openjdk.java.net> wrote:
>> Similar to `(~x)+c` -> `(c-1)-x` and `~(x+c)` -> `(-c-1)-x` in #6858, we can also introduce similar optimizations for subtraction, `c-(~x)` -> `x+(c+1)` and `~(c-x)` -> `x+(-c-1)`.
>>
>> The results of the microbenchmark are as follows:
>>
>> Baseline:
>> Benchmark                         Mode  Cnt  Score   Error  Units
>> SubIdealCMinusNotX.baselineInt    avgt   60  0.504 ± 0.011  ns/op
>> SubIdealCMinusNotX.baselineLong   avgt   60  0.484 ± 0.004  ns/op
>> SubIdealCMinusNotX.testInt1       avgt   60  0.779 ± 0.004  ns/op
>> SubIdealCMinusNotX.testInt2       avgt   60  0.896 ± 0.004  ns/op
>> SubIdealCMinusNotX.testLong1      avgt   60  0.722 ± 0.004  ns/op
>> SubIdealCMinusNotX.testLong2      avgt   60  0.720 ± 0.005  ns/op
>>
>> Patch:
>> Benchmark                         Mode  Cnt  Score   Error  Units
>> SubIdealCMinusNotX.baselineInt    avgt   60  0.487 ± 0.009  ns/op
>> SubIdealCMinusNotX.baselineLong   avgt   60  0.486 ± 0.009  ns/op
>> SubIdealCMinusNotX.testInt1       avgt   60  0.372 ± 0.010  ns/op
>> SubIdealCMinusNotX.testInt2       avgt   60  0.365 ± 0.003  ns/op
>> SubIdealCMinusNotX.testLong1      avgt   60  0.369 ± 0.004  ns/op
>> SubIdealCMinusNotX.testLong2      avgt   60  0.399 ± 0.016  ns/op
>
> Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>
> - merge master.
> - clean.
> - merge tests into XXXINodeIdealizationTests
> - clean.
> - Merge branch 'master'.
> - convert ~x into -1-x when ~x is part of Add and Sub.
> - include bug id.
- include a microbenchmark.
> - Convert c-(~x) into x+(c+1) in SubNode and convert ~(c-x) into x+(-c-1) in XorNode.
The optimization you proposed does not match the RFE description and title.
You only do: `~x` or `(x ^ (-1))` -> `(-1 - x)`
As a result, this should be an ideal transformation on Xor nodes. I don't even think you need such a transformation if `rhs` and `lhs` are not constants, because I assume the `XOR` and `SUB` hardware instructions have the same latency.
I suggest you redo the performance testing after merging the #7795 changes.
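For reference, the algebra behind the rewrites discussed above can be checked with a small Java sketch. It relies only on the two's-complement identity `~x == -1 - x`, which holds with wraparound for both `int` and `long`. The class and method names below are illustrative assumptions and are not taken from the patch or from the HotSpot sources; the sketch says nothing about how C2 implements the transformation.

```java
// Illustrative sketch only: names are hypothetical, not part of the patch.
final class NotIdentityCheck {
    // Original shape: c - (~x)
    static int cMinusNotX(int c, int x) { return c - (~x); }

    // Rewritten shape: x + (c + 1)
    static int cMinusNotXRewritten(int c, int x) { return x + (c + 1); }

    // Original shape: ~(c - x)
    static int notCMinusX(int c, int x) { return ~(c - x); }

    // Rewritten shape: x + (-c - 1)
    static int notCMinusXRewritten(int c, int x) { return x + (-c - 1); }

    // Same two identities for long operands.
    static long cMinusNotX(long c, long x) { return c - (~x); }
    static long cMinusNotXRewritten(long c, long x) { return x + (c + 1); }
    static long notCMinusX(long c, long x) { return ~(c - x); }
    static long notCMinusXRewritten(long c, long x) { return x + (-c - 1); }

    // True when both rewrites agree with the original shapes for int inputs.
    static boolean holdsForInt(int c, int x) {
        return cMinusNotX(c, x) == cMinusNotXRewritten(c, x)
            && notCMinusX(c, x) == notCMinusXRewritten(c, x);
    }

    // True when both rewrites agree with the original shapes for long inputs.
    static boolean holdsForLong(long c, long x) {
        return cMinusNotX(c, x) == cMinusNotXRewritten(c, x)
            && notCMinusX(c, x) == notCMinusXRewritten(c, x);
    }

    public static void main(String[] args) {
        // Include the extremes so that overflow wraparound is exercised.
        int[] ints = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        long[] longs = {0L, 1L, -1L, 42L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (int c : ints) {
            for (int x : ints) {
                if (!holdsForInt(c, x)) {
                    throw new AssertionError("int mismatch: c=" + c + ", x=" + x);
                }
            }
        }
        for (long c : longs) {
            for (long x : longs) {
                if (!holdsForLong(c, x)) {
                    throw new AssertionError("long mismatch: c=" + c + ", x=" + x);
                }
            }
        }
        System.out.println("identities hold for all sampled values");
    }
}
```

Because Java integer arithmetic wraps on overflow, the identities hold even at `MIN_VALUE` and `MAX_VALUE`, which is why the sample set includes them.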
-------------
PR: https://git.openjdk.java.net/jdk/pull/7376