RFR: 8281518: New optimization: convert "(x|y)-(x^y)" into "x&y" [v2]

Wed Feb 9 16:24:11 UTC 2022

On Wed, 9 Feb 2022 16:04:48 GMT, Zhiqiang Zang <duke at openjdk.java.net> wrote:

>> Convert `(x|y)-(x^y)` into `x&y`, in `SubINode::Ideal` and `SubLNode::Ideal`.
>> 
>> The results of the microbenchmark are as follows:
>> 
>> Baseline:
>> Benchmark                                Mode  Cnt  Score   Error  Units
>> SubIdeal_XOrY_Minus_XXorY_.baselineInt   avgt   60  0.481 ± 0.003  ns/op
>> SubIdeal_XOrY_Minus_XXorY_.baselineLong  avgt   60  0.482 ± 0.004  ns/op
>> SubIdeal_XOrY_Minus_XXorY_.testInt       avgt   60  0.901 ± 0.007  ns/op
>> SubIdeal_XOrY_Minus_XXorY_.testLong      avgt   60  0.894 ± 0.004  ns/op
>> 
>> Patch:
>> Benchmark                                Mode  Cnt  Score   Error  Units
>> SubIdeal_XOrY_Minus_XXorY_.baselineInt   avgt   60  0.480 ± 0.003  ns/op
>> SubIdeal_XOrY_Minus_XXorY_.baselineLong  avgt   60  0.483 ± 0.005  ns/op
>> SubIdeal_XOrY_Minus_XXorY_.testInt       avgt   60  0.600 ± 0.004  ns/op
>> SubIdeal_XOrY_Minus_XXorY_.testLong      avgt   60  0.602 ± 0.004  ns/op
>
> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   include bug id.
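The identity behind the rewrite holds because `x|y` equals `(x&y) + (x^y)`. The set bits of `x&y` and `x^y` never overlap, so adding them produces no carries, and subtracting `x^y` from `x|y` leaves exactly `x&y`. That makes the rewrite exact for every `int` and `long` value under Java's two's-complement arithmetic. Below is a minimal sketch, not part of the patch, that spot-checks the identity on edge cases and random inputs. The class and method names are hypothetical.

```java
import java.util.SplittableRandom;

// Hypothetical checker (not part of the patch): tests the identity
// (x | y) - (x ^ y) == x & y that the proposed rewrite relies on.
final class OrMinusXorIdentity {

    // Expression shape matched by the transformation.
    static int beforeInt(int x, int y) { return (x | y) - (x ^ y); }
    static long beforeLong(long x, long y) { return (x | y) - (x ^ y); }

    // Replacement produced by the transformation.
    static int afterInt(int x, int y) { return x & y; }
    static long afterLong(long x, long y) { return x & y; }

    // Returns the number of inputs on which the two forms disagree;
    // 0 means the identity held for every edge case and random sample.
    static int countMismatches(long seed, int samples) {
        int mismatches = 0;
        int[] edgeInts = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        long[] edgeLongs = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (int x : edgeInts) {
            for (int y : edgeInts) {
                if (beforeInt(x, y) != afterInt(x, y)) mismatches++;
            }
        }
        for (long x : edgeLongs) {
            for (long y : edgeLongs) {
                if (beforeLong(x, y) != afterLong(x, y)) mismatches++;
            }
        }
        SplittableRandom rnd = new SplittableRandom(seed);
        for (int i = 0; i < samples; i++) {
            int xi = rnd.nextInt(), yi = rnd.nextInt();
            if (beforeInt(xi, yi) != afterInt(xi, yi)) mismatches++;
            long xl = rnd.nextLong(), yl = rnd.nextLong();
            if (beforeLong(xl, yl) != afterLong(xl, yl)) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Expected to print "mismatches: 0".
        System.out.println("mismatches: " + countMismatches(42L, 1_000_000));
    }
}
```

Random sampling cannot prove the identity on its own. The no-carry argument above does that, and the check only guards against a slip in the reasoning.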

I am not clear whether there is a justification for pushing this change. We are in danger of heading down the garden path looking for optimization fairies.

The above transformation adds extra case-handling overhead whenever the compiler processes a Subtract node (in `SubINode::Ideal` and `SubLNode::Ideal`), which slows down compilation to a small degree for a relatively common case (most apps use subtraction). On the credit side, it may yield a small speedup in the generated code when the pattern is matched, with the saving depending not just on seeing this pattern but also on how often the resulting generated code gets executed. So, we have a trade-off.
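To make the cost side concrete, here is a toy model of a rewrite rule hooked into subtract-node processing. It is plain Java (17 or later), not HotSpot's C++ node classes, and every type and method name in it is hypothetical. The point it illustrates is that every `Sub` node pays for the shape test, including the check for commuted operands, whether or not the pattern is present. Only a matching node is replaced with the cheaper `And`.

```java
// Toy model only: a plain-Java stand-in for an ideal graph, not HotSpot code.
// The Sub-node hook below pays for the shape test on every call.
final class ToySubIdeal {
    // Number of Sub nodes inspected so far; illustrates the per-node check cost.
    static int subNodesChecked = 0;

    // Tries (x|y)-(x^y) => x&y on one Sub node, accepting either operand
    // order in the Xor; returns the node unchanged if the shape does not match.
    // Operand identity is modelled with record equality; a real compiler
    // would typically compare value-numbered node identities instead.
    static Node idealSub(Sub s) {
        subNodesChecked++;
        if (s.a() instanceof Or or && s.b() instanceof Xor xor) {
            boolean same = or.a().equals(xor.a()) && or.b().equals(xor.b());
            boolean swapped = or.a().equals(xor.b()) && or.b().equals(xor.a());
            if (same || swapped) {
                return new And(or.a(), or.b());
            }
        }
        return s;
    }

    public static void main(String[] args) {
        Node x = new Var("x");
        Node y = new Var("y");
        System.out.println(idealSub(new Sub(new Or(x, y), new Xor(y, x))));
        System.out.println(idealSub(new Sub(x, y)));
        System.out.println("Sub nodes checked: " + subNodesChecked);
    }
}

// Minimal node hierarchy for the toy model.
sealed interface Node permits Var, Or, Xor, And, Sub {}
record Var(String name) implements Node {}
record Or(Node a, Node b) implements Node {}
record Xor(Node a, Node b) implements Node {}
record And(Node a, Node b) implements Node {}
record Sub(Node a, Node b) implements Node {}
```

Running `main` should print the rewritten `And` node for the first call, the unchanged `Sub` for the second, and a count of 2 checked nodes. The second call is the common case: the check runs and nothing is gained.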

For any app, the compiler is likely to process subtract nodes many times. There are probably going to be very few cases where this pattern turns up (even if you include cases where it arises through recursive reduction), and even fewer where the resulting generated code gets executed many times. At some point we need to weigh the compiler overhead paid by all applications against the potential gains for some applications. The microbenchmark only addresses one side of that trade-off.
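One back-of-the-envelope way to frame this trade-off is as a break-even count: the compile-time overhead is paid once per subtract node processed, while the saving is repaid once per execution of matched code. The sketch below computes how many executions are needed to break even. The per-execution saving comes from the benchmark above (`testInt` drops from 0.901 to 0.600 ns/op, about 0.3 ns; `testLong` shows a similar 0.29 ns). The per-node check cost and the node count are hypothetical placeholders, not measurements, and the class name is likewise hypothetical.

```java
// Hypothetical back-of-the-envelope model of the compile-time vs run-time
// trade-off. Only the per-execution saving comes from the microbenchmark;
// the node count and per-check cost are placeholders, not measured C2 data.
final class BreakEven {

    // Total extra compile time: every Sub node pays for the pattern check.
    static double compileOverheadNs(long subNodesCompiled, double checkCostNsPerNode) {
        return subNodesCompiled * checkCostNsPerNode;
    }

    // Executions of matched code needed before the per-execution saving
    // repays the compile-time overhead.
    static double breakEvenExecutions(double overheadNs, double savingNsPerExec) {
        if (savingNsPerExec <= 0) {
            throw new IllegalArgumentException("saving must be positive");
        }
        return overheadNs / savingNsPerExec;
    }

    public static void main(String[] args) {
        double saving = 0.901 - 0.600; // ns/op: testInt score before vs after the patch
        long subNodes = 100_000;       // hypothetical: Sub nodes processed by the compiler
        double checkCost = 5.0;        // hypothetical: ns spent per pattern check
        double overhead = compileOverheadNs(subNodes, checkCost);
        System.out.printf("overhead = %.0f ns, break-even = %.0f executions%n",
                overhead, breakEvenExecutions(overhead, saving));
    }
}
```

Plugging in real numbers would require measuring the per-node check cost in the compiler and counting how often the pattern matches, and how hot the matched code is, in representative applications. That is the other side of the trade-off the microbenchmark does not cover.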

I'd really like to see a better justification for including this patch and the related transformations suggested by @merykitty before proceeding.

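N.B. The fact that gcc and clang do this is not really a good argument. In Java, compilation happens at run time, so the trade-off is one runtime cost (JIT compilation time) against another (execution time of the generated code). That is not the case for those ahead-of-time compilers, where extra compile time costs the application nothing at run time.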

-------------

PR: https://git.openjdk.java.net/jdk/pull/7395