RFR: 8303238: Create generalizations for existing LShift ideal transforms

Claes Redestad redestad at openjdk.org
Mon Mar 6 11:12:12 UTC 2023


On Thu, 23 Feb 2023 20:28:31 GMT, Jasmine K. <duke at openjdk.org> wrote:

> Hello,
> I would like to generalize two ideal transforms for bitwise shifts. Left shift nodes perform the transformations `(x >> C1) << C2 => x & (-1 << C2)` and `((x >> C1) & Y) << C2 => x & (Y << C2)`, but only in the case where `C1 == C2`. However, both rules can also improve cases where the constants aren't equal, by removing one of the shifts and replacing it with a bitwise and. This transformation is profitable because typically more bitwise ands than bit shifts can be dispatched per cycle. In addition, the strength reduction from a shift to a bitwise and can enable further profitable transformations. These patterns are found throughout the JDK, mainly around strings and OW2 ASM. I've attached some benchmark results from my (Zen 2) machine below:
> 
>                                                  Baseline                           Patch              Improvement
> Benchmark                            Mode  Cnt    Score     Error  Units      Score    Error  Units
> LShiftNodeIdealize.testRgbaToAbgr    avgt   15    63.287 ±  1.770  ns/op  /  54.199 ±  1.408  ns/op     + 14.36%
> LShiftNodeIdealize.testShiftAndInt   avgt   15   874.564 ± 15.334  ns/op  / 538.408 ± 11.768  ns/op     + 38.44%
> LShiftNodeIdealize.testShiftAndLong  avgt   15  1017.466 ± 29.010  ns/op  / 701.356 ± 18.258  ns/op     + 31.07%
> LShiftNodeIdealize.testShiftInt      avgt   15   663.865 ± 14.226  ns/op  / 533.588 ±  9.949  ns/op     + 19.63%
> LShiftNodeIdealize.testShiftInt2     avgt   15   658.976 ± 32.856  ns/op  / 649.871 ± 10.598  ns/op     +  1.38%
> LShiftNodeIdealize.testShiftLong     avgt   15   815.540 ± 14.721  ns/op  / 689.270 ± 14.028  ns/op     + 15.48%
> LShiftNodeIdealize.testShiftLong2    avgt   15   817.936 ± 23.573  ns/op  / 810.185 ± 14.983  ns/op     +  0.95%
> 
> 
> In addition, while making this PR I found a missing ideal transform for `RShiftLNode`: right shifts by large constant amounts (such as `x >> 65`) are not folded down the way they are in `RShiftINode` and `URShiftLNode`. I'll address this in a future RFR.
> 
> Testing: GHA, tier1 local, and performance testing
> 
> Thanks,
> Jasmine K
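For reference, the generalization rests on two identities for a 32-bit `int` with constants in `[0, 32)`: `(x >> C1) << C2` equals `(x << (C2 - C1)) & (-1 << C2)` when `C2 >= C1`, and `(x >> (C1 - C2)) & (-1 << C2)` when `C1 > C2`; with an intermediate mask `Y`, the `(-1 << C2)` mask is replaced by `(Y << C2)`, which already has its low `C2` bits clear. The sketch below is a brute-force check of those identities on random inputs for every `(C1, C2)` pair. The class and method names are hypothetical, and the node shapes the patch actually produces may differ.

```java
import java.util.Random;

// Hypothetical harness: checks the generalized LShift identities on random ints.
final class ShiftIdentities {

    // Original shape: arithmetic right shift, then left shift.
    static int shiftShift(int x, int c1, int c2) {
        return (x >> c1) << c2;
    }

    // Rewritten shape: a single shift by |C2 - C1|, then clear the low C2 bits.
    static int shiftThenMask(int x, int c1, int c2) {
        int shifted = (c2 >= c1) ? x << (c2 - c1) : x >> (c1 - c2);
        return shifted & (-1 << c2);
    }

    // Original shape with an intermediate mask Y.
    static int shiftAndShift(int x, int y, int c1, int c2) {
        return ((x >> c1) & y) << c2;
    }

    // Rewritten shape: (Y << C2) has its low C2 bits clear,
    // so it subsumes the (-1 << C2) mask.
    static int shiftAndMask(int x, int y, int c1, int c2) {
        int shifted = (c2 >= c1) ? x << (c2 - c1) : x >> (c1 - c2);
        return shifted & (y << c2);
    }

    public static void main(String[] args) {
        Random random = new Random(0x5EEDL); // fixed seed for reproducibility
        for (int c1 = 0; c1 < 32; c1++) {
            for (int c2 = 0; c2 < 32; c2++) {
                for (int i = 0; i < 1_000; i++) {
                    int x = random.nextInt();
                    int y = random.nextInt();
                    if (shiftShift(x, c1, c2) != shiftThenMask(x, c1, c2)
                            || shiftAndShift(x, y, c1, c2) != shiftAndMask(x, y, c1, c2)) {
                        throw new AssertionError("mismatch at C1=" + c1 + ", C2=" + c2);
                    }
                }
            }
        }
        System.out.println("all identities hold");
    }
}
```

When `C1 == C2` the rewritten forms reduce to `x & (-1 << C2)` and `x & (Y << C2)`, i.e. the existing special cases.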
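The `x >> 65` case relies on Java's shift semantics: for a `long`, only the low six bits of the shift distance are used (JLS 15.19), so `x >> 65` computes exactly the same value as `x >> 1`. The sketch below shows that equivalence at the source level; the class name is hypothetical, and whether the future `RShiftLNode` change normalizes the count in exactly this way is an assumption.

```java
// Hypothetical illustration of long shift-distance masking (JLS 15.19).
final class LongShiftMasking {

    // What the source expression computes: the distance 65 is masked to 65 & 63 == 1.
    static long shiftBy65(long x) {
        return x >> 65;
    }

    // Normalized form: distance reduced modulo 64 before shifting.
    static long normalized(long x) {
        return x >> (65 & 63);
    }

    public static void main(String[] args) {
        long x = -1_000_000_007L;
        System.out.println(shiftBy65(x) == normalized(x));
        System.out.println(shiftBy65(x) == (x >> 1));
    }
}
```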

Very nice overall!

Some superficial comments inline.

src/hotspot/share/opto/mulnode.cpp line 850:

> 848:   }
> 849: 
> 850:   // Check for "(x >> C1) << C2" which just masks off low bits

The "which just masks off low bits" comments should move to the `C1 == C2` special case, since that is the only case where the pattern reduces to a pure mask. Same below and for `LShiftLNode`.

test/micro/org/openjdk/bench/vm/compiler/LShiftNodeIdealize.java line 100:

> 98:     public static class BenchState {
> 99:         int[] ints;
> 100:         Random random = new Random();

A hard-coded or parameterized seed is preferred for microbenchmarking to reduce noise from different data distributions in back-to-back runs.
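A minimal sketch of the suggestion, assuming the benchmark state only needs an `int[]` filled with random values: constructing `java.util.Random` from a fixed seed yields the same sequence on every run, so back-to-back runs see identical data. The class name, seed value and array size are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in for BenchState with deterministic data generation.
final class SeededBenchState {
    // Hypothetical seed constant; any fixed value gives reproducible data.
    static final long DEFAULT_SEED = 42L;

    final int[] ints;

    SeededBenchState(long seed, int size) {
        Random random = new Random(seed); // same seed -> same sequence of values
        ints = new int[size];
        for (int i = 0; i < size; i++) {
            ints[i] = random.nextInt();
        }
    }

    public static void main(String[] args) {
        SeededBenchState a = new SeededBenchState(DEFAULT_SEED, 1024);
        SeededBenchState b = new SeededBenchState(DEFAULT_SEED, 1024);
        System.out.println("reproducible: " + Arrays.equals(a.ints, b.ints));
    }
}
```

In the JMH benchmark itself, the parameterized variant could expose the seed as a `@Param` field and build the `Random` inside a `@Setup` method.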

-------------

Changes requested by redestad (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12734


More information about the hotspot-compiler-dev mailing list