RFR: 8303238: Create generalizations for existing LShift ideal transforms

Jasmine K. duke at openjdk.org
Mon Feb 27 15:27:00 UTC 2023


Hello,
I would like to generalize two ideal transforms for bitwise shifts. Left shift nodes currently perform the transformations `(x >> C1) << C2 => x & (-1 << C2)` and `((x >> C1) & Y) << C2 => x & (Y << C2)`, but only in the case where `C1 == C2`. However, both rules can be generalized to improve cases where the constants differ, by removing one of the shifts and replacing it with a bitwise AND, which leaves a single shift by `|C1 - C2|`. This transformation is profitable because CPUs can typically dispatch more bitwise ANDs per cycle than shifts. In addition, strength-reducing a shift to a bitwise AND can enable further profitable transformations. These patterns are found throughout the JDK, mainly in string-handling code and in OW2 ASM. I've attached some benchmark results from my (Zen 2) machine below:
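A small Java sketch can check the identities behind both rules. The generalized forms used here, `(x >> C1) << C2 => (x >> (C1 - C2)) & (-1 << C2)` when `C1 >= C2` and `(x << (C2 - C1)) & (-1 << C2)` otherwise (with `-1 << C2` replaced by `Y << C2` for the masked rule), are my reading of what "replacing one shift with an AND" means; they are not necessarily the exact node shapes the patch emits. The class and method names are hypothetical, and shift counts are assumed to be constants in `[0, 31]`.

```java
import java.util.Random;

// Hypothetical helper class: checks the shift/AND identities on 32-bit ints.
// Assumes constant shift counts in [0, 31]; Java masks larger counts anyway.
final class ShiftIdentities {

    // Existing rule, valid only when C1 == C2: (x >> c) << c  ==  x & (-1 << c)
    static int sameShift(int x, int c) {
        return x & (-1 << c);
    }

    // Generalized rule 1: (x >> c1) << c2 rewritten as one shift plus an AND.
    static int shiftShift(int x, int c1, int c2) {
        int mask = -1 << c2;                      // clears the low c2 bits
        return c1 >= c2
                ? (x >> (c1 - c2)) & mask         // net right shift by c1 - c2
                : (x << (c2 - c1)) & mask;        // net left shift by c2 - c1
    }

    // Generalized rule 2: ((x >> c1) & y) << c2; y << c2 already has its low c2 bits clear.
    static int shiftAndShift(int x, int y, int c1, int c2) {
        int mask = y << c2;
        return c1 >= c2
                ? (x >> (c1 - c2)) & mask
                : (x << (c2 - c1)) & mask;
    }

    // Compares each rewrite with the original expression for random inputs and all counts.
    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 10_000; i++) {
            int x = rnd.nextInt();
            int y = rnd.nextInt();
            for (int c1 = 0; c1 < 32; c1++) {
                if (((x >> c1) << c1) != sameShift(x, c1)) {
                    throw new AssertionError("sameShift fails for c=" + c1);
                }
                for (int c2 = 0; c2 < 32; c2++) {
                    if (((x >> c1) << c2) != shiftShift(x, c1, c2)) {
                        throw new AssertionError("rule 1 fails for c1=" + c1 + ", c2=" + c2);
                    }
                    if ((((x >> c1) & y) << c2) != shiftAndShift(x, y, c1, c2)) {
                        throw new AssertionError("rule 2 fails for c1=" + c1 + ", c2=" + c2);
                    }
                }
            }
        }
        System.out.println("all identities hold");
    }
}
```

The split on `c1 >= c2` mirrors the two directions the remaining shift can go; when `c1 == c2` the shift amount is zero and the result reduces to the existing `x & (-1 << C2)` rule.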

Benchmark                           Mode  Cnt  Baseline (ns/op)    Patch (ns/op)      Improvement
LShiftNodeIdealize.testRgbaToAbgr   avgt   15    63.287 ±  1.770    54.199 ±  1.408       +14.36%
LShiftNodeIdealize.testShiftAndInt  avgt   15   874.564 ± 15.334   538.408 ± 11.768       +38.44%
LShiftNodeIdealize.testShiftAndLong avgt   15  1017.466 ± 29.010   701.356 ± 18.258       +31.07%
LShiftNodeIdealize.testShiftInt     avgt   15   663.865 ± 14.226   533.588 ±  9.949       +19.63%
LShiftNodeIdealize.testShiftInt2    avgt   15   658.976 ± 32.856   649.871 ± 10.598        +1.38%
LShiftNodeIdealize.testShiftLong    avgt   15   815.540 ± 14.721   689.270 ± 14.028       +15.48%
LShiftNodeIdealize.testShiftLong2   avgt   15   817.936 ± 23.573   810.185 ± 14.983        +0.95%
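To show where the masked pattern with `C1 != C2` can come from, here is a hypothetical byte-swizzling routine in the spirit of the `testRgbaToAbgr` benchmark name; it is an illustration, not the benchmark source from the PR. The green and blue terms have the shape `((x >> C1) & Y) << C2` with `C1 != C2`, and the second method writes out by hand what the generalized rule would reduce them to.

```java
// Hypothetical example (not the PR's benchmark code): repack an RGBA int
// (red in the top byte) into ABGR (alpha in the top byte).
final class RgbaToAbgr {

    // Straightforward shift-and-mask version.
    static int rgbaToAbgr(int rgba) {
        int r = (rgba >> 24) & 0xFF;              // red: bits 24..31 -> 0..7
        int g = ((rgba >> 16) & 0xFF) << 8;       // C1 = 16, C2 = 8
        int b = ((rgba >> 8) & 0xFF) << 16;       // C1 = 8,  C2 = 16
        int a = (rgba & 0xFF) << 24;              // alpha: bits 0..7 -> 24..31
        return a | b | g | r;
    }

    // Same computation with the green and blue terms rewritten by the
    // generalized rule: one shift by |C1 - C2| plus an AND with (Y << C2).
    static int rgbaToAbgrRewritten(int rgba) {
        int r = (rgba >> 24) & 0xFF;
        int g = (rgba >> 8) & 0xFF00;             // (x >> (16 - 8)) & (0xFF << 8)
        int b = (rgba << 8) & 0xFF0000;           // (x << (16 - 8)) & (0xFF << 16)
        int a = (rgba & 0xFF) << 24;
        return a | b | g | r;
    }

    public static void main(String[] args) {
        int rgba = 0x11223344;
        // Both versions reverse the byte order, matching Integer.reverseBytes.
        System.out.println(Integer.toHexString(rgbaToAbgr(rgba)));           // 44332211
        System.out.println(Integer.toHexString(rgbaToAbgrRewritten(rgba)));  // 44332211
        System.out.println(rgbaToAbgr(rgba) == Integer.reverseBytes(rgba));  // true
    }
}
```

In the rewritten version each of the two middle terms needs one shift instead of two, which is the kind of saving the generalized transform is meant to find automatically.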


In addition, while making this PR I found a missing ideal transform for `RShiftLNode`: right shifts by out-of-range constant counts (such as `x >> 65`, which Java semantics reduce to `x >> 1`) are not properly folded down, as they are in `RShiftINode` and `URShiftLNode`. I'll address this in a future RFR.
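For reference, the Java Language Specification (§15.19) uses only the low 5 bits of an `int` shift count and the low 6 bits of a `long` shift count, so a constant count of 65 on a `long` behaves like 1. A short sketch of that masking follows; `maskedCount` is a hypothetical helper standing in for the count canonicalization the missing `RShiftLNode` transform would perform, not a HotSpot function.

```java
// Hypothetical helper class illustrating Java's shift-count masking (JLS 15.19).
final class ShiftCountMasking {

    // Count a compiler could canonicalize a constant shift amount to:
    // low 5 bits for int shifts, low 6 bits for long shifts.
    static int maskedCount(int count, boolean isLong) {
        return count & (isLong ? 63 : 31);
    }

    public static void main(String[] args) {
        long x = -1234567890123L;
        int y = -123456789;
        System.out.println(maskedCount(65, true));    // 1
        System.out.println((x >> 65) == (x >> 1));    // true: long counts use 6 bits
        System.out.println((y >> 33) == (y >> 1));    // true: int counts use 5 bits
    }
}
```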

I will submit a bug on the web bug portal and will update the PR title once that goes through.

Testing: GitHub Actions (GHA), tier1 locally, and performance testing

Thanks,
Jasmine K

-------------

Commit messages:
 - Create generalizations for existing LShift ideal transforms

Changes: https://git.openjdk.org/jdk/pull/12734/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12734&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8303238
  Stats: 423 lines in 4 files changed: 395 ins; 0 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/12734.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12734/head:pull/12734

PR: https://git.openjdk.org/jdk/pull/12734


More information about the hotspot-compiler-dev mailing list