RFR: 8346664: C2: Optimize mask check with constant offset

Fri Jan 17 14:53:02 UTC 2025

On Sat, 21 Dec 2024 14:08:02 GMT, Matthias Ernst <duke at openjdk.org> wrote:

> Fixes [JDK-8346664](https://bugs.openjdk.org/browse/JDK-8346664): extends the optimization of masked sums introduced in #6697 to cover constant values, which currently break the optimization.
> 
> Such constant values arise in an expression of the following form, for example from `MemorySegmentImpl#isAlignedForElement`:
> 
> 
> (base + (index + 1) << 8) & 255
> => MulNode
> (base + (index << 8 + 256)) & 255
> => AddNode
> ((base + index << 8) + 256) & 255
> 
> 
> Currently, `256` is not being recognized as a shifted value. This PR enables further reduction:
> 
> 
> ((base + index << 8) + 256) & 255
> => MulNode (this PR)
> (base + index << 8) & 255
> => MulNode (PR #6697)
> base & 255 (loop invariant)
> 
> 
> Implementation notes:
> * I verified that the originating issue "scaled varhandle indexed with i+1"  (https://mail.openjdk.org/pipermail/panama-dev/2024-December/020835.html) is resolved with this PR.
> * ~in order to stay with the flow of the current implementation, I refrained from solving general (const & mask)==0 cases, but only those where const == _ << shift.~
> * ~I modified existing test cases adding/subtracting from the index var (which would fail with current C2). Let me know if would like to see separate cases for these.~

src/hotspot/share/opto/mulnode.cpp line 674:

> 672: 
> 673: // Is expr a neutral element wrt addition under mask?
> 674: static bool AndIL_is_zero_element(const PhaseGVN* phase, const Node* expr, const Node* mask, BasicType bt);

I would prefer a more expressive function name over a comment here. The comment is a little confusing to me too.

The old name at least talked about shift and mask - is that not relevant any more?

Or you just decide to name it `AndIL_is_zero`, and drop out the comment. Because who knows someone might add other things that check for zero in that method, and then your comment would be out-dated (but probably people would forget to adjust it).

src/hotspot/share/opto/mulnode.cpp line 677:

> 675: 
> 676: const Type* AndINode::Value(PhaseGVN* phase) const {
> 677:   // patterns similar to (v << 2) & 3

Should this comment be updated now for a more general pattern?

src/hotspot/share/opto/mulnode.cpp line 2063:

> 2061: 
> 2062: // Returns a lower bound on the number of trailing zeros in expr.
> 2063: static jint AndIL_min_trailing_zeros(const PhaseGVN* phase, const Node* expr, BasicType bt) {

Is this method restricted to use in AndIL? Because it looks like it is doing something more generic: trying to figure out a lower bound on the trailing zeros of an expression.

If that is the case: Why not put it in `Node::get_trailing_zeros_lower_bound(phase, bt)`, so it can be used elsewhere too?

src/hotspot/share/opto/mulnode.cpp line 2113:

> 2111: 
> 2112:   jint zeros = AndIL_min_trailing_zeros(phase, expr, bt);
> 2113:   return zeros > 0 && ((((jlong)1) << zeros) > mask_t->hi_as_long() && mask_t->lo_as_long() >= 0);

This line indicates that the `mask` could be a variable. You should make sure to add some tests for that in your patterns. You can create a variable in a specific range like this `Math.min(5, Math.max(1, x))`, should get you `x` clamped into the region `1..5`.

test/hotspot/jtreg/compiler/c2/irTests/TestShiftAndMask.java line 221:

> 219:         }
> 220:     }
> 221: 

Nice to see that you have some examples here!

I think it would be great to have some more though. The divil hides in the details. In the edge cases usually.

You currently have patterns like this:
`(j + ((i + c1) << c2)) & c3;`
What if you generate the constants `c1, c2, c3` randomly:
`public static final int C1 = random.nextInt()` (or some other random distribution that makes more sense).
Then the compiler will see them as constants (because final), and attempt constant folding.

You can then do result verification: You create a method copy that you restrict to the interpreter, and the other copy can be compiled. Then you test the method with all sorts of random inputs for `i, j`, and verify the results of the two methods (compiled vs interpreted).

Maybe you can add some more patterns as well, just to have a better test coverage.

Does that make sense?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1906619159
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1906620853
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1906634027
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1906615575
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1906599399