RFR: 8346664: C2: Optimize mask check with constant offset
Matthias Ernst
duke at openjdk.org
Fri Jan 17 14:53:02 UTC 2025
On Wed, 8 Jan 2025 07:41:39 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Fixes [JDK-8346664](https://bugs.openjdk.org/browse/JDK-8346664): extends the optimization of masked sums introduced in #6697 to cover constant values, which currently break the optimization.
>>
>> Such constant values arise in an expression of the following form, for example from `MemorySegmentImpl#isAlignedForElement`:
>>
>>
>> (base + (index + 1) << 8) & 255
>> => MulNode
>> (base + (index << 8 + 256)) & 255
>> => AddNode
>> ((base + index << 8) + 256) & 255
>>
>>
>> Currently, `256` is not being recognized as a shifted value. This PR enables further reduction:
>>
>>
>> ((base + index << 8) + 256) & 255
>> => MulNode (this PR)
>> (base + index << 8) & 255
>> => MulNode (PR #6697)
>> base & 255 (loop invariant)
>>
>>
>> Implementation notes:
>> * I verified that the originating issue "scaled varhandle indexed with i+1" (https://mail.openjdk.org/pipermail/panama-dev/2024-December/020835.html) is resolved with this PR.
>> * ~in order to stay with the flow of the current implementation, I refrained from solving general (const & mask)==0 cases, but only those where const == _ << shift.~
>> * ~I modified existing test cases adding/subtracting from the index var (which would fail with current C2). Let me know if would like to see separate cases for these.~
>
> src/hotspot/share/opto/mulnode.cpp line 674:
>
>> 672:
>> 673: // Is expr a neutral element wrt addition under mask?
>> 674: static bool AndIL_is_zero_element(const PhaseGVN* phase, const Node* expr, const Node* mask, BasicType bt);
>
> I would prefer a more expressive function name over a comment here. The comment is a little confusing to me too.
>
> The old name at least talked about shift and mask - is that not relevant any more?
>
> Or you just decide to name it `AndIL_is_zero`, and drop out the comment. Because who knows someone might add other things that check for zero in that method, and then your comment would be out-dated (but probably people would forget to adjust it).
Renamed to is_zero_element_under_mask. "zero element" for me drives down that it's neither
* checking for expr == 0
* nor checking for expr & mask == 0
but really (X + expr) & mask == X & mask for all X.
There is no requirement for shift node, e.g, we recognize is_zero_under_mask(192, 7). However, the constants that are recognized here are "shifts in spirit" (e.g. expanded from `(i + 24) << 3`). If you can think of a good term for this that doesn't suggest there's an actual "shift node" we could try and incorporate that.
Since this is a forward declaration, the elaborate comment is below. Went and dropped the short one here.
> src/hotspot/share/opto/mulnode.cpp line 677:
>
>> 675:
>> 676: const Type* AndINode::Value(PhaseGVN* phase) const {
>> 677: // patterns similar to (v << 2) & 3
>
> Should this comment be updated now for a more general pattern?
I think it's better to drop this and refer the reader to the definition.
> src/hotspot/share/opto/mulnode.cpp line 2063:
>
>> 2061:
>> 2062: // Returns a lower bound on the number of trailing zeros in expr.
>> 2063: static jint AndIL_min_trailing_zeros(const PhaseGVN* phase, const Node* expr, BasicType bt) {
>
> Is this method restricted to use in AndIL? Because it looks like it is doing something more generic: trying to figure out a lower bound on the trailing zeros of an expression.
>
> If that is the case: Why not put it in `Node::get_trailing_zeros_lower_bound(phase, bt)`, so it can be used elsewhere too?
I would argue that while this might be incidentally reusable outside of the scope of "And" nodes, as long as there is no actual demand to reuse this, I would rather not add it to the rather prominent Node class to avoid api bloat.
Iff the notion of "is known to be a multiple of a certain power of two" is really of general interest, I would expect it to become part of `TypeInteger`.
> src/hotspot/share/opto/mulnode.cpp line 2113:
>
>> 2111:
>> 2112: jint zeros = AndIL_min_trailing_zeros(phase, expr, bt);
>> 2113: return zeros > 0 && ((((jlong)1) << zeros) > mask_t->hi_as_long() && mask_t->lo_as_long() >= 0);
>
> This line indicates that the `mask` could be a variable. You should make sure to add some tests for that in your patterns. You can create a variable in a specific range like this `Math.min(5, Math.max(1, x))`, should get you `x` clamped into the region `1..5`.
Added a variant for adding consts using the same pattern as the other "NonConstMask" tests.
> test/hotspot/jtreg/compiler/c2/irTests/TestShiftAndMask.java line 221:
>
>> 219: }
>> 220: }
>> 221:
>
> Nice to see that you have some examples here!
>
> I think it would be great to have some more though. The divil hides in the details. In the edge cases usually.
>
> You currently have patterns like this:
> `(j + ((i + c1) << c2)) & c3;`
> What if you generate the constants `c1, c2, c3` randomly:
> `public static final int C1 = random.nextInt()` (or some other random distribution that makes more sense).
> Then the compiler will see them as constants (because final), and attempt constant folding.
>
> You can then do result verification: You create a method copy that you restrict to the interpreter, and the other copy can be compiled. Then you test the method with all sorts of random inputs for `i, j`, and verify the results of the two methods (compiled vs interpreted).
>
> Maybe you can add some more patterns as well, just to have a better test coverage.
>
> Does that make sense?
I believe I understand the intent, and I've now randomized all constant masks / shifts / consts in this file. But just to make sure: IIUC the tests are only compiled once per invocation, there is no way I can tell the framework to "C2 compile this x times with different random constants". I.e. I can `make test` this a hundred times locally, but I cannot create large coverage via the framework, right?
Also not quite sure I understand the verification proposal. How would that be different from the current comparisons `if (result != expected simplified form)` ? Now if the framework supported an automatic comparison of compiled vs interpreted invocation, that would be nice.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1907156759
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1907171418
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1907196108
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1907148203
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1907213304
More information about the hotspot-compiler-dev
mailing list