RFR: 8346664: C2: Optimize mask check with constant offset [v8]

Thu Jan 30 11:57:51 UTC 2025

On Thu, 30 Jan 2025 11:41:35 GMT, Matthias Ernst <duke at openjdk.org> wrote:

>> Fixes [JDK-8346664](https://bugs.openjdk.org/browse/JDK-8346664): extends the optimization of masked sums introduced in #6697 to cover constant values, which currently break the optimization.
>> 
>> Such constant values arise in an expression of the following form, for example from `MemorySegmentImpl#isAlignedForElement`:
>> 
>> 
>> (base + (index + 1) << 8) & 255
>> => MulNode
>> (base + (index << 8 + 256)) & 255
>> => AddNode
>> ((base + index << 8) + 256) & 255
>> 
>> 
>> Currently, `256` is not being recognized as a shifted value. This PR enables further reduction:
>> 
>> 
>> ((base + index << 8) + 256) & 255
>> => MulNode (this PR)
>> (base + index << 8) & 255
>> => MulNode (PR #6697)
>> base & 255 (loop invariant)
>> 
>> 
>> Implementation notes:
>> * I verified that the originating issue "scaled varhandle indexed with i+1"  (https://mail.openjdk.org/pipermail/panama-dev/2024-December/020835.html) is resolved with this PR.
>> * ~in order to stay with the flow of the current implementation, I refrained from solving general (const & mask)==0 cases, but only those where const == _ << shift.~
>> * ~I modified existing test cases adding/subtracting from the index var (which would fail with current C2). Let me know if would like to see separate cases for these.~
>
> Matthias Ernst has updated the pull request incrementally with one additional commit since the last revision:
> 
>   undo assertion and mask shift amount ourselves.
>   we cannot rely on `maskShiftAmount` having run in all cases

src/hotspot/share/opto/mulnode.cpp line 2083:

> 2081:       return -1;
> 2082:     }
> 2083:     return rhs_t->get_con() & (type2aelembytes(bt) * BitsPerByte - 1);

@eme64 The assertion was incorrect: the shift amount is masked as a side-effect in Ideal() => maskShiftAmount(), which happened before the assertion in all the tests I had run. However, it appears that due to loop unrolling(?) we can get into situations where this is not the case. E.g. in TestAndShiftZeroCPP:

        int z = 3;
        int q;
        for (int i = 62; i < 70; ++i) {
            q = y << i;
            for (int j = 0; j < 8; j++) {
                z += i;
            }
            z = q & 0xff;
        }
        return z;

we observe shift nodes that have obviously been unfolded (the shift amount is now `is_con`) but the shift node has not undergone Ideal(). I've removed the assertion and replaced it back with explicit masking here. Passes all hotspot tier1 and gtests for me, except for aforementioned `TestEquivalentInvariants`.

I could have noticed this earlier and leaves me rather embarrassed about yesterday's explosion on my part.
Since I have no insight into the vectorization work, I'll defer to you on how to continue with this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1935486006