RFR: 8346664: C2: Optimize mask check with constant offset [v10]

Thu Jan 30 15:16:52 UTC 2025

On Thu, 30 Jan 2025 14:37:17 GMT, Matthias Ernst <duke at openjdk.org> wrote:

>> Fixes [JDK-8346664](https://bugs.openjdk.org/browse/JDK-8346664): extends the optimization of masked sums introduced in #6697 to cover constant values, which currently break the optimization.
>> 
>> Such constant values arise in an expression of the following form, for example from `MemorySegmentImpl#isAlignedForElement`:
>> 
>> 
>> (base + (index + 1) << 8) & 255
>> => MulNode
>> (base + (index << 8 + 256)) & 255
>> => AddNode
>> ((base + index << 8) + 256) & 255
>> 
>> 
>> Currently, `256` is not being recognized as a shifted value. This PR enables further reduction:
>> 
>> 
>> ((base + index << 8) + 256) & 255
>> => MulNode (this PR)
>> (base + index << 8) & 255
>> => MulNode (PR #6697)
>> base & 255 (loop invariant)
>> 
>> 
>> Implementation notes:
>> * I verified that the originating issue "scaled varhandle indexed with i+1"  (https://mail.openjdk.org/pipermail/panama-dev/2024-December/020835.html) is resolved with this PR.
>> * ~in order to stay with the flow of the current implementation, I refrained from solving general (const & mask)==0 cases, but only those where const == _ << shift.~
>> * ~I modified existing test cases adding/subtracting from the index var (which would fail with current C2). Let me know if would like to see separate cases for these.~
>
> Matthias Ernst has updated the pull request incrementally with one additional commit since the last revision:
> 
>   expect vectorization of i|7 instead of i&7
>   i&7 variant expect "= 0".

Just copying my rationale here, so that we don't lose it should the code change and GitHub lose track of the comment:

-----------------------
Ah, this indeed sounds scary. Hmm let me try to remember what I was thinking here... this was rather complicated.

If you are interested to read up more on this (no need), see here:
`src/hotspot/share/opto/mempointer.hpp`
Look for `overflow` and the definition of `SAFE` decompisition, and the `MemPointer Lemma`.

We are doing int accesses with int stride (`getAtIndex` for `int` layout). So if there was a different overflow, we would overflow by a factor of `2^32` indices, which would put us outside the range of an `int[]` size which is maximally about `2^31`.

Note that we feed the method only `int[]`.

If we were to feed in `long[]`, then we could possibly overflow the index, and end up still inside the array, as the jump would be `2^32 * 4` bytes, which is inside close to the maximal capacity of an `long[]`, i.e. `2^31 * 8` bytes.

Why don't you make an exact copy of the method, but instead feed it `long[]`. Then, I think it would not vectorize. This would be helpful to confirm my theory. That would make me feel ok to just remove the comment, or rather move the comment to the `long[]` case ;)

-------------

PR Review: https://git.openjdk.org/jdk/pull/22856#pullrequestreview-2584199855