RFR: 8346664: C2: Optimize mask check with constant offset [v8]
Emanuel Peter
epeter at openjdk.org
Thu Jan 30 12:35:48 UTC 2025
On Thu, 30 Jan 2025 11:41:35 GMT, Matthias Ernst <duke at openjdk.org> wrote:
>> Fixes [JDK-8346664](https://bugs.openjdk.org/browse/JDK-8346664): extends the optimization of masked sums introduced in #6697 to cover constant values, which currently break the optimization.
>>
>> Such constant values arise in an expression of the following form, for example from `MemorySegmentImpl#isAlignedForElement`:
>>
>>
>> (base + (index + 1) << 8) & 255
>> => MulNode
>> (base + (index << 8 + 256)) & 255
>> => AddNode
>> ((base + index << 8) + 256) & 255
>>
>>
>> Currently, `256` is not being recognized as a shifted value. This PR enables further reduction:
>>
>>
>> ((base + index << 8) + 256) & 255
>> => MulNode (this PR)
>> (base + index << 8) & 255
>> => MulNode (PR #6697)
>> base & 255 (loop invariant)
>>
>>
>> Implementation notes:
>> * I verified that the originating issue "scaled varhandle indexed with i+1" (https://mail.openjdk.org/pipermail/panama-dev/2024-December/020835.html) is resolved with this PR.
>> * ~in order to stay with the flow of the current implementation, I refrained from solving general (const & mask)==0 cases, but only those where const == _ << shift.~
>> * ~I modified existing test cases adding/subtracting from the index var (which would fail with current C2). Let me know if would like to see separate cases for these.~
>
> Matthias Ernst has updated the pull request incrementally with one additional commit since the last revision:
>
> undo assertion and mask shift amount ourselves.
> we cannot rely on `maskShiftAmount` having run in all cases
About the vectorization tests.
--------------
`TestEquivalentInvariants`:
Here we can adjust the `@IR` rules. You can see other IR rules in the file for examples.
`testMemorySegmentIInvarL3d` fails because it expects no vector nodes in the IR, hence the `= 0`.
There are other tests that have rules like these, i.e. with `> 0`:
@Test
@IR(counts = {IRNode.LOAD_VECTOR_I, "> 0",
IRNode.ADD_VI, "> 0",
IRNode.STORE_VECTOR, "> 0"},
applyIfPlatform = {"64-bit", "true"},
applyIf = {"AlignVector", "false"},
applyIfCPUFeatureOr = {"sse4.1", "true", "asimd", "true"})
Why don't you try tweaking the IR rules until they pass on your machine.
------------------
About `compiler/vectorization/TestPopulateIndex.java`.
I think we need to consider a generalization of `PopulateIndex`. Currently, it only allows the pattern:
`i+0, i+1, i+2, i+3, ...`.
This allows us to do this in the backend, see e.g. `src/hotspot/cpu/x86/x86.ad`:
instruct VectorPopulateIndex(vec dst, rRegI src1, immI_1 src2, vec vtmp) %{
match(Set dst (PopulateIndex src1 src2));
effect(TEMP dst, TEMP vtmp);
format %{ "vector_populate_index $dst $src1 $src2\t! using $vtmp as TEMP" %}
ins_encode %{
assert($src2$$constant == 1, "required");
int vlen_in_bytes = Matcher::vector_length_in_bytes(this);
int vlen_enc = vector_length_encoding(this);
BasicType elem_bt = Matcher::vector_element_basic_type(this);
__ vpbroadcast(elem_bt, $vtmp$$XMMRegister, $src1$$Register, vlen_enc);
__ load_iota_indices($dst$$XMMRegister, vlen_in_bytes, elem_bt);
__ vpadd(elem_bt, $dst$$XMMRegister, $dst$$XMMRegister, $vtmp$$XMMRegister, vlen_enc);
%}
ins_pipe( pipe_slow );
%}
We take the `i`, broadcast it to every vector lane, and then add the `iota_indices` to it: `i,i,i,i` + `0,1,2,3`.
We could just introduce some constants vector, and add this instead of the `iota_indices`.
That would allow us to generalize it to any `i+c0, i+c1, i+c2, ...`.
I've wanted to have vector constants for other reasons anyway.
TLDR:
- I'm ok with disabling the IR rules in `TestPopulateIndex.java` for `exprWithIndex1`.
- But create another IR test with `i | 7` instead of `i & 7`, which should still vectorize and pass with IR rules, as it does not trigger your optimization.
- We will file an RFE for the generalization of `PopulateIndex` using vector constants instead of `iota_indices`. In that RFE we can mention this failed test as one example of what should work with the generalization.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22856#issuecomment-2624385212
More information about the hotspot-compiler-dev
mailing list