RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs
Emanuel Peter
epeter at openjdk.org
Wed Dec 6 09:52:43 UTC 2023
On Wed, 15 Nov 2023 03:14:22 GMT, Fei Gao <fgao at openjdk.org> wrote:
>>> @fg1417 thanks for the help! I ran tier1-6 with `-XX:+AlignVector -XX:+IgnoreUnrecognizedVMOptions -XX:+VerifyAlignVector`. Though `VerifyAlignVector` only has an effect if there is a matching rule in the `ad` files. And I ran quite a few repetitions of `compiler/loopopts/superword/TestAlignVectorFuzzer.java`.
>>
>> Hi @eme64, we have [ `tst`](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/TST--immediate---Test-bits--immediate---an-alias-of-ANDS--immediate--?lang=en) on aarch64.
>>
>> instruct verify_vector_alignment(iRegP r, immL_positive_bitmaskI mask, rFlagsReg cr) %{
>> match(Set r (VerifyVectorAlignment r mask));
>> effect(KILL cr);
>> format %{ "verify_vector_alignment $r $mask \t! verify alignment" %}
>> ins_encode %{
>> Label Lskip;
>> // check if masked bits of r are zero
>> __ tst($r$$Register, $mask$$constant);
>> __ br(Assembler::EQ, Lskip);
>> __ stop("verify_vector_alignment found a misaligned vector memory access");
>> __ bind(Lskip);
>> %}
>> ins_pipe( pipe_slow );
>> %}
>>
>>
>> I tested tier1-tier3 with `-XX:+AlignVector -XX:+IgnoreUnrecognizedVMOptions -XX:+VerifyAlignVector` on aarch64(neon and 128-bit sve) platforms, and several repetitions of `compiler/loopopts/superword/TestAlignVectorFuzzer.java`. No new failures found. Hope it works for you. Thanks.
>
>> @fg1417 thanks for the snippet! Where do I add it? I don't know how the `ad` and `m4` work. Is there some script that converts them?
>
> Hi @eme64 , you can add it [here](https://github.com/openjdk/jdk/blob/d9a89c59daa40fdc8da620940d5c518a9f18bc7b/src/hotspot/cpu/aarch64/aarch64.ad#L16394). Generally, we add **vector** operations to `src/hotspot/cpu/aarch64/aarch64_vector_ad.m4`, and redirect the output of the command: `m4 src/hotspot/cpu/aarch64/aarch64_vector_ad.m4` to `src/hotspot/cpu/aarch64/aarch64_vector.ad`. And other instructions are often added to `src/hotspot/cpu/aarch64/aarch64.ad`. Thanks!
@fg1417 I rewrote the proof a bit, after some of your suggestions. I wanted to make the decomposition `pre_iter = pre_iter_C_const + pre_iter_C_invar + pre_iter_C_init` more explicit and bring it up earlier.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14785#issuecomment-1842538942
More information about the hotspot-compiler-dev
mailing list