RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58]
Vladimir Kozlov
kvn at openjdk.org
Thu Jan 4 16:47:35 UTC 2024
On Thu, 4 Jan 2024 08:25:25 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> I don't understand this comment.
>> The `LoadVector` and `StoreVector` have both a `MemNode::Address` input, which I think it the memory address.
>> The address itself usually consists of `AddP` nodes, which do the (base, index, offset) computation. These nodes later can be folded into the load/store itself, or be computed with a `lea`.
>> I simply take the address value, check it for alignment and pass it on to the load/store.
>>
>> Take this example:
>>
>> public class Test {
>> static int RANGE = 1024*64;
>>
>> public static void main(String[] strArr) {
>> int a[] = new int[RANGE];
>> test0(a);
>> }
>>
>> static void test0(int[] a) {
>> for (int i = 0; i < RANGE; i++) {
>> a[i]++;
>> }
>> }
>> }
>>
>>
>> `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java`
>>
>> This looks like the main loop:
>>
>> ;; B22: # out( B22 B23 ) <- in( B21 B22 ) Loop( B22-B22 inner post of N743) Freq: 4.49988
>> 0x00007f83c8bb2f68: lea 0x10(%rbp,%rbx,4),%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0}
>> ; - Test::test0 at 12 (line 11)
>> 0x00007f83c8bb2f6d: mov %r10,%r8
>> 0x00007f83c8bb2f70: test $0x7,%r8b
>> 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a
>> 0x00007f83c8bb2f76: movabs $0x7f83d77e2fc8,%rdi ; {external_word}
>> 0x00007f83c8bb2f80: and $0xfffffffffffffff0,%rsp
>> 0x00007f83c8bb2f84: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)}
>> 0x00007f83c8bb2f89: hlt
>> 0x00007f83c8bb2f8a: test $0x7,%r10b
>> 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4
>> 0x00007f83c8bb2f90: movabs $0x7f83d77e2fc8,%rdi ; {external_word}
>> 0x00007f83c8bb2f9a: and $0xfffffffffffffff0,%rsp
>> 0x00007f83c8bb2f9e: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)}
>> 0x00007f83c8bb2fa3: hlt
>> 0x00007f83c8bb2fa4: vpaddd (%r10),%zmm5,%zmm0
>> 0x00007f83c8bb2faa: vmovdqu32 %zmm0,(%r8) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> ; - Test::test0 at 15 (line 11)
>> 0x00007f83c8bb2fb0: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>> ...
>
> And without `-XX:-VerifyAlignVector`
>
>
> ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner post of N743) Freq: 4.49988
> 0x00007ff22cbb2924: vpaddd 0x10(%rbx,%r13,4),%zmm0,%zmm1
> 0x00007ff22cbb292f: vmovdqu32 %zmm1,0x10(%rbx,%r13,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> ; - Test::test0 at 15 (line 11)
> 0x00007ff22cbb293a: add $0x10,%r13d ;*iinc {reexecute=0 rethrow=0 return_oop=0}
> ; - Test::test0 at 16 (line 10)
> 0x00007ff22cbb293e: cmp %r10d,%r13d
> 0x00007ff22cbb2941: jl 0x00007ff22cbb2924 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
> ; - Test::test0 at 6 (line 10)
Can you show assembler code for simple load and store instructions (move data from one array to another)?
My concern is that LoadV and StoreV are defined only with `memory` input:
instruct loadV(vec dst, memory mem) %{
match(Set dst (LoadVector mem));
I would assume it will be embedded memory only. But C2 may be smart enough to generate `lea` if it sees not AddP node.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442017423
More information about the hotspot-compiler-dev
mailing list