RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58]
Emanuel Peter
epeter at openjdk.org
Thu Jan 4 07:00:51 UTC 2024
On Wed, 3 Jan 2024 19:50:49 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> src/hotspot/share/opto/compile.cpp line 3713:
>>
>>> 3711: // to ObjectAlignmentInBytes. Hence, even if multiple arrays are accessed in
>>> 3712: // a loop we can expect at least the following alignment:
>>> 3713: jlong guaranteed_alignment = MIN2(vector_width, (jlong)ObjectAlignmentInBytes);
>>
>> This is a more relaxed check than the actual alignment requires. As I understand it, this is because it checks only the base address of the array and not the actual memory address that the vector instruction accesses (which is (base, index, offset)).
>> It is useful but does not guarantee correct alignment of vector access instructions.
>>
>> Consider using the `lea` instruction on x86 to load the memory address into a register and check it.
>
> Maybe hack only the `loadV` and `storeV` instructions in the .ad file to use `lea` and do the check.
I don't understand this comment.
The `LoadVector` and `StoreVector` nodes both have a `MemNode::Address` input, which I think is the memory address.
The address itself usually consists of `AddP` nodes, which do the (base, index, offset) computation. These nodes can later be folded into the load/store itself, or be computed with a `lea`.
I simply take the address value, check it for alignment and pass it on to the load/store.
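As a rough illustration of the distinction being discussed, the following plain-Java sketch models an `AddP`-style address computation (base + offset + index * scale) and an alignment check that passes the address through unchanged. It is not HotSpot code: the class and method names are hypothetical, the base address is made up, and the 16-byte header offset and scale of 4 are assumptions taken from the `lea 0x10(%rbp,%rbx,4)` instruction in the listing that follows.

```java
// Hypothetical model of an AddP chain plus a VerifyAlignVector-style check.
// Not HotSpot code: names and constants are illustrative assumptions.
public class AlignCheckSketch {
    // int[] header size assumed from `lea 0x10(%rbp,%rbx,4)`.
    static final long HEADER_OFFSET = 16;
    static final long INT_SCALE = 4;

    // (base, index, offset) -> address, as the AddP nodes would compute it.
    static long effectiveAddress(long base, long index) {
        return base + HEADER_OFFSET + index * INT_SCALE;
    }

    // Checks the full address (not just the base) and returns it unchanged,
    // so the load/store can consume the same value.
    static long checkAligned(long address, long alignment) {
        if ((address & (alignment - 1)) != 0) {
            throw new IllegalStateException("misaligned address: 0x" + Long.toHexString(address));
        }
        return address;
    }

    public static void main(String[] args) {
        long base = 0x1000; // assumed 8-byte aligned object base
        // index 0: base + 16 is 8-byte aligned, so the check passes.
        System.out.println(Long.toHexString(checkAligned(effectiveAddress(base, 0), 8)));
        // index 1: base + 20 is only 4-byte aligned; a base-only check would miss this.
        try {
            checkAligned(effectiveAddress(base, 1), 8);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `1010` and then `misaligned address: 0x1014`: the base is aligned in both cases, but only the full address reveals the misaligned access.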
Take this example:
```java
public class Test {
    static int RANGE = 1024*64;
    public static void main(String[] strArr) {
        int a[] = new int[RANGE];
        test0(a);
    }
    static void test0(int[] a) {
        for (int i = 0; i < RANGE; i++) {
            a[i]++;
        }
    }
}
```
`./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java`
This looks like the main loop:
```
;; B22: # out( B22 B23 ) <- in( B21 B22 ) Loop( B22-B22 inner post of N743) Freq: 4.49988
  0x00007f83c8bb2f68:   lea 0x10(%rbp,%rbx,4),%r10          ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - Test::test0@12 (line 11)
  0x00007f83c8bb2f6d:   mov %r10,%r8
  0x00007f83c8bb2f70:   test $0x7,%r8b
  0x00007f83c8bb2f74:   je 0x00007f83c8bb2f8a
  0x00007f83c8bb2f76:   movabs $0x7f83d77e2fc8,%rdi         ;   {external_word}
  0x00007f83c8bb2f80:   and $0xfffffffffffffff0,%rsp
  0x00007f83c8bb2f84:   callq 0x00007f83d71a0162            ;   {runtime_call MacroAssembler::debug64(char*, long, long*)}
  0x00007f83c8bb2f89:   hlt
  0x00007f83c8bb2f8a:   test $0x7,%r10b
  0x00007f83c8bb2f8e:   je 0x00007f83c8bb2fa4
  0x00007f83c8bb2f90:   movabs $0x7f83d77e2fc8,%rdi         ;   {external_word}
  0x00007f83c8bb2f9a:   and $0xfffffffffffffff0,%rsp
  0x00007f83c8bb2f9e:   callq 0x00007f83d71a0162            ;   {runtime_call MacroAssembler::debug64(char*, long, long*)}
  0x00007f83c8bb2fa3:   hlt
  0x00007f83c8bb2fa4:   vpaddd (%r10),%zmm5,%zmm0
  0x00007f83c8bb2faa:   vmovdqu32 %zmm0,(%r8)               ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - Test::test0@15 (line 11)
  0x00007f83c8bb2fb0:   add $0x10,%ebx                      ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - Test::test0@16 (line 10)
  0x00007f83c8bb2fb3:   cmp %r11d,%ebx
  0x00007f83c8bb2fb6:   jl 0x00007f83c8bb2f68               ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - Test::test0@6 (line 10)
```
What I see: `lea` computes the address and stores it in register `r10`. The value is moved to `r8`, and the alignment check `test $0x7,%r8b` examines the low 3 bits, i.e. it checks for 8-byte alignment. We do the same check again with `r10b`, since the same address is used for both the load and the store. Then the load and store use those registers directly:
```
vpaddd (%r10),%zmm5,%zmm0
vmovdqu32 %zmm0,(%r8)
```
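A small sketch of where the `$0x7` mask plausibly comes from, under stated assumptions: the default `ObjectAlignmentInBytes` of 8, a 64-byte vector (one `zmm` register holding 16 ints, matching `add $0x10,%ebx`), and `vector_width` in the quoted `compile.cpp` snippet being measured in bytes. Then `MIN2(vector_width, ObjectAlignmentInBytes)` gives 8, and the mask is 7. Because each iteration advances the address by 64 bytes, a multiple of 8, every iteration passes the check once the first one does. The class and method names are hypothetical, and the base address is made up.

```java
// Hypothetical sketch mirroring MIN2(vector_width, ObjectAlignmentInBytes)
// from the quoted compile.cpp snippet. Constants are assumptions, not read
// from a running JVM.
public class GuaranteedAlignmentSketch {
    static final long OBJECT_ALIGNMENT_IN_BYTES = 8; // assumed JVM default
    static final long VECTOR_WIDTH_BYTES = 64;       // one zmm register

    static long guaranteedAlignment(long vectorWidth, long objectAlignment) {
        return Math.min(vectorWidth, objectAlignment);
    }

    // Models `test $mask,%reg; je ok`: passes when all masked bits are zero.
    static boolean passesTest(long address, long mask) {
        return (address & mask) == 0;
    }

    public static void main(String[] args) {
        long alignment = guaranteedAlignment(VECTOR_WIDTH_BYTES, OBJECT_ALIGNMENT_IN_BYTES);
        long mask = alignment - 1;
        System.out.println("alignment=" + alignment + " mask=0x" + Long.toHexString(mask));

        // Walk the vector loop: 16 ints (64 bytes) per iteration, like `add $0x10,%ebx`.
        long base = 0x1000; // assumed 8-byte aligned array base
        long header = 16;   // offset from `lea 0x10(%rbp,%rbx,4)`
        boolean allAligned = true;
        for (long i = 0; i < 1024; i += 16) {
            long address = base + header + i * 4;
            allAligned &= passesTest(address, mask);
        }
        System.out.println("all iterations aligned: " + allAligned);
    }
}
```

Running `main` prints `alignment=8 mask=0x7` and `all iterations aligned: true`.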
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1441398603
More information about the hotspot-compiler-dev mailing list