[master] RFR: Fix arrays-equals intrinsic on AArch64
Axel Boldt-Christmas
aboldtch at openjdk.org
Wed Apr 24 15:30:42 UTC 2024
On Wed, 24 Apr 2024 14:36:31 GMT, Roman Kennke <rkennke at openjdk.org> wrote:
> The arrays-equals intrinsic on AArch64 assumes that array elements start at 8-byte-aligned boundary. There are several problems with that:
> - Doing unaligned loads is likely rather slow.
> - I believe it may give wrong results when comparing some junk after the end of the array.
> - We may crash when loading beyond the heap boundary.
>
> The proposed fix is to start the comparison at the array-length field. When the array base is unaligned (that is really 4-byte-aligned), then the array-length is at 8-byte-aligned location. And since we want to compare the lengths anyway, we can just as well use word-sized loads to compare the length and first elements in a single step, and elide the separate cmp+branch for the length.
This seems resonable.
But some initial comments.
It seems like the `UseSimpleArrayEquals` above suffers from the unaligned access issues, it seems to be off by default except on Cortex A73. However it should probably be handled as well.
This was also broken for large arrays which assumes that the pointers are 8 byte aligned. The whole over-aligning in there to 16 bytes is just wrong when these are not aligned.
This seems to be the only use of the large array stub, but it seems like all of these have to be evaluated for correctness, have comments and requirements updated. E.g.
https://github.com/openjdk/lilliput/blob/80abb98d830fa26c8381b09c71857f0276c031b9/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L5253-L5254
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5363:
> 5361: } else {
> 5362: ldr(tmp3, Address(pre(a1, length_offset)));
> 5363: }
Something like this could avoid the ifs. Also the `const int start_offset = is_8aligned ? base_offset : length_offset;` could be pushed up and shared for the simple implementation, which probably should be fixed as well.
Suggestion:
const int start_offset = is_8aligned ? base_offset : length_offset;
ldr(tmp3, Address(pre(a1, start_offset)));
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5438:
> 5436: br(NE, DONE);
> 5437: }
> 5438: cbz(cnt1, SAME);
Was this an optimization? The `cbz` would never be taken if `!is_8aligned ` as it implies `cnt1 > 0`
-------------
PR Review: https://git.openjdk.org/lilliput/pull/170#pullrequestreview-2020197726
PR Review Comment: https://git.openjdk.org/lilliput/pull/170#discussion_r1578083395
PR Review Comment: https://git.openjdk.org/lilliput/pull/170#discussion_r1578083521
More information about the lilliput-dev
mailing list