RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v5]

Tue May 7 14:00:08 UTC 2024

On Tue, 7 May 2024 11:11:05 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash.
>> 
>> The proposed fix aims to always enter the main loop(s) with an aligned address:
>>  - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base.
>>  - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison.
>> 
>> Testing:
>>  - [x] tier1 (+CCP)
>>  - [x] tier1 (-CCP)
>>  - [x] tier2 (+CCP)
>>  - [x] tier2 (-CCP)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   @xmas92 review

> > This still looks too complicated for such a simple thing. Why not read the whole array, fully aligned, until the final word (at start+length*elementSize-wordSize) which is possibly unaligned? That would work regardless of alignment. Check the lengths agree first, and the first whole read may or may not include the length field.
> 
> I disagree. This means we would access the length 2x unconditionally (at least in the -CCP/Lilliput path), and need to generate extra code (in the +CCP path for the length check, in the -CCP path to also increase the cnt2 length counter). I suspect that reading lengths 2x could add up pretty quick when comparing short arrays (which is pretty common, e.g. strings).
> 
> What we _could_ do instead is unconditionally compare the lengths in the loop. This means that in the +CCP path, we would also have to compare the compressed Klass*, but I think this should be ok. I don't think that 8-byte loads are any slower than 4-byte loads, and we're actually saving the extra instructions ahead of the loop. And we would not have 2 different paths for -CCP vs +CCP.
> 
> What do you think?

I pushed a change that unifies and simplifies the implementation as described above. If that's not what we want, I would revert it back.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18948#issuecomment-2098474942