RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5]

Thu Aug 22 14:45:06 UTC 2024

On Thu, 22 Aug 2024 13:12:23 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   cleanup: use a constexpr function for intpow instead of a templated class
>
> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 100:
> 
>> 98: 
>> 99:   assert(is_power_of_2(unroll_factor), "can't use this value to calculate the jump target PC");
>> 100:   orr(tmp2, cnt, 0x1fff ^ (unroll_factor - 1));
> 
> This is rather cryptic, especially the slightly arbitrary choice of 0x1fff -- any all 1s value greater than (loop_factor - 1) would do. I think a comment as to what is being done here might help maintainers understand what is happening here.
> 
> // At this point cnt holds (r - l) where r is the number of remaining elements,
> // l is loop_count and 0 <= r < l. The constant operand to the orr 'rounds' this
> // negative value into range [-u, -1] where u = unroll_factor, by clearing bits
> // below bit k = log2(unroll_factor) and setting any higher bits that might be clear.
> // The subtract shifted by 3 offsets past ((u - r) % u) load + madd insns i.e. it only
> // executes r % u load + madds. Iteration eats up the remainder, u elements at a time.

Thank you for providing a detailed comment!

I can't remember why I needed the `0x1fff` literal. Probably this is an artifact left from WIP code. I believe this is not required anymore and the `orr` can be simplified to `orr(tmp2, cnt, -unroll_factor)`.

Given that, I'd suggest to shorten the comment to:

```c++
  // At this point cnt holds (r - l) where r is the number of remaining elements, l is loop_count                                                                                                                                                                                                                           
  // and 0 <= r < l. The orr performs (r - l) % u where u = unroll_factor.  The subtract shifted by                                                                                                                                                                                                                         
  // 3 offsets past |(r - l) % u| load + madd insns i.e. it only executes r % u load + madds.                                                                                                                                                                                                                               
  // Iteration eats up the remainder, u elements at a time.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1727195330