RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5]

Andrew Dinn adinn at openjdk.org
Thu Aug 22 15:21:06 UTC 2024


On Thu, 22 Aug 2024 14:42:20 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 100:
>> 
>>> 98: 
>>> 99:   assert(is_power_of_2(unroll_factor), "can't use this value to calculate the jump target PC");
>>> 100:   orr(tmp2, cnt, 0x1fff ^ (unroll_factor - 1));
>> 
>> This is rather cryptic, especially the slightly arbitrary choice of 0x1fff -- any all 1s value greater than (loop_factor - 1) would do. I think a comment as to what is being done here might help maintainers understand what is happening here.
>> 
>> // At this point cnt holds (r - l) where r is the number of remaining elements,
>> // l is loop_count and 0 <= r < l. The constant operand to the orr 'rounds' this
>> // negative value into range [-u, -1] where u = unroll_factor, by clearing bits
>> // below bit k = log2(unroll_factor) and setting any higher bits that might be clear.
>> // The subtract shifted by 3 offsets past ((u - r) % u) load + madd insns i.e. it only
>> // executes r % u load + madds. Iteration eats up the remainder, u elements at a time.
>
> Thank you for providing a detailed comment!
> 
> I can't remember why I needed the `0x1fff` literal. Probably this is an artifact left from WIP code. I believe this is not required anymore and the `orr` can be simplified to `orr(tmp2, cnt, -unroll_factor)`.
> 
> Given that, I'd suggest to shorten the comment to:
> 
> ```c++
>   // At this point cnt holds (r - l) where r is the number of remaining elements, l is loop_count                                                                                                                                                                                                                           
>   // and 0 <= r < l. The orr performs (r - l) % u where u = unroll_factor.  The subtract shifted by                                                                                                                                                                                                                         
>   // 3 offsets past |(r - l) % u| load + madd insns i.e. it only executes r % u load + madds.                                                                                                                                                                                                                               
>   // Iteration eats up the remainder, u elements at a time.

Yes, just fixing the constant helps a lot. However, the comment is still worth having.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1727313087


More information about the hotspot-dev mailing list