RFR: 8318650: Optimized subword gather for x86 targets. [v10]

Tue Jan 16 07:34:24 UTC 2024

On Tue, 16 Jan 2024 06:08:35 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1634:
>> 
>>> 1632:                                                     Register offset, XMMRegister offset_vec, XMMRegister idx_vec,
>>> 1633:                                                     XMMRegister xtmp1, XMMRegister xtmp2, XMMRegister xtmp3, KRegister mask,
>>> 1634:                                                     KRegister gmask, int vlen_enc, int vlen) {
>> 
>> Would you mind giving a quick summary of what the input registers are and what exactly this method does?
>> Why do we need to call `vgather_subword_avx3` so many times (`lane_count_subwords`)?
>
> The method gathers sub-words from the gather indices using integral gather instructions. Because of the lane-size mismatch between int and sub-word elements, the algorithm makes multiple calls to vgather_subword_avx3.

As a reviewer, I feel like I have to reverse engineer this now. I would really appreciate it if there were a proper comment at the beginning that tells me what is happening here. Maybe start with an equation stating what we want to achieve in the abstract, then explain why that does not work directly and why you have to break it down into a loop, and then state the equation again in loop form.
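To make the request concrete, here is a scalar C++ model of the kind of "abstract equation, then loop form" description I have in mind. It is only a sketch under assumptions, not the HotSpot code. The names `gather_reference` and `gather_in_passes` are hypothetical. It assumes that each int gather covers vlen_bytes / 4 indices, that the number of passes is sizeof(int) / sizeof(T), and that pass p fills a contiguous slice of the result. The real code may permute lanes differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Abstract goal (hypothetical model): for 0 <= i < vlen,
//   dst[i] = base[offset + idx[i]]
// where T is a sub-word type (int8_t or int16_t).
template <typename T>
std::vector<T> gather_reference(const T* base, int offset, const int* idx, int vlen) {
  std::vector<T> dst(vlen);
  for (int i = 0; i < vlen; i++) {
    dst[i] = base[offset + idx[i]];
  }
  return dst;
}

// Loop form (assumption): an int gather only has vlen_bytes / 4 lanes, while the
// sub-word vector has vlen_bytes / sizeof(T) lanes, so the same equation is
// computed in passes = sizeof(int) / sizeof(T) steps:
//   for p in [0, passes), j in [0, int_lanes):
//     dst[p * int_lanes + j] = base[offset + idx[p * int_lanes + j]]
template <typename T>
std::vector<T> gather_in_passes(const T* base, int offset, const int* idx, int vlen) {
  const int passes = static_cast<int>(sizeof(int32_t) / sizeof(T));
  assert(vlen % passes == 0);
  const int int_lanes = vlen / passes;
  std::vector<T> dst(vlen);
  for (int p = 0; p < passes; p++) {
    // Model of one int gather: each 32-bit lane receives one sub-word element.
    // (This scalar model reads only sizeof(T) bytes so it stays in bounds.)
    std::vector<int32_t> lanes(int_lanes);
    for (int j = 0; j < int_lanes; j++) {
      lanes[j] = static_cast<int32_t>(base[offset + idx[p * int_lanes + j]]);
    }
    // Narrow each 32-bit lane back to T and place it into slice p of the result.
    for (int j = 0; j < int_lanes; j++) {
      dst[p * int_lanes + j] = static_cast<T>(lanes[j]);
    }
  }
  return dst;
}
```

A comment along these lines, mapping the registers (offset, idx_vec, mask, ...) onto the symbols of the equation, would answer most of my questions.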

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1453020617