RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction
Vladimir Ivanov
vlivanov at openjdk.org
Wed Nov 6 17:39:43 UTC 2024
On Fri, 18 Oct 2024 05:46:25 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:
>> You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq
>>
>> VPMULUDQ (VEX.256 Encoded Version)
>> DEST[63:0] := SRC1[31:0] * SRC2[31:0]
>> DEST[127:64] := SRC1[95:64] * SRC2[95:64]
>> DEST[191:128] := SRC1[159:128] * SRC2[159:128]
>> DEST[255:192] := SRC1[223:192] * SRC2[223:192]
>> DEST[MAXVL-1:256] := 0
>
> Got it. Now it makes perfect sense. Thanks for the clarifications!
Actually, it makes detecting the pattern during matching even simpler than I initially thought. Since there's no need to match any non-trivial ideal IR tree, the AD instruction can just match a single `MulVL` node and detect the operand shapes using a predicate.
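
To illustrate why a single `MulVL` plus a predicate is enough, here is a rough scalar sketch in Java. It emulates the quoted VPMULUDQ pseudocode lane by lane and compares it with a plain 64-bit multiply. All names (`VpmuludqSketch`, `vpmuludq`, `mulVL`, `upperHalvesZero`) are made up for illustration and are not HotSpot code. The check here looks at runtime values, whereas a real matcher predicate would presumably inspect the shapes of the input nodes instead.

```java
import java.util.Arrays;

// Illustrative only: a scalar model of the instruction, not HotSpot code.
final class VpmuludqSketch {
    static final long LOW32 = 0xFFFFFFFFL;

    // Models VPMULUDQ: each 64-bit lane of the result is the unsigned
    // product of the low 32 bits of the corresponding source lanes.
    public static long[] vpmuludq(long[] src1, long[] src2) {
        long[] dest = new long[src1.length];
        for (int i = 0; i < dest.length; i++) {
            dest[i] = (src1[i] & LOW32) * (src2[i] & LOW32);
        }
        return dest;
    }

    // Models MulVL: low 64 bits of the full 64x64-bit product per lane.
    public static long[] mulVL(long[] a, long[] b) {
        long[] dest = new long[a.length];
        for (int i = 0; i < dest.length; i++) {
            dest[i] = a[i] * b[i];
        }
        return dest;
    }

    // Hypothetical stand-in for the predicate: true when the upper
    // 32 bits of every lane are known to be zero.
    public static boolean upperHalvesZero(long[] v) {
        for (long x : v) {
            if ((x >>> 32) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] a = {3L, 0xFFFFFFFFL, 1L << 31, 7L};
        long[] b = {5L, 0xFFFFFFFFL, 2L, 0L};
        // Both inputs have zero upper halves, so the two results agree.
        System.out.println(upperHalvesZero(a) && upperHalvesZero(b));
        System.out.println(Arrays.equals(mulVL(a, b), vpmuludq(a, b)));

        long[] c = {(1L << 32) + 1};
        long[] d = {2L};
        // c has a non-zero upper half, so the results differ.
        System.out.println(Arrays.equals(mulVL(c, d), vpmuludq(c, d)));
    }
}
```

When the upper halves of both operands are zero, the two functions agree on every lane, so the cheaper instruction can stand in for the general multiply. The last case shows that the substitution is invalid without that guarantee.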
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805903273