RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

Thu Oct 17 19:43:24 UTC 2024

On Mon, 14 Oct 2024 21:26:41 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>> 
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137
>>  - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction
>
> src/hotspot/share/opto/vectornode.cpp line 2124:
> 
>> 2122:     // MulL ( And  SRC1,  0xFFFFFFFF) (URShift SRC2 , 32)
>> 2123:     if ((is_lower_double_word_and_mask_op(in(1)) ||
>> 2124:          is_lower_double_word_and_mask_op(in(1)) ||
> 
> `is_lower_double_word_and_mask_op(in(1)) || is_lower_double_word_and_mask_op(in(1))` is redundant, right? Shouldn't you only need it once? Same for the other 3 calls, which are similarly repeated.

Ah, these harmless cunning typos :-), but we should not rely on c-compiler's short circuiting.

> test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 41:
> 
>> 39:  */
>> 40: 
>> 41: public class VectorMultiplyOpt {
> 
> Could it be possible to also do IR verification in this test? It would be good to check that we don't generate `AndVL` or `URShiftVL` with this transform.

We do need those nodes to chop off the upper double words of quadword lanes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805324915
PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805324796