RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]

Fri Oct 18 06:22:38 UTC 2024

On Fri, 18 Oct 2024 06:10:54 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> @iwanowww IMO there are 2 ways to view this:
>> 
>> - You can see a `MulVL` node with `_mult_lower_double_word` set as an entirely different kind of node that does a different thing (i.e. it throws away the upper bits and multiplies only the lower bits). In this case it is a machine-dependent IR node hiding behind the `MulVL` opcode, and changing its inputs is not a concern because the node does not care about them; its semantics are already predetermined.
>> - Or you can see `_mult_lower_double_word` as an annotation that adds information to `MulVL`: it is still a `MulVL`, but annotated with the information that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here is keeping the annotation sane when the node's inputs may change.
>
>> @merykitty I was under the erroneous impression that `MulVL::Ideal()` folds operands of particular shapes into `MulVL::_mult_lower_double_word == true`. Now I see that is not the case. What `MulVL::Ideal()` actually does is cache the info about operand shapes in `MulVL::_mult_lower_double_word`, which introduces unnecessary redundancy. I doubt it is possible for the IR to diverge so much (through a sequence of equivalent transformations) that the bit gets out of sync (unless there is a bug in the compiler or a paradoxical situation occurs in effectively dead code).
> 
> Hi @iwanowww, @merykitty, thanks for your input!
> 
> I still feel idealization is the right place for this pattern detection. We just need to rewire the effective inputs, bypassing the doubleword-clearing logic, to the newly annotated `MulVL` node and let the clearing IR be swept out during successive passes. Moving the detection to final graph reshaping, just before instruction selection, would prevent that dead IR from being cleaned up.

@jatin-bhateja I think you can do it at the same place as `Compile::optimize_logic_cones`, since we do perform IGVN there. Unless you think this information is needed early in the compilation process, I currently see it used only during matching, which makes it unnecessary to check it repeatedly in `Node::Ideal`.
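
For reference, here is a rough scalar sketch in Java of the per-lane property that `_mult_lower_double_word` stands for. It is not the C2 code; the class and method names (`MulLowerDoubleWord`, `mulFull`, `mulLowerDoubleWord`, `upperBitsClear`) are made up for illustration. It only shows that a multiply of the low 32 bits agrees with the full 64-bit multiply when the upper 32 bits of both operands are zero, and that this depends on nothing but the operands, so the check should give the same answer whether it runs in `Ideal` or once shortly before matching.

```java
public class MulLowerDoubleWord {
    // Mask selecting the low doubleword (32 bits) of a 64-bit lane.
    static final long LOW_MASK = 0xFFFF_FFFFL;

    // What a plain long multiply computes per lane: the low 64 bits of a * b.
    static long mulFull(long a, long b) {
        return a * b;
    }

    // Per-lane model of a VPMULUDQ-style multiply: only the unsigned low
    // 32 bits of each operand take part, producing a 64-bit result.
    static long mulLowerDoubleWord(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    // Hypothetical stand-in for the operand-shape check: true when the
    // upper 32 bits of the value are all zero.
    static boolean upperBitsClear(long x) {
        return (x >>> 32) == 0;
    }

    public static void main(String[] args) {
        long a = 0xFFFF_FFFFL;
        long b = 0x1234_5678L;
        // Both operands have clear upper bits, so the two multiplies agree.
        System.out.println(upperBitsClear(a) && upperBitsClear(b));
        System.out.println(mulFull(a, b) == mulLowerDoubleWord(a, b));

        long c = 0x1_0000_0002L;
        // c has a set upper bit, so the narrow multiply would be wrong here.
        System.out.println(upperBitsClear(c));
        System.out.println(mulFull(c, 3L) == mulLowerDoubleWord(c, 3L));
    }
}
```

If the operands fail the check at the point where it is evaluated, the node would simply be matched as an ordinary `MulVL`, which is why evaluating it late looks safe to me.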

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421519087