RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]

Fri Oct 18 06:22:38 UTC 2024

On Fri, 18 Oct 2024 05:35:27 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>>> The issue is that a node is not immutable.
>> 
>> I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that  a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead.
>
> @iwanowww IMO there are 2 ways to view this:
> 
> - You can see a `MulVL` nodes with `_mult_lower_double_word` being an entirely different kind of nodes which do a different thing (a.k.a throw away the upper bits and only multiply the lower bits), in this case it is a machine-dependent IR node hiding behind the opcode of `MulVL` and changing the inputs of it is not worrying because the node does not care about that anyway, its semantics is predetermined already.
> - Or you can see `_mult_lower_double_word` being an annotation that adds information to `MulVL`, which means it is still a `MulVL` but annotated with information saying that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here would be to keep the annotation sane when the node inputs may be changed.

> @merykitty I was under an erroneous impression that `MulVL::Ideal()` folds operands of particular shapes into `MulVL::_mult_lower_double_word == true`. Now I see it's not the case. Indeed, what `MulVL::Ideal()` does is it caches the info about operand shapes in `MulVL::_mult_lower_double_word` which introduces unnecessary redundancy. I doubt it is possible for IR to diverge so much (through a sequence of equivalent transformations) that the bit gets out of sync (unless there's a bug in compiler or a paradoxical situation in effectively dead code occurs).

Hi @iwanowww , @merykitty , Thanks for your inputs!!

I still feel idealization is the right place to execute this pattern detection, we just need to re-wire the effective inputs bypassing doubleword clearing logic to newly annotated MulVL node and allow clearing IR to sweepout during successive passes, moving it to final graph reshaping just before instruction selection will prevent dead IR cleanups.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421511006