RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

Wed Nov 6 17:39:38 UTC 2024

On Fri, 18 Oct 2024 05:16:04 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>>> I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. I plan to subsume first stage with enhanced dataflow analysis going forward.
>> 
>> The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there. That's why I think it is most suitable to be done only right before matching. `Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`.
>> 
>> My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience.
>
>> The issue is that a node is not immutable.
> 
> I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that  a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead.

Hi @iwanowww , @merykitty , I am in process of addressing all your concerns.

I still feel idealization is the right place to execute this pattern detection, we just need to re-wire the effective inputs bypassing doubleword clearing logic to newly annotated MulVL node and allow clearing IR to sweepout during successive passes, moving it to final graph reshaping just before instruction selection will prevent dead IR cleanups.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421448784