RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

Quan Anh Mai qamai at openjdk.org
Wed Nov 6 17:39:37 UTC 2024


On Fri, 18 Oct 2024 05:16:04 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>>> I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. I plan to subsume first stage with enhanced dataflow analysis going forward.
>> 
>> The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there. That's why I think it is most suitable to be done only right before matching. `Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`.
>> 
>> My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience.
>
>> The issue is that a node is not immutable.
> 
> I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that  a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead.

@iwanowww IMO there are 2 ways to view this:

- You can see a `MulVL` nodes with `_mult_lower_double_word` being an entirely different kind of nodes which do a different thing (a.k.a throw away the upper bits and only multiply the lower bits), in this case it is a machine-dependent IR node hiding behind the opcode of `MulVL` and changing the inputs of it is not worrying because the node does not care about that anyway, its semantics is predetermined already.
- Or you can see `_mult_lower_double_word` being an annotation that adds information to `MulVL`, which means it is still a `MulVL` but annotated with information saying that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here would be to keep the annotation sane when the node inputs may be changed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421441405


More information about the hotspot-compiler-dev mailing list