RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2]

Sun Nov 10 07:43:48 UTC 2024

On Fri, 8 Nov 2024 20:25:23 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I don't see a compelling reason to do so. IMO matcher functionality is more than enough to cover the `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + `has_int_inputs()` predicate. For `MulUIVL` you additionally do input rewiring (using `forward_masked_input`), but (1) `(AndV src (Replicate 0xFFFFFFFF))` operands can be easily detected on the matcher side (with an extra AD instruction); and (2) such an optimization is limited because it is valid only for the `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C <= 0xFFFFFFFF`.
> 
> So, IMO `MulIVL` and `MulUIVL` nodes just add noise to the Ideal graph without improving the situation during matching.

Hi Vladimir,
The problem occurs when the `AndV` node is shared, i.e. it has users other than the multiply. In that case the matcher will not be able to absorb the masking pattern into the multiply instruction.
A specialized IR node is not subject to that limitation, and it shields the pattern it represents from downstream optimizations.
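
To make the shared case concrete, here is a minimal Java sketch of the two loop shapes in question. The class and method names are hypothetical, and whether C2 auto-vectorizes these exact loops into `MulVL` over `AndV` inputs is an assumption that depends on the JDK build and CPU. In the first loop the masked operands feed only the multiply; in the second the masked value is also stored, so the mask has a second user.

```java
// Hypothetical illustration; names are not taken from the PR.
public class SharedMaskSketch {
    static final long LOW32 = 0xFFFFFFFFL;

    // Masked operands are consumed only by the multiply. If the loop is
    // vectorized, each mask would have the multiply as its single user.
    static void mulMasked(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = (a[i] & LOW32) * (b[i] & LOW32);
        }
    }

    // The masked value of a[i] is also stored to maskedA, so the mask
    // has two users: the multiply and the store.
    static void mulMaskedShared(long[] a, long[] b, long[] out, long[] maskedA) {
        for (int i = 0; i < out.length; i++) {
            long m = a[i] & LOW32;
            maskedA[i] = m;
            out[i] = m * (b[i] & LOW32);
        }
    }

    public static void main(String[] args) {
        long[] a = {-1L, 0x1_0000_0003L, 7L, 0L};
        long[] b = {-1L, 5L, 0xFFFFFFFF_00000002L, 9L};
        long[] out = new long[a.length];
        long[] shared = new long[a.length];
        long[] maskedA = new long[a.length];

        mulMasked(a, b, out);
        mulMaskedShared(a, b, shared, maskedA);

        // Both loops compute the same products; only the number of users
        // of the masked value differs.
        System.out.println(java.util.Arrays.equals(out, shared));
    }
}
```

Both methods return identical products, each being the full 64-bit product of the low 32 bits of the two lanes, which is the semantics of `VPMULUDQ`. The only difference is that the second loop needs the masked value to stay materialized for the store.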

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2466624605