RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

Wed Nov 6 17:39:33 UTC 2024

On Tue, 15 Oct 2024 00:28:25 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> MulVL (VectorCastI2X src1) (VectorCastI2X src2)

It looks unsafe to me: VectorCastI2L sign-extends integer lanes, so we may not be able to neglect the partial products of the upper doublewords when performing a 64x64-bit multiplication. The existing patterns guarantee that the upper doublewords are cleared, so the result depends only on the lower doubleword multiplication.
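
To make the concern concrete, here is a minimal scalar sketch of a single 64-bit lane. It is not code from the patch: the class and helper names are hypothetical, and modeling one VPMULUDQ / VPMULDQ lane as a scalar multiply of the low doublewords is a simplifying assumption.

```java
// Hypothetical scalar model of one 64-bit lane; not part of the patch.
class PartialProducts {
    static final long LO_MASK = 0xFFFFFFFFL;

    // Unsigned 32x32 -> 64 multiply of the low doublewords
    // (what a VPMULUDQ lane computes).
    static long mulLowUnsigned(long a, long b) {
        return (a & LO_MASK) * (b & LO_MASK);
    }

    // Signed 32x32 -> 64 multiply of the low doublewords
    // (what a VPMULDQ lane computes).
    static long mulLowSigned(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    public static void main(String[] args) {
        int x = -3, y = 5;

        // Zero-extended lanes: upper doublewords are clear.
        long zx = Integer.toUnsignedLong(x), zy = Integer.toUnsignedLong(y);
        // Sign-extended lanes: upper doubleword of sx is all ones.
        long sx = x, sy = y;

        System.out.println(zx * zy == mulLowUnsigned(zx, zy)); // true
        System.out.println(sx * sy == mulLowUnsigned(sx, sy)); // false
        System.out.println(sx * sy == mulLowSigned(sx, sy));   // true
    }
}
```

With cleared upper doublewords the unsigned low product equals the full 64-bit product. With a sign-extended negative input it does not, because the upper partial products are no longer zero. The signed low product is shown only for comparison.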

> Personally, I think this optimization is not essential, so we should proceed with introducing lowering first, then add this transformation to that phase, instead of trying to integrate this transformation then refactor it into phase lowering, which seems like a net extra step.

I think we should not block in-flight patches in anticipation of a future refactoring. We can always tune it later.

> I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion?
> 
> About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :)

It would be good to float an RFP with some use cases up front, before development starts, such as the vectorization improvements @jaskarth pointed out.

> IMO until C2 type system starts to track bitwise constant information ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there are not enough benefits to rely on IGVN here. So far, all the discussed patterns are simple enough for matcher to handle them without too much tweaking.

Hi @iwanowww,
I have implemented the additional pattern you suggested.
In addition, the patch now re-wires the pattern inputs to the MulVL IR node to avoid emitting the upper doubleword clearing logic in applicable scenarios.

Hi @jaskarth, @merykitty,
As discussed, I am waiting on the PhaseLowering skeleton to move part of this patch into an x86-specific lowering pass.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420384086
PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2423716135