RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

Thu Oct 17 19:43:22 UTC 2024

On Tue, 15 Oct 2024 00:28:25 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> MulVL (VectorCastI2X src1) (VectorCastI2X src2

It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, thus we may not be able to neglect partial products of upper doublewords while performing 64x64 bit multiplication. Existing patterns guarantees clearing of upper double words thereby result computation only depends on lower doubleword multiplication.  

> Personally, I think this optimization is not essential, so we should proceed with introducing lowering first, then add this transformation to that phase, instead of trying to integrate this transformation then refactor it into phase lowering, which seems like a net extra step.

I think we should not block inflight patches in anticipation of new refactoring. We can always tune it later. 

> I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion?
> 
> About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :)

It will be good to float an RFP with some use-cases upfront before development. As @jaskarth pointed out some vectorization improvements.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420384086