RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

Wed Nov 6 17:39:29 UTC 2024

On Fri, 11 Oct 2024 16:54:23 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

> I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering`

I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem: it has no immediate encoding, so we'd need to materialize the constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid putting similar backend-specific lowerings (such as `MacroLogicV`) in shared code.
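
To make the `bextr` point concrete, here is a rough software model of the 64-bit form, written as plain Java rather than anything from C2. The class and method names (`BextrSketch`, `control`, `bextr`) are made up for illustration. The only thing it is meant to show is that the start/length pair travels in a second register operand (start in bits 7:0, length in bits 15:8), which is the constant that would have to be placed in a register:

```java
public class BextrSketch {
    // Hypothetical helper: packs the control operand that BEXTR reads from a
    // register. Bits 7:0 hold the start bit, bits 15:8 hold the field length.
    public static long control(int start, int len) {
        return (start & 0xFFL) | ((len & 0xFFL) << 8);
    }

    // Software model of 64-bit BEXTR: extract `len` bits of `src` beginning at
    // bit `start`, zero-extended. This is a sketch of the semantics, not JDK code.
    public static long bextr(long src, long control) {
        int start = (int) (control & 0xFF);
        int len = (int) ((control >>> 8) & 0xFF);
        if (start >= 64 || len == 0) {
            return 0L;                      // nothing to extract
        }
        long shifted = src >>> start;       // bits above bit 63 read as zero
        if (len >= 64) {
            return shifted;                 // field covers the rest of the word
        }
        return shifted & ((1L << len) - 1); // keep only the low `len` bits
    }

    public static void main(String[] args) {
        long ctl = control(8, 4);           // start = 8, len = 4 -> 0x408
        System.out.println(Long.toHexString(bextr(0xABCDL, ctl)));
    }
}
```

For a fixed mask, `control(start, len)` is a compile-time constant, so the cost in question is only getting that value into a register before each `bextr`.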

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407821557