RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

Wed Nov 6 17:39:31 UTC 2024

On Mon, 14 Oct 2024 15:04:54 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> For the record I think in this PR we could simply match the IR patterns in the ad file, since (from my understanding) the patterns we are matching could be supported there. We should do platform-specific lowering in a separate patch because it is pretty nuanced, and we could potentially move it to the new system afterwards.

Hi @jaskarth , Bigger pattern matching is sensitive to [IR level node sharing](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L1724), thus it may not be full proof for above 4 patterns. Current patch takes care of this limitation. 

> @jatin-bhateja That is machine-independent lowering, we are talking about machine-dependent lowering to which `MacroLogicV` transformation belongs. You can have `phaselowering_x86` and not have to add another method to `Matcher` as well as add default implementations to various architecture files. You can reuse `MulVL` node for that but I believe these transformations should be done as late as possible.

Hi @merykitty, I see some scope of refactoring and carving out a separate target specific lowering pass going forward, I have brough this up in past too. Existing optimizations are in line with current infrastructure and guards target specific optimizations with target specific match_rule_supported checks e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2898. As @jaskarth suggests we can pick this up going forward.

> BTW, from the last conversation I had started working on PhaseLowering myself, you can see my work so far on my branch: https://github.com/jaskarth/jdk/tree/phase-lowering. I think I can publish an RFE in the coming two or three days (there were some optimizations and cleanup I was prototyping, I will remove them before sending a PR.) Do you think we should continue with my branch or do you want to approach the problem from a different way? Just want to check again to make sure we don't end up re-doing the same work :)

Hi @jaskarth ,
Please add PhaseLowering skeleton code only and then we can add applicable lowering transforms in seperate patches e.g . I volenteer to move x86 side lowering transforms like MacroLogic Optimization along with this doubleword multiplication pass.

We need to carefully take such decisions keeping in the view the code duplication aspects, so only very specific IR transforms should be lowered,  common transforms should still be part of shared code. 

Let me know if you have any concerns.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411884206
PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2422981643