RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

Mon Oct 14 17:50:14 UTC 2024

On Mon, 14 Oct 2024 15:04:54 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> For the record I think in this PR we could simply match the IR patterns in the ad file, since (from my understanding) the patterns we are matching could be supported there. We should do platform-specific lowering in a separate patch because it is pretty nuanced, and we could potentially move it to the new system afterwards.

Hi @jaskarth , Bigger pattern matching is sensitive to [IR level node sharing](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L1724), thus it may not full proof for above 4 patterns. Current patch takes care of this limitation. 

> @jatin-bhateja That is machine-independent lowering, we are talking about machine-dependent lowering to which `MacroLogicV` transformation belongs. You can have `phaselowering_x86` and not have to add another method to `Matcher` as well as add default implementations to various architecture files. You can reuse `MulVL` node for that but I believe these transformations should be done as late as possible.

Hi @merykitty, I see some scope of refactoring and carving out a separate target specific lowering pass going forward, I have brough this up in past too. Existing optimizations are in line with current infrastructure and guards target specific optimizations with target specific match_rule_supported checks e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2898. As @jaskarth suggests we can pick this up going forward.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411884206