Array addition and array sum Panama benchmarks

Fri Mar 22 10:57:29 UTC 2024

> One solution would be for C2 to transform the long memory load + long
> to double move into a double memory load (an Ideal transformation). The
> code would then vectorize with no change to the vectorizer
> required. That seems fairly straightforward as a change.

Actually, there's already support for that. Vladimir added it with
8253734, but the transformation is delayed until after loop optimizations,
so vectorization doesn't benefit from it. This:

diff --git a/src/hotspot/share/opto/movenode.cpp b/src/hotspot/share/opto/movenode.cpp
index bfa30f02ada..75c647193a8 100644
--- a/src/hotspot/share/opto/movenode.cpp
+++ b/src/hotspot/share/opto/movenode.cpp
@@ -374,7 +374,7 @@ Node* MoveNode::Ideal(PhaseGVN* phase, bool can_reshape) {
     if (ld != nullptr && (ld->outcnt() == 1)) { // replace only
       const Type* rt = bottom_type();
       if (ld->has_reinterpret_variant(rt)) {
-        if (phase->C->post_loop_opts_phase()) {
+        if (phase->C->post_loop_opts_phase() || UseNewCode) {
           return ld->convert_to_reinterpret_load(*phase, rt);
         } else {
           phase->C->record_for_post_loop_opts_igvn(this); // attempt the transformation once loop opts are over

and then running with -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode
shows the code vectorizes now for scalarSegmentArray and performs much
better.
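
To make the pattern concrete, here is a rough Java-level sketch of the
two shapes involved: a long load followed by a long to double move, and
a plain double load. It uses ByteBuffer rather than the benchmark's
segment code, and the class and method names are hypothetical, so treat
it as an illustration of the equivalence only, not as what the benchmark
or C2 actually does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: names are hypothetical and ByteBuffer stands in
// for the memory access used by the real benchmark.
public class ReinterpretLoadSketch {

    // Fill a buffer with the doubles 0.0, 1.0, ..., count - 1.
    static ByteBuffer fill(int count) {
        ByteBuffer buf = ByteBuffer.allocate(count * Double.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < count; i++) {
            buf.putDouble(i * Double.BYTES, (double) i);
        }
        return buf;
    }

    // Shape 1: long memory load + long to double move.
    static double sumViaLongBits(ByteBuffer buf, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            long bits = buf.getLong(i * Double.BYTES);
            sum += Double.longBitsToDouble(bits);
        }
        return sum;
    }

    // Shape 2: double memory load, i.e. what the reinterpret load
    // amounts to once the move is folded into the load.
    static double sumViaDoubleLoad(ByteBuffer buf, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            sum += buf.getDouble(i * Double.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buf = fill(4);
        System.out.println(sumViaLongBits(buf, 4));
        System.out.println(sumViaDoubleLoad(buf, 4));
    }
}
```

Both loops read the same bits and add them in the same order, so they
return the same value; the difference is only in the node shape the
compiler has to deal with.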

I don't see a discussion of the reason for delaying the transformation
in the PR for 8253734. Vladimir, can you comment on that?

Roland.