Array addition and array sum Panama benchmarks
Roland Westrelin
rwestrel at redhat.com
Fri Mar 22 10:57:29 UTC 2024
> One solution would be for c2 to transform the long memory load + long to
> double move into a double memory load (an Ideal transformation). The
> code would then vectorize with no change to the vectorizer
> required. That seems fairly straightforward as a change.
Actually, there's already support for that. Vladimir added it with
8253734 but the transformation is delayed until after loop optimizations
so vectorization doesn't benefit from it. This:
diff --git a/src/hotspot/share/opto/movenode.cpp b/src/hotspot/share/opto/movenode.cpp
index bfa30f02ada..75c647193a8 100644
--- a/src/hotspot/share/opto/movenode.cpp
+++ b/src/hotspot/share/opto/movenode.cpp
@@ -374,7 +374,7 @@ Node* MoveNode::Ideal(PhaseGVN* phase, bool can_reshape) {
     if (ld != nullptr && (ld->outcnt() == 1)) { // replace only
       const Type* rt = bottom_type();
       if (ld->has_reinterpret_variant(rt)) {
-        if (phase->C->post_loop_opts_phase()) {
+        if (phase->C->post_loop_opts_phase() || UseNewCode) {
           return ld->convert_to_reinterpret_load(*phase, rt);
         } else {
           phase->C->record_for_post_loop_opts_igvn(this); // attempt the transformation once loop opts are over
and then running with -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode
shows the code vectorizes now for scalarSegmentArray and performs much
better.
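
For readers who want to see the shape of the pattern at the Java level, here is a minimal sketch. It is not the benchmark source: the class name, the method names, and the use of a ByteBuffer are all assumptions made for illustration. The first loop reads each element as a long and reinterprets the bits, which is the "long memory load + long to double move" shape from the quoted text. The second loop reads the double directly, which is the shape the Ideal transformation is meant to produce. Both return the same value; the sketch only shows that the two forms are equivalent, not how c2 compiles either of them.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ReinterpretLoadSketch {
    // Hypothetical helper: long load followed by a bit reinterpretation
    // (the shape that, per the discussion above, is not folded before loop opts).
    static double sumViaLongLoad(ByteBuffer buf, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            long bits = buf.getLong(i * Double.BYTES);
            sum += Double.longBitsToDouble(bits);
        }
        return sum;
    }

    // Hypothetical helper: the equivalent direct double load.
    static double sumViaDoubleLoad(ByteBuffer buf, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            sum += buf.getDouble(i * Double.BYTES);
        }
        return sum;
    }

    // Fills a buffer with the doubles 0.0, 1.0, ..., count - 1.
    static ByteBuffer filled(int count) {
        ByteBuffer buf = ByteBuffer.allocate(count * Double.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < count; i++) {
            buf.putDouble(i * Double.BYTES, (double) i);
        }
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = filled(8);
        System.out.println(sumViaLongLoad(buf, 8));
        System.out.println(sumViaDoubleLoad(buf, 8));
    }
}
```

Whether either loop actually vectorizes depends on the JDK build and flags in use, so any performance comparison should be measured rather than assumed from this sketch.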
I don't see a discussion of the reason for delaying the transformation
in the PR for 8253734. Vladimir, can you comment on that?
Roland.