RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3]

Emanuel Peter epeter at openjdk.org
Mon Feb 24 14:32:57 UTC 2025


On Mon, 24 Feb 2025 12:52:42 GMT, Roland Westrelin <roland at openjdk.org> wrote:

> > @rwestrel Do you want me to find examples for the pre-loop disappearing? I suppose I can find some easily by adding an assert in SuperWord, where we bail out, as I showed above.
> 
> Yes, if not too much work.

Ok, let's add this:

diff --git a/src/hotspot/share/opto/vectorization.cpp b/src/hotspot/share/opto/vectorization.cpp
index e607a1065dd..290ee249a42 100644
--- a/src/hotspot/share/opto/vectorization.cpp
+++ b/src/hotspot/share/opto/vectorization.cpp
@@ -98,6 +98,7 @@ VStatus VLoop::check_preconditions_helper() {
     // the pre-loop limit.
     CountedLoopEndNode* pre_end = _cl->find_pre_loop_end();
     if (pre_end == nullptr) {
+      assert(false, "found no pre-loop");
       return VStatus::make_failure(VLoop::FAILURE_PRE_LOOP_LIMIT);
     }
     Node* pre_opaq1 = pre_end->limit();


And run that:

rr /oracle-work/jdk-fork7/build/linux-x64-slowdebug/jdk/bin/java -Xcomp -XX:+TraceLoopOpts -XX:CompileCommand=compileonly,jdk.internal.classfile.impl.StackMapGenerator::processBlock --version

....

PreMainPost      Loop: N7127/N4014  limit_check profile_predicated predicated counted [0,int),+1 (2147483648 iters)  rc  has_sfpt strip_mined
Unroll 2         Loop: N7127/N4014  counted [int,int),+1 (2147483648 iters)  main rc  has_sfpt strip_mined
Loop: N0/N0  has_call has_sfpt
  Loop: N7453/N7460  limit_check profile_predicated predicated counted [0,int),+1 (4 iters)  pre rc  has_sfpt
  Loop: N7126/N7125  sfpts={ 7128 }
    Loop: N7508/N4014  counted [int,int),+2 (2147483648 iters)  main rc  has_sfpt strip_mined
  Loop: N7409/N7416  counted [int,int),+1 (4 iters)  post rc  has_sfpt
Parallel IV: 7728   Loop: N7453/N7460  limit_check profile_predicated predicated counted [0,int),+1 (4 iters)  pre has_sfpt
Parallel IV: 7725     Loop: N7508/N4014  counted [int,int),+2 (2147483648 iters)  main has_sfpt strip_mined
Parallel IV: 7718   Loop: N7409/N7416  counted [int,int),+1 (4 iters)  post has_sfpt
Loop: N0/N0  has_call has_sfpt
  Loop: N7453/N7460  limit_check profile_predicated predicated counted [0,int),+1 (4 iters)  pre has_sfpt
  Loop: N7126/N7125  sfpts={ 7128 }
    Loop: N7508/N4014  counted [int,int),+2 (2147483648 iters)  main has_sfpt strip_mined
  Loop: N7409/N7416  counted [int,int),+1 (4 iters)  post has_sfpt
RangeCheck       Loop: N7508/N4014  counted [int,int),+2 (2147483648 iters)  main has_sfpt rce strip_mined
Unroll 4         Loop: N7508/N4014  limit_check counted [int,int),+2 (2147483648 iters)  main has_sfpt rce strip_mined
Loop: N0/N0  has_call has_sfpt
  Loop: N7453/N7460  limit_check profile_predicated predicated counted [0,int),+1 (4 iters)  pre rc  has_sfpt
  Loop: N7126/N7125  limit_check sfpts={ 7128 }
    Loop: N8146/N4014  limit_check counted [int,int),+4 (2147483648 iters)  main has_sfpt strip_mined
  Loop: N7409/N7416  counted [int,int),+1 (4 iters)  post rc  has_sfpt

...
#  Internal Error (/oracle-work/jdk-fork7/open/src/hotspot/share/opto/vectorization.cpp:101), pid=1381339, tid=1381348
#  assert(false) failed: found no pre-loop


The pre-loop node is not dead actually. The issue is with the main-loop in `CountedLoopNode::is_canonical_loop_entry`.

We skip through some predicates, but then we cannot find the ZeroTripGuard. Instead, I'm seeing this:

(rr) p ctrl->dump_bfs(2,0,"#cd")
dist dump
---------------------------------------------
   2   974  ConI  === 0  [[ ... ]]  #int:1
   2  8060  IfTrue  === 8056  [[ 8073 ]] #1
   1  8073  If  === 8060 974  [[ 8074 8077 ]] #Last Value Assertion Predicate  P=0.999999, C=-1.000000
   0  8077  IfTrue  === 8073  [[ 8103 ]] #1


The pre-loop is further up though:

(rr) p this->dump_bfs(26,0,"#c")
dist dump
---------------------------------------------
  26  7453  CountedLoop  === 7453 4015 7460  [[ 7452 7453 7454 7455 ]] inner stride: 1 pre of N7127 !orig=[7127],[7118],[2645] !jvms: StackMapGenerator::processBlock @ bci:2677 (line 671)
  25  7455  If  === 7453 7441  [[ 7456 7464 ]] P=0.000001, C=-1.000000 !orig=[2686] !jvms: StackMapGenerator$Frame::popStack @ bci:5 (line 1001) StackMapGenerator::processBlock @ bci:2681 (line 671)
  24  7456  IfFalse  === 7455  [[ 7448 7457 ]] #0 !orig=[2631],[2628] !jvms: StackMapGenerator$Frame::popStack @ bci:5 (line 1001) StackMapGenerator::processBlock @ bci:2681 (line 671)
  23  7457  RangeCheck  === 7456 7446  [[ 7458 7467 ]] P=0.999999, C=-1.000000 !orig=[1189] !jvms: StackMapGenerator$Frame::popStack @ bci:33 (line 1002) StackMapGenerator::processBlock @ bci:2681 (line 671)
  22  7458  IfTrue  === 7457  [[ 7459 ]] #1 !orig=[777],385 !jvms: StackMapGenerator$Frame::popStack @ bci:33 (line 1002) StackMapGenerator::processBlock @ bci:2681 (line 671)
  21  7459  CountedLoopEnd  === 7458 7443  [[ 7460 7482 ]] [lt] P=0.900000, C=-1.000000 !orig=7122,[5398] !jvms: StackMapGenerator::processBlock @ bci:2674 (line 670)
  20  7482  IfFalse  === 7459  [[ 7486 ]] #0
  19  7486  If  === 7482 7485  [[ 7461 7487 ]] P=0.999999, C=-1.000000
  18  7487  IfTrue  === 7486  [[ 7977 ]] #1
  17  7977  If  === 7487 974  [[ 7978 7981 ]] #Init Value Assertion Predicate  P=0.999999, C=-1.000000
  16  7981  IfTrue  === 7977  [[ 7994 ]] #1
  15  7994  If  === 7981 974  [[ 7995 7998 ]] #Last Value Assertion Predicate  P=0.999999, C=-1.000000
  14  7998  IfTrue  === 7994  [[ 8118 ]] #1
  13  8118  If  === 7998 8117  [[ 8119 8122 ]] #Last Value Assertion Predicate  P=0.999999, C=-1.000000
  12  8122  IfTrue  === 8118  [[ 8007 ]] #1
  11  8007  If  === 8122 8006  [[ 8008 8011 ]] #Init Value Assertion Predicate  P=0.999999, C=-1.000000
  10  8011  IfTrue  === 8007  [[ 8056 ]] #1
   9  8056  If  === 8011 974  [[ 8057 8060 ]] #Init Value Assertion Predicate  P=0.999999, C=-1.000000
   8  8060  IfTrue  === 8056  [[ 8073 ]] #1
   7  8073  If  === 8060 974  [[ 8074 8077 ]] #Last Value Assertion Predicate  P=0.999999, C=-1.000000
   6  8077  IfTrue  === 8073  [[ 8103 ]] #1
   5  8173  IfFalse  === 7122  [[ 7128 7129 ]] #0 !orig=[7524],[7123],[5442] !jvms: StackMapGenerator::processBlock @ bci:2674 (line 670)
   5  8103  If  === 8077 8102  [[ 8104 8107 ]] #Last Value Assertion Predicate  P=0.999999, C=-1.000000
   4  7128  SafePoint  === 8173 1 778 1 1 7129 780 1 1 781 781 782 783 784 1 1 1 785 786  [[ 7124 ]]  SafePoint  !orig=385 !jvms: StackMapGenerator::processBlock @ bci:2688 (line 670)
   4  8107  IfTrue  === 8103  [[ 8086 ]] #1
   3  7124  OuterStripMinedLoopEnd  === 7128 781  [[ 7125 7471 ]] P=0.900000, C=-1.000000
   3  8086  If  === 8107 8085  [[ 8087 8090 ]] #Init Value Assertion Predicate  P=0.999999, C=-1.000000
   2  7122  CountedLoopEnd  === 8146 7121  [[ 8173 4014 ]] [lt] P=0.900000, C=-1.000000 !orig=[5398] !jvms: StackMapGenerator::processBlock @ bci:2674 (line 670)
   2  7125  IfTrue  === 7124  [[ 7126 ]] #1
   2  8090  IfTrue  === 8086  [[ 7126 ]] #1
   1  4014  IfTrue  === 7122  [[ 8146 ]] #1 !jvms: StackMapGenerator::processBlock @ bci:2674 (line 670)
   1  7126  OuterStripMinedLoop  === 7126 8090 7125  [[ 7126 8146 ]] 
   0  8146  CountedLoop  === 8146 7126 4014  [[ 8146 1191 8157 8158 7122 7503 ]] inner stride: 4 main of N8146 strip mined !orig=[7508],[7127],[7118],[2645] !jvms: StackMapGenerator::processBlock @ bci:2677 (line 671)


It looks like we are skipping some predicates, but not enough of them maybe?
In `AssertionPredicates::find_entry` we see:
- `8090  IfTrue  === 8086  [[ 7126 ]] #1`: `is_predicate` returns `true`.
- `8107  IfTrue  === 8103  [[ 8086 ]] #1`: `is_predicate` returns `true`.
- `8077  IfTrue  === 8073  [[ 8103 ]] #1`: `is_predicate` returns `false`. The reason is that the assertion predicate's Opaque node has already disappeared: in the dump above, the condition input of `8073  If` is just the constant `974  ConI ... #int:1`.

I talked with @chhagedorn, and he says that some "dying" initialized assertion predicates left over from unrolling can be in the way. IGVN would clean them out later, and then we could see through. But at this point they block the walk, so we cannot find the ZeroTripGuard: the predicate iterator is not good enough yet. @chhagedorn is working on that: https://bugs.openjdk.org/browse/JDK-8350579
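
To make the failure mode concrete, here is a minimal toy model of the walk described above. It is not HotSpot code: `CtrlNode`, `Kind`, `find_entry`, `finds_zero_trip_guard` and `remove_dying_predicates` are all hypothetical names invented for this sketch, and the real predicate iterator is considerably more involved. The node numbers 8090, 8107 and 8077 are taken from the list above; the ZeroTripGuard entry uses a placeholder index, because I am not asserting which node it is in the real graph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of the control nodes above the main-loop entry.
enum class Kind {
  LivePredicate,   // assertion predicate whose Opaque node is still present
  DyingPredicate,  // Opaque node already gone, not yet removed by IGVN
  ZeroTripGuard
};

struct CtrlNode {
  int idx;    // node index as printed in the dumps (or a placeholder)
  Kind kind;
};

// Toy counterpart of `is_predicate`: only live predicates are recognized.
bool is_recognized_predicate(const CtrlNode& n) {
  return n.kind == Kind::LivePredicate;
}

// Walk up from the loop entry (chain[0] is closest to the loop) and skip
// recognized predicates. Returns the position where the walk stops.
std::size_t find_entry(const std::vector<CtrlNode>& chain) {
  assert(!chain.empty());
  std::size_t i = 0;
  while (i + 1 < chain.size() && is_recognized_predicate(chain[i])) {
    ++i;
  }
  return i;
}

// The loop entry only counts as canonical in this toy model if the walk
// ends exactly at the ZeroTripGuard.
bool finds_zero_trip_guard(const std::vector<CtrlNode>& chain) {
  return chain[find_entry(chain)].kind == Kind::ZeroTripGuard;
}

// Toy stand-in for the later IGVN cleanup that removes dying predicates.
std::vector<CtrlNode> remove_dying_predicates(const std::vector<CtrlNode>& chain) {
  std::vector<CtrlNode> out;
  for (const CtrlNode& n : chain) {
    if (n.kind != Kind::DyingPredicate) {
      out.push_back(n);
    }
  }
  return out;
}

// The three projections from the list above, followed by a placeholder guard.
std::vector<CtrlNode> example_chain() {
  return {
    {8090, Kind::LivePredicate},
    {8107, Kind::LivePredicate},
    {8077, Kind::DyingPredicate},
    {-1,   Kind::ZeroTripGuard},  // placeholder index, not a real node id
  };
}
```

In this model the walk over `example_chain()` stops at 8077 and never reaches the guard, while the same walk over `remove_dying_predicates(example_chain())` does reach it. That mirrors the situation here: the failure is temporary and depends on when IGVN gets to run.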

The implication is that the ZeroTripGuard can temporarily not be found, and then we cannot find the pre-loop either, nor the multiversion-if. So I cannot really add an assert now. And who knows, there may be other blocking reasons on top of that.

@rwestrel Does that make sense? What do you think we should do?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2678602660

