RFR: 8279258: Auto-vectorization enhancement for two-dimensional array operations [v3]

Thu Dec 30 23:25:46 UTC 2021

> Hi all,
> 
> Happy Christmas Day!
> 
> We have observed that C2 fails to auto-vectorize two-dimensional array operations in our machine learning programs.
> And we have made a reproducer available in the JBS.
> 
> Now let's discuss the reproducer.
> The auto-vectorization fails due to `cl->slp_max_unroll() == 0` [1], which means the previous slp analysis never passed.
> 
> In our example, C2 tried its first slp analysis with `future_unroll_cnt=4` [2].
> Unfortunately, it failed because the loop IR was too complicated [3], as the following trace shows.
> 
> SuperWord::transform_loop: loop too complicated, cl_exit->in(0) != lpt->_head
> cl_exit 823 823  CountedLoopEnd  ===  738  822  [[ 907  682 ]] [lt] P=0.999999, C=-1.000000 !orig=[680]
> cl_exit->in(0) 738 738  IfTrue  ===  735  [[ 823 ]] #1 !orig=[442] !jvms: DoubleArray2::test @ bci:17 (line 10)
> lpt->_head 1267 1267  CountedLoop  ===  1267  1224  682  [[ 1267  1278  1283  1284  1288  1254  1282 ]] inner stride: 2 main of N1267 !orig=[824],[748],[687] !jvms: DoubleArray2::test @ bci:30 (line 11)
>     Loop: N1267/N682  counted [int,int),+2 (65 iters)  main rc  has_sfpt rce
> RangeCheck       Loop: N1267/N682  counted [int,int),+2 (65 iters)  main rc  has_sfpt rce
> Unroll 4         Loop: N1267/N682  counted [int,int),+2 (65 iters)  main rc  has_sfpt rce
> Loop: N0/N0  has_sfpt
>   Loop: N493/N463  limit_check profile_predicated predicated counted [0,int),+1 (65 iters)  sfpts={ 453 }
>     Loop: N946/N966  counted [0,int),+1 (4 iters)  pre has_sfpt
>     Loop: N1483/N682  counted [int,int),+4 (65 iters)  main rc  has_sfpt
>     Loop: N857/N877  counted [int,int),+1 (4 iters)  post has_sfpt
> PredicatesOff
> 
> 
> Then, C2 unrolled the loop with `unroll-factor=4` and applied some other optimizations, which simplified the loop IR.
> 
> Then comes the next round of loop unrolling analysis, in which C2 checks whether `future_unroll_cnt=8` [2] is OK for unrolling.
> C2 rejected `future_unroll_cnt=8` for this example and returned false immediately [4] without doing a second slp analysis, leaving `cl->slp_max_unroll() == 0`.
> But if we re-do the slp analysis with `future_unroll_cnt=4` before returning false, it would pass.
> 
> So the key idea is:
> 
>   slp analysis may fail because the loop IR is too complicated, especially during the early stages of loop unrolling analysis.
>   But after several rounds of loop unrolling and other optimizations, it's possible that the loop IR becomes simple enough to pass the slp analysis.
>   So C2 can try one more slp analysis instead of returning false immediately here [4].
> 
> 
> We have observed up to a 1.7x performance improvement in our microbenchmarks.
> 
> ![image](https://user-images.githubusercontent.com/19923746/147344527-b4d9c0ae-c0d4-4cac-b17a-48474648b21a.png)
> 
> Testing:
>   - tier1 ~ tier3 on Linux/x64, no regression.
> 
> Thanks.
> Best regards,
> Jie
> 
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L129
> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908
> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L137
> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L910
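
For readers without the JBS entry at hand, the kind of code under discussion might look like the sketch below. This is not the actual reproducer from the bug report: the class name `TwoDimArrayAdd`, the array shapes, and the iteration count are all assumptions chosen only to illustrate an element-wise operation over a two-dimensional `double` array whose inner loop is a candidate for auto-vectorization.

```java
import java.util.Arrays;

// Hypothetical illustration only; the real reproducer is attached to the JBS issue.
public class TwoDimArrayAdd {
    // Element-wise c = a + b. The inner loop over j is the one C2 may vectorize.
    public static void add(double[][] a, double[][] b, double[][] c) {
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                c[i][j] = a[i][j] + b[i][j];
            }
        }
    }

    // Helper that builds a rows x cols matrix with every element set to value.
    public static double[][] filled(int rows, int cols, double value) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            Arrays.fill(row, value);
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] a = filled(64, 256, 1.0);
        double[][] b = filled(64, 256, 2.0);
        double[][] c = new double[64][256];
        // Repeat the call so the method gets hot enough for C2 to compile it
        // (the count here is an arbitrary assumption).
        for (int iter = 0; iter < 20_000; iter++) {
            add(a, b, c);
        }
        System.out.println(c[63][255]);
    }
}
```

Comparing timings of such a loop with and without `-XX:-UseSuperWord` can give a rough signal of whether vectorization kicked in, although a proper microbenchmark harness would be more reliable.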

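The key idea in the quoted description can be pictured with a toy model. The sketch below is not C2 code: the `Loop` class, the method names, the `maxUnroll` limit, and the rule that the IR becomes "simple" after unrolling by 4 are all assumptions made up for illustration. It only shows how a last-chance slp analysis before returning false can leave a non-zero `slp_max_unroll`-style value where the original flow leaves 0.

```java
// Toy model of the control flow described above; every name here is hypothetical.
public class SlpRetrySketch {
    static final class Loop {
        int unrollCount = 1;      // how far the loop has been unrolled so far
        int slpMaxUnroll = 0;     // 0 means no slp analysis has passed yet
        boolean simpleIr = false; // set once earlier rounds have simplified the IR
    }

    // Stand-in for the slp analysis: fails while the IR is "too complicated".
    static boolean slpAnalysis(Loop loop, int unrollCnt) {
        if (!loop.simpleIr) {
            return false;
        }
        loop.slpMaxUnroll = unrollCnt;
        return true;
    }

    // Stand-in for the unrolling policy. With retry enabled, one more slp
    // analysis is attempted at the current unroll count before giving up.
    static boolean policyUnroll(Loop loop, int maxUnroll, boolean retry) {
        int futureUnrollCnt = loop.unrollCount * 2;
        if (futureUnrollCnt > maxUnroll) {
            if (retry && loop.slpMaxUnroll == 0) {
                slpAnalysis(loop, loop.unrollCount);
            }
            return false;
        }
        slpAnalysis(loop, futureUnrollCnt);
        return true;
    }

    // Runs unrolling rounds until the policy says stop; returns slpMaxUnroll.
    public static int run(boolean retry) {
        Loop loop = new Loop();
        int maxUnroll = 4; // assumed limit that causes the factor 8 to be rejected
        while (policyUnroll(loop, maxUnroll, retry)) {
            loop.unrollCount *= 2;
            // Assumption: the IR only becomes simple once unrolled by 4.
            loop.simpleIr = loop.unrollCount >= 4;
        }
        return loop.slpMaxUnroll;
    }

    public static void main(String[] args) {
        System.out.println("without retry: " + run(false));
        System.out.println("with retry:    " + run(true));
    }
}
```

In this model the analysis attempted with a future unroll count of 4 fails, the factor 8 is rejected, and only the retry variant records a successful analysis at 4, mirroring the sequence of events in the description.
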
Jie Fu has updated the pull request incrementally with one additional commit since the last revision:

  Remove redundant UseSuperWord check

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6933/files
  - new: https://git.openjdk.java.net/jdk/pull/6933/files/1a9b4c84..3af74828

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6933&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6933&range=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6933.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6933/head:pull/6933

PR: https://git.openjdk.java.net/jdk/pull/6933