RFR: 8279258: Auto-vectorization enhancement for two-dimensional array operations [v2]

Nils Eliasson neliasso at openjdk.java.net
Thu Dec 30 09:43:22 UTC 2021


On Mon, 27 Dec 2021 14:41:58 GMT, Jie Fu <jiefu at openjdk.org> wrote:

>> Hi all,
>> 
>> Happy Christmas Day!
>> 
>> We have observed that C2 fails to auto-vectorize two-dimensional array operations in our machine learning programs.
>> We have added a reproducer to the JBS issue.
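
For readers without access to the JBS attachment, here is a minimal sketch of the kind of two-dimensional array kernel under discussion. It is not the reproducer from the issue: the class name `TwoDimArrayAdd`, the method signature, and the array sizes are all assumptions made for illustration.

```java
// Hypothetical kernel for illustration only; NOT the reproducer attached to the JBS issue.
final class TwoDimArrayAdd {

    // Element-wise c = a + b over two-dimensional arrays.
    // The inner loop over j is the counted loop that C2 would be
    // expected to consider for auto-vectorization.
    static void add(double[][] a, double[][] b, double[][] c) {
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                c[i][j] = a[i][j] + b[i][j];
            }
        }
    }

    public static void main(String[] args) {
        int n = 65; // arbitrary size chosen for this sketch
        double[][] a = new double[n][n];
        double[][] b = new double[n][n];
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = i + j;
                b[i][j] = 1.0;
            }
        }
        // Call the kernel many times so the JIT has a chance to compile it.
        for (int iter = 0; iter < 20_000; iter++) {
            add(a, b, c);
        }
        System.out.println(c[1][2]);
    }
}
```

One rough way to see whether vectorization matters for such a kernel is to compare run times with and without `-XX:-UseSuperWord`; whether this particular sketch triggers the failure described in the post is not something it can guarantee.
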
>> 
>> Now let's discuss the reproducer.
>> The auto-vectorization fails due to `cl->slp_max_unroll() == 0` [1], which means the previous slp analysis never passed.
>> 
>> In our example, C2 tried its first slp analysis with `future_unroll_cnt=4` [2].
>> Unfortunately, it failed because the loop IR was too complicated [3], as shown below.
>> 
>> SuperWord::transform_loop: loop too complicated, cl_exit->in(0) != lpt->_head
>> cl_exit 823 823  CountedLoopEnd  ===  738  822  [[ 907  682 ]] [lt] P=0.999999, C=-1.000000 !orig=[680]
>> cl_exit->in(0) 738 738  IfTrue  ===  735  [[ 823 ]] #1 !orig=[442] !jvms: DoubleArray2::test @ bci:17 (line 10)
>> lpt->_head 1267 1267  CountedLoop  ===  1267  1224  682  [[ 1267  1278  1283  1284  1288  1254  1282 ]] inner stride: 2 main of N1267 !orig=[824],[748],[687] !jvms: DoubleArray2::test @ bci:30 (line 11)
>>     Loop: N1267/N682  counted [int,int),+2 (65 iters)  main rc  has_sfpt rce
>> RangeCheck       Loop: N1267/N682  counted [int,int),+2 (65 iters)  main rc  has_sfpt rce
>> Unroll 4         Loop: N1267/N682  counted [int,int),+2 (65 iters)  main rc  has_sfpt rce
>> Loop: N0/N0  has_sfpt
>>   Loop: N493/N463  limit_check profile_predicated predicated counted [0,int),+1 (65 iters)  sfpts={ 453 }
>>     Loop: N946/N966  counted [0,int),+1 (4 iters)  pre has_sfpt
>>     Loop: N1483/N682  counted [int,int),+4 (65 iters)  main rc  has_sfpt
>>     Loop: N857/N877  counted [int,int),+1 (4 iters)  post has_sfpt
>> PredicatesOff
>> 
>> 
>> Then C2 unrolled the loop with `unroll-factor=4` and also performed some other optimizations, which simplified the loop IR.
>> 
>> Then comes the next round of loop unrolling analysis, in which C2 checks whether `future_unroll_cnt=8` [2] is OK for unrolling.
>> C2 rejected `future_unroll_cnt=8` for this example and returned false immediately [4] without doing a second slp analysis, leaving `cl->slp_max_unroll() == 0`.
>> But if we re-do the slp analysis with `future_unroll_cnt=4` before returning false, it would pass.
>> 
>> So the key idea is:
>> 
>>   slp analysis may fail because the loop IR is too complicated, especially during the early stages of loop unrolling analysis.
>>   But after several rounds of loop unrolling and other optimizations, it's possible that the loop IR becomes simple enough to pass the slp analysis.
>>   So C2 can try one more slp analysis instead of returning false immediately here [4].
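
The control flow described above can be modelled with a small toy simulation. This is not HotSpot code and not the patch itself: the names `SlpRetrySketch`, `LoopState`, `policyUnroll`, `unrollLimit`, and `irSimplified` are hypothetical, and the rule that the IR becomes "simple enough" after one accepted unroll is an assumption that only mirrors the example in the post.

```java
// Toy model of the retry idea; all names and rules here are hypothetical.
final class SlpRetrySketch {

    static final class LoopState {
        int unrollCount = 2;          // current unroll factor (assumed starting point)
        int slpMaxUnroll = 0;         // 0 means no slp analysis has passed yet
        boolean irSimplified = false; // assumed: set once an unroll round has run
    }

    // Stand-in for slp analysis: it passes only once the IR has been simplified,
    // and records the unroll factor it was asked about.
    static boolean slpAnalysis(LoopState loop, int unrollCount) {
        if (!loop.irSimplified) {
            return false; // "loop too complicated"
        }
        loop.slpMaxUnroll = unrollCount;
        return true;
    }

    // Returns true if the loop may be unrolled to twice its current factor.
    static boolean policyUnroll(LoopState loop, int unrollLimit, boolean retryOnReject) {
        int futureUnrollCount = loop.unrollCount * 2;
        if (futureUnrollCount > unrollLimit) {
            // Proposed behaviour: before giving up, retry the analysis at the
            // current factor if no earlier analysis has passed.
            if (retryOnReject && loop.slpMaxUnroll == 0) {
                slpAnalysis(loop, loop.unrollCount);
            }
            return false;
        }
        slpAnalysis(loop, futureUnrollCount);
        return true;
    }

    // Runs unrolling rounds until the policy rejects, then reports slpMaxUnroll.
    static int simulate(int unrollLimit, boolean retryOnReject) {
        LoopState loop = new LoopState();
        while (policyUnroll(loop, unrollLimit, retryOnReject)) {
            loop.unrollCount *= 2;
            loop.irSimplified = true; // assumed effect of unrolling plus other opts
        }
        return loop.slpMaxUnroll;
    }

    public static void main(String[] args) {
        System.out.println("without retry: " + simulate(4, false));
        System.out.println("with retry:    " + simulate(4, true));
    }
}
```

In this model, the run without the retry ends with `slpMaxUnroll` still at 0, while the run with the retry records the factor 4, which corresponds to the two outcomes the post contrasts.
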
>> 
>> 
>> We have observed up to a 1.7x performance improvement in our microbenchmarks.
>> 
>> ![image](https://user-images.githubusercontent.com/19923746/147344527-b4d9c0ae-c0d4-4cac-b17a-48474648b21a.png)
>> 
>> Testing:
>>   - tier1 ~ tier3 on Linux/x64, no regression.
>> 
>> Thanks.
>> Best regards,
>> Jie
>> 
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L129
>> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908
>> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L137
>> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L910
>
> Jie Fu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Address review comments
>  - Merge branch 'master' into JDK-8279258
>  - 8279258: Auto-vectorization enhancement for two-dimensional array operations

Yes. Looks good!

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6933

