RFR: 8290322: Optimize Vector.rearrange over byte vectors for AVX512BW targets. [v5]

Tue Aug 23 16:59:43 UTC 2022

On Sat, 20 Aug 2022 13:42:01 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> Currently, rearrange over a 512-bit byte vector is optimized only for targets supporting the AVX512_VBMI feature. This patch generates an efficient JIT sequence to handle it on AVX512BW targets. The following performance results with the newly added benchmark show
>> a significant speedup for the 64-byte case (34.423 to 8741.901 ops/ms, roughly 254x), with the other sizes essentially unchanged.
>> 
>> System:  Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (CascadeLake 28C 2S)
>> 
>> 
>> Baseline:
>> =========
>> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
>> RearrangeBytesBenchmark.testRearrangeBytes16     512  thrpt    2  16350.330          ops/ms
>> RearrangeBytesBenchmark.testRearrangeBytes32     512  thrpt    2  15991.346          ops/ms
>> RearrangeBytesBenchmark.testRearrangeBytes64     512  thrpt    2     34.423          ops/ms
>> RearrangeBytesBenchmark.testRearrangeBytes8      512  thrpt    2  10873.348          ops/ms
>> 
>> 
>> With-opt:
>> =========
>> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
>> RearrangeBytesBenchmark.testRearrangeBytes16     512  thrpt    2  16062.624          ops/ms
>> RearrangeBytesBenchmark.testRearrangeBytes32     512  thrpt    2  16028.494          ops/ms
>> RearrangeBytesBenchmark.testRearrangeBytes64     512  thrpt    2   8741.901          ops/ms
>> RearrangeBytesBenchmark.testRearrangeBytes8      512  thrpt    2  10983.226          ops/ms
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8290322: Further optimization based on review comments.

FTR, the latest version passed tier1-4 clean.
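The patch description above notes that 512-bit byte rearrange was previously optimized only with AVX512_VBMI, which provides a full 64-byte cross-lane permute (vpermb). AVX512BW's byte shuffle (vpshufb) can only select bytes within each 128-bit lane. One general way to build a cross-lane permute from in-lane shuffles is to broadcast each source lane, shuffle, and blend the result by a mask derived from the upper index bits. The scalar Java model below illustrates that technique. It is not claimed to be the exact instruction sequence this patch emits. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical scalar model (not HotSpot code) of a 64-byte rearrange built
 * only from 16-byte in-lane shuffles, the capability AVX512BW has without VBMI.
 */
public class RearrangeBytesModel {
    static final int VLEN = 64; // bytes in a 512-bit vector
    static final int LANE = 16; // bytes in a 128-bit lane

    // Reference semantics of rearrange: out[i] = src[idx[i]].
    static byte[] rearrangeReference(byte[] src, int[] idx) {
        byte[] out = new byte[VLEN];
        for (int i = 0; i < VLEN; i++) {
            out[i] = src[idx[i]];
        }
        return out;
    }

    // Models an in-lane byte shuffle (vpshufb-like, for indices 0..63): each
    // destination byte selects from its own 16-byte lane using the low 4 index bits.
    static byte[] inLaneShuffle(byte[] table, int[] idx) {
        byte[] out = new byte[VLEN];
        for (int i = 0; i < VLEN; i++) {
            int laneBase = (i / LANE) * LANE;
            out[i] = table[laneBase + (idx[i] & 0x0F)];
        }
        return out;
    }

    // Cross-lane rearrange from four (broadcast, in-lane shuffle, masked blend) steps.
    static byte[] rearrangeViaLanes(byte[] src, int[] idx) {
        byte[] result = new byte[VLEN];
        for (int lane = 0; lane < VLEN / LANE; lane++) {
            // Broadcast source lane `lane` into all four 128-bit lanes.
            byte[] broadcast = new byte[VLEN];
            for (int i = 0; i < VLEN; i++) {
                broadcast[i] = src[lane * LANE + (i % LANE)];
            }
            byte[] shuffled = inLaneShuffle(broadcast, idx);
            // Masked blend: keep only bytes whose index points into this source lane.
            for (int i = 0; i < VLEN; i++) {
                if ((idx[i] >> 4) == lane) {
                    result[i] = shuffled[i];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        byte[] src = new byte[VLEN];
        rnd.nextBytes(src);
        int[] idx = new int[VLEN];
        for (int i = 0; i < VLEN; i++) {
            idx[i] = rnd.nextInt(VLEN); // arbitrary cross-lane shuffle indices
        }
        boolean same = Arrays.equals(rearrangeReference(src, idx), rearrangeViaLanes(src, idx));
        System.out.println(same ? "match" : "MISMATCH");
    }
}
```

Because every index in 0..63 has exactly one value of `idx >> 4`, each output byte is written by exactly one of the four blend steps. This is why four in-lane shuffles are enough to cover the full 64-byte permute.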

-------------

PR: https://git.openjdk.org/jdk/pull/9498