Integrated: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions

Jatin Bhateja jbhateja at openjdk.java.net
Sat Oct 10 06:32:12 UTC 2020


On Mon, 7 Sep 2020 14:28:18 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> Summary:
> 
> 1)  New AVX3-optimized stubs for both conjoint and disjoint arraycopy.
> 2)  Special instruction sequence blocks for copy sizes between 32 and 192 bytes.
> 3)  Block copies above 192 bytes are performed using a destination-address-aligned PRE-MAIN-POST loop. The main loop
> copies 192 bytes per iteration, and the tail falls through to the special instruction sequence blocks.
> 4)  Both the small copy blocks and the aligned loop use 32-byte vector registers to prevent any frequency penalty for
> copy sizes less than AVX3Threshold.
> 5)  For block sizes above AVX3Threshold, both the special blocks and the loop operate using 64-byte registers.
> 6)  If the user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient
> REP MOVS for copy sizes above 4096 bytes.
> 
> JMH Results:
>   System     :  CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>   Micros     :  test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>   Baseline   :  http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt
>   WithOpt    :  http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt
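
The size-based selection described in the summary above can be sketched in Java as follows. This is only an illustrative model of the decision logic, not the stub generator code from the changeset: the class, enum, and method names are hypothetical, the behaviour exactly at the AVX3Threshold boundary is an assumption, and copies below 32 bytes (which the summary does not describe) are simply grouped with the special blocks.

```java
// Illustrative model only; all names here are hypothetical and do not
// come from the HotSpot sources.
class CopyPlanSketch {
    enum Strategy { SPECIAL_BLOCKS, ALIGNED_LOOP, REP_MOVS }

    // Limits taken from the summary: special blocks up to 192 bytes,
    // REP MOVS above 4096 bytes when vectors are capped at 32 bytes.
    static final long SPECIAL_BLOCK_LIMIT = 192;
    static final long REP_MOVS_LIMIT = 4096;

    // Vector width in bytes: 32 below the threshold, 64 at or above it
    // when 64-byte vectors are allowed. Treating "equal to the threshold"
    // as wide is an assumption; the summary leaves that case open.
    static int vectorBytes(long copyBytes, int maxVectorSize, long avx3Threshold) {
        boolean wide = maxVectorSize >= 64 && copyBytes >= avx3Threshold;
        return wide ? 64 : 32;
    }

    // Picks the copy strategy for a given size. Sizes below 32 bytes are
    // lumped into SPECIAL_BLOCKS here, which is an assumption.
    static Strategy strategy(long copyBytes, int maxVectorSize, boolean disjoint) {
        if (maxVectorSize == 32 && disjoint && copyBytes > REP_MOVS_LIMIT) {
            return Strategy.REP_MOVS;
        }
        if (copyBytes <= SPECIAL_BLOCK_LIMIT) {
            return Strategy.SPECIAL_BLOCKS;
        }
        return Strategy.ALIGNED_LOOP;
    }

    public static void main(String[] args) {
        long threshold = 4096; // arbitrary example value for the sketch
        for (long n : new long[] {64, 192, 1024, 8192}) {
            System.out.println(n + " bytes: " + strategy(n, 64, true)
                    + " using " + vectorBytes(n, 64, threshold) + "-byte vectors");
        }
    }
}
```

The sketch keeps the strategy choice and the vector width as two separate functions because the summary describes them as independent decisions: the 192-byte limit selects between the special blocks and the loop, while AVX3Threshold selects the register width for both.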

This pull request has now been integrated.

Changeset: 4b5ac3ab
Author:    Jatin Bhateja <jbhateja at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/4b5ac3ab
Stats:     1517 lines in 11 files changed: 1419 ins; 69 del; 29 mod

8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions

Reviewed-by: neliasso, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/61

