Integrated: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions
Jatin Bhateja
jbhateja at openjdk.java.net
Sat Oct 10 06:32:12 UTC 2020
On Mon, 7 Sep 2020 14:28:18 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> Summary:
>
> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy.
> 2) Special instruction sequence blocks for copy sizes between 32 and 192 bytes.
> 3) Block copy operations above 192 bytes are performed using a destination-address-aligned PRE-MAIN-POST loop. The
> main loop copies 192 bytes per iteration, and the tail part falls through to the special instruction sequence blocks.
> 4) Both the small copy blocks and the aligned loop use 32-byte vector registers to prevent any frequency penalty for
> copy sizes less than AVX3Threshold.
> 5) For block sizes above AVX3Threshold, both the special blocks and the loop operate using 64-byte registers.
> 6) In case the user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using
> efficient REP MOVS for copy sizes above 4096 bytes.
>
> JMH Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> Baseline : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt
> WithOpt : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt
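
The size-based policy in the summary above can be modelled in plain Java. This is only an illustrative sketch, not the HotSpot code: the real stubs are generated assembly, and the class name, method names, and strategy labels below are hypothetical. It also assumes that "above" means strictly greater, that the alignment granularity is passed in by the caller, and it leaves sizes below 32 bytes as unspecified because the summary does not describe them.

```java
// Hypothetical model of the copy policy described in the summary; not HotSpot code.
class ArrayCopyPlanSketch {
    static final int MAIN_LOOP_BYTES = 192;   // bytes copied per main-loop iteration (from the summary)
    static final int REP_MOVS_BYTES = 4096;   // REP MOVS cut-off quoted in the summary

    // Vector register width (in bytes) used for a copy of the given size.
    // Assumption: sizes strictly above avx3Threshold use 64-byte registers
    // when the maximum vector size allows it; everything else uses 32 bytes.
    static int vectorWidth(long bytes, long avx3Threshold, int maxVectorSize) {
        return (maxVectorSize >= 64 && bytes > avx3Threshold) ? 64 : 32;
    }

    // Which of the code paths named in the summary handles this copy.
    static String strategy(long bytes, boolean disjoint, int maxVectorSize) {
        if (maxVectorSize == 32 && disjoint && bytes > REP_MOVS_BYTES) {
            return "REP_MOVS";
        }
        if (bytes > MAIN_LOOP_BYTES) {
            return "ALIGNED_LOOP";
        }
        if (bytes >= 32) {
            return "SPECIAL_BLOCKS";
        }
        return "UNSPECIFIED"; // the summary does not cover sizes below 32 bytes
    }

    // Split a large copy into {preBytes, mainIterations, tailBytes}.
    // Assumption: the PRE part copies just enough bytes to align the
    // destination address to the given alignment.
    static long[] split(long destAddress, long bytes, int alignment) {
        long pre = (alignment - (destAddress % alignment)) % alignment;
        pre = Math.min(pre, bytes);
        long rest = bytes - pre;
        return new long[] { pre, rest / MAIN_LOOP_BYTES, rest % MAIN_LOOP_BYTES };
    }

    public static void main(String[] args) {
        long[] parts = split(1000, 1000, 64);
        System.out.println(strategy(1000, true, 64) + ": pre=" + parts[0]
                + " mainIterations=" + parts[1] + " tail=" + parts[2]);
    }
}
```

In this model the tail returned by `split` is always below 192 bytes, which matches the description of the tail being handled by the special instruction sequence blocks.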
This pull request has now been integrated.
Changeset: 4b5ac3ab
Author: Jatin Bhateja <jbhateja at openjdk.org>
URL: https://git.openjdk.java.net/jdk/commit/4b5ac3ab
Stats: 1517 lines in 11 files changed: 1419 ins; 69 del; 29 mod
8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions
Reviewed-by: neliasso, kvn
-------------
PR: https://git.openjdk.java.net/jdk/pull/61