RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v2]
Nils Eliasson
neliasso at openjdk.java.net
Thu Sep 17 13:24:54 UTC 2020
On Thu, 17 Sep 2020 05:16:52 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Summary:
>>
>> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy.
>> 2) Special instruction sequence blocks for copy sizes between 32 and 192 bytes.
>> 3) Block copy operations above 192 bytes are performed using a destination-address-aligned PRE-MAIN-POST loop. The
>> main loop copies 192 bytes per iteration, and the tail part falls through to the special instruction sequence blocks.
>> 4) Both the small copy blocks and the aligned loop use 32-byte vector registers to prevent any frequency penalty for
>> copy sizes less than AVX3Threshold.
>> 5) For block sizes above AVX3Threshold, both the special blocks and the loop operate using 64-byte registers.
>> 6) In case the user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using
>> efficient REP MOVS for copy sizes above 4096 bytes.
>>
>> JMH Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> Baseline : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt
>> WithOpt : http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> 8252847: Review comments resolution
I like that you have extracted the avx512 stub code from the rest - that makes it a lot more readable! Overall the new
code feels easy to understand and read.
I found one more minor issue (appears in four places).
My only concern is that it's getting hard to follow under what circumstances avx3 instructions are used:
Could it be that different thresholds are needed depending on whether the avx3 instructions are used with 32-byte or
64-byte vectors? Are we sure all variants are tested?
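To make the question concrete, the rules in the summary can be read as one small decision function. The following is only a sketch of that reading, not the code in the patch. `Config`, `Path`, `Choice` and `choose` are hypothetical names, the fields stand in for `VM_Version::supports_avx512vlbw()`, `MaxVectorSize` and `AVX3Threshold`, and two points are assumptions because the summary does not settle them: sizes below 32 bytes are reported as unspecified, and a size exactly equal to the threshold is treated as 64-byte.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the copy-path selection described in the summary.
enum class Path { NotAvx3Stub, Unspecified, SpecialBlocks, AlignedLoop, RepMovs };

struct Choice {
  Path path;
  int vector_bytes;  // 0 when no vector width applies
};

struct Config {
  bool has_avx512vlbw;         // stand-in for VM_Version::supports_avx512vlbw()
  int max_vector_size;         // stand-in for MaxVectorSize
  std::size_t avx3_threshold;  // stand-in for AVX3Threshold
};

constexpr Choice choose(const Config& c, bool is_oop, bool disjoint, std::size_t bytes) {
  // Guard quoted in the review: avx512vlbw, non-oop, MaxVectorSize >= 32.
  if (!c.has_avx512vlbw || is_oop || c.max_vector_size < 32) {
    return {Path::NotAvx3Stub, 0};
  }
  // Item 6: 32-byte maximum vector size, forward copy, more than 4096 bytes.
  if (c.max_vector_size == 32 && disjoint && bytes > 4096) {
    return {Path::RepMovs, 0};
  }
  // Assumption: the summary does not describe copies smaller than 32 bytes.
  if (bytes < 32) {
    return {Path::Unspecified, 0};
  }
  // Items 4 and 5: 32-byte registers below the threshold, 64-byte at or above it
  // (the behaviour at exactly the threshold is an assumption).
  const int width = (c.max_vector_size >= 64 && bytes >= c.avx3_threshold) ? 64 : 32;
  // Items 2 and 3: special blocks up to 192 bytes, aligned loop above that.
  if (bytes <= 192) {
    return {Path::SpecialBlocks, width};
  }
  return {Path::AlignedLoop, width};
}
```

Enumerating `choose` over both vector sizes, both copy directions and sizes on either side of 192 bytes, 4096 bytes and the threshold gives a list of variants that the tests could be checked against.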
Also - have you thought about supporting oop-copies? You only have to call the
BarrierSetAssembler::arraycopy_prologue/epilogue like in the old versions. It's not a requirement for me to approve
this - but an encouragement for a future patch.
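As a rough illustration of the shape this would take, the sketch below brackets a copy with a prologue and an epilogue hook. It is a standalone mock, not HotSpot code: `MockBarrier` and `copy_with_barriers` are hypothetical names, the real `BarrierSetAssembler::arraycopy_prologue/epilogue` take assembler and register arguments that are not modelled here, and `std::memmove` merely stands in for the generated copy loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a GC barrier hook. This is NOT the HotSpot
// BarrierSetAssembler API; it only records the order of the calls.
struct MockBarrier {
  std::vector<std::string> log;

  void prologue(bool is_oop) {
    if (is_oop) log.push_back("prologue");  // barrier work is only needed for oop arrays
  }
  void epilogue(bool is_oop) {
    if (is_oop) log.push_back("epilogue");
  }
};

// Bracket the copy with the barrier hooks, so the same copy body can serve
// both primitive and oop arrays.
inline void copy_with_barriers(MockBarrier& bs, bool is_oop,
                               void* dst, const void* src, std::size_t bytes) {
  bs.prologue(is_oop);
  std::memmove(dst, src, bytes);  // stand-in for the vectorized copy stub
  bs.log.push_back("copy");
  bs.epilogue(is_oop);
}
```

With this structure the `!is_oop` part of the guard could in principle be dropped once the hooks are wired in, but whether that holds for every barrier set is something the future patch would need to verify.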
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2390:
> 2388: address *entry, const char *name,
> 2389: bool dest_uninitialized = false) {
> 2390: if (VM_Version::supports_avx512vlbw() && false == is_oop && MaxVectorSize >= 32) {
"false == is_oop" => !is_oop
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2501:
> 2499: address generate_disjoint_long_oop_copy(bool aligned, bool is_oop, address *entry,
> 2500: const char *name, bool dest_uninitialized = false) {
> 2501: if (VM_Version::supports_avx512vlbw() && false == is_oop && MaxVectorSize >= 32) {
false == is_oop => !is_oop
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2608:
> 2606: address nooverlap_target, address *entry,
> 2607: const char *name, bool dest_uninitialized = false) {
> 2608: if (VM_Version::supports_avx512vlbw() && false == is_oop && MaxVectorSize >= 32) {
false == is_oop => !is_oop
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2282:
> 2280: address generate_disjoint_int_oop_copy(bool aligned, bool is_oop, address* entry,
> 2281: const char *name, bool dest_uninitialized = false) {
> 2282: if (VM_Version::supports_avx512vlbw() && false == is_oop && MaxVectorSize >= 32) {
false == is_oop => !is_oop
-------------
Changes requested by neliasso (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/61