RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy [v5]

Steve Dohrmann duke at openjdk.org
Tue Nov 21 21:03:25 UTC 2023


On Tue, 21 Nov 2023 01:10:40 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Steve Dohrmann has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>> 
>>  - Merge branch 'master' into memcpy
>>  - Update full name
>>    Previous commit (fcbbc0d7880) added org.openjdk.bench.java.lang.ArrayCopyAlignedLarge benchmark
>>  - - remerge upstream master
>>    - remove ::copy test from XorTest
>>  - Merge branch 'master' into memcpy
>>  - - fix whitespace error
>>  - Merge branch 'master' of https://git.openjdk.org/jdk into memcpy
>>  - - bug fix: only generate / use large copy code if MaxVectorSize == 64
>>  - - fix whitespace issues
>>    - fix xor test foreign impl constructor signature
>>  - - initial commit -- optimize large array cases in StubGenerator::generate_disjoint_copy_avx3_masked
>>      - add src address prefetches
>>      - switch to non-temporal writes
>>      - added modified jmh benchmark based on xor benchmark from Maurizio Cimadamore
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 753:
> 
>> 751:   Label L_pre_main_post_large;
>> 752: 
>> 753:   if (MaxVectorSize == 64) {
> 
> This should be an assert here instead of if check as this method shouldn't be called if MaxVectorSize is < 64.

Thanks, done.

> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 777:
> 
>> 775: 
>> 776:     __ BIND(L_main_pre_loop_large);
>> 777:     __ subq(temp1, loop_size[shift]);  // whay is this here
> 
> Spurious comment.

Thanks, done

> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 797:
> 
>> 795:     __ jcc(Assembler::lessEqual, L_exit_large);
>> 796:     arraycopy_avx3_special_cases_256(xmm1, k2, from, to, temp1, shift,
>> 797:                                  temp4, temp3, L_entry_large, L_exit_large);
> 
> When we come here to arraycopy_avx3_special_cases_256 only up to 256 bytes need to be copied so we don't need to go back to L_entry_large.

Thanks, done

> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 1058:
> 
>> 1056:   };
>> 1057: 
>> 1058:   if (MaxVectorSize == 64) {
> 
> This should be an assert here instead of if check as this method shouldn't be called if MaxVectorSize is < 64.

Thanks, done.

> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 1175:
> 
>> 1173: void StubGenerator::copy256_avx3(Register dst, Register src, Register index, XMMRegister xmm1,
>> 1174:                                 XMMRegister xmm2, XMMRegister xmm3, XMMRegister xmm4,
>> 1175:                                 bool conjoint, int shift, int offset) {
> 
> The conjoint parameter is not used so could be removed from this function.

Thanks, done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1401179711
PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1401178991
PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1401180641
PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1401180258
PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1401177983


More information about the hotspot-compiler-dev mailing list