RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy
Jatin Bhateja
jbhateja at openjdk.org
Thu Nov 16 05:45:30 UTC 2023
On Tue, 14 Nov 2023 07:59:22 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Below is baseline data collected using a modified version of the java.lang.foreign.xor micro benchmark referenced by @mcimadamore in the bug report. I collected the data on an Ubuntu 22.04 laptop with a Tiger Lake i7-1185G7, which supports AVX512.
>>
>> Baseline data
>> Benchmark (arrayKind) (sizeKind) Mode Cnt Score Error Units
>> --------------------------------------------------------------------------------------
>> XorTest.copy ELEMENTS SMALL avgt 30 584737355.767 ± 60414308.540 ns/op
>> XorTest.copy ELEMENTS MEDIUM avgt 30 272248995.683 ± 2924954.498 ns/op
>> XorTest.copy ELEMENTS LARGE avgt 30 1019200210.900 ± 28334453.652 ns/op
>> XorTest.copy REGION SMALL avgt 30 7399944.164 ± 216821.819 ns/op
>> XorTest.copy REGION MEDIUM avgt 30 20591454.558 ± 147398.572 ns/op
>> XorTest.copy REGION LARGE avgt 30 21649266.051 ± 179263.875 ns/op
>> XorTest.copy CRITICAL SMALL avgt 30 51079.357 ± 542.482 ns/op
>> XorTest.copy CRITICAL MEDIUM avgt 30 2496.961 ± 11.375 ns/op
>> XorTest.copy CRITICAL LARGE avgt 30 515.454 ± 5.831 ns/op
>> XorTest.copy FOREIGN SMALL avgt 30 7558432.075 ± 79489.276 ns/op
>> XorTest.copy FOREIGN MEDIUM avgt 30 19730666.341 ± 500505.099 ns/op
>> XorTest.copy FOREIGN LARGE avgt 30 34616758.085 ± 340300.726 ns/op
>> XorTest.xor ELEMENTS SMALL avgt 30 219832692.489 ± 2329417.319 ns/op
>> XorTest.xor ELEMENTS MEDIUM avgt 30 505138197.167 ± 3818334.424 ns/op
>> XorTest.xor ELEMENTS LARGE avgt 30 1189608474.667 ± 5877981.900 ns/op
>> XorTest.xor REGION SMALL avgt 30 64093872.804 ± 599704.491 ns/op
>> XorTest.xor REGION MEDIUM avgt 30 81544576.454 ± 1406342.118 ns/op
>> XorTest.xor REGION LARGE avgt 30 90091424.883 ± 775577.613 ns/op
>> XorTest.xor CRITICAL SMALL avgt 30 57231375.744 ± 438223.342 ns/op
>> XorTest.xor CRITICAL MEDIUM avgt 30 58583884.930 ± 375355.215 ns/op
>> XorTest.xor CRITICAL LARGE avgt 30 60644832.949 ± 588120.738 ns/op
>> XorTest.xor FOREIGN SMALL avgt 30 73868679.405 ± 819965.524 ns/op
>> XorTest.xor FOREIGN MEDIUM avgt 30 88156275.944 ± 1051257.152 ns/op
>> Xo...
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 585:
>
>> 583: __ shlq(temp2, shift);
>> 584: __ cmpq(temp2, large_threshold);
>> 585: __ jcc(Assembler::greaterEqual, L_copy_large);
>
> Hi @steveatgh, can you please share the performance numbers for the other array copy JMH micros in the following directory: https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/lang
I would still ask you to run the benchmarks in the above path; we may see performance dips for sizes just past the special cases because of the additional comparisons.
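
To make the concern concrete, here is a rough sketch of the kind of size sweep that could expose such a dip. It is not one of the JMH micros in the path above and is no substitute for them. The class name, the `sizesAround` helper and the threshold value 4096 are all hypothetical, and `System.arraycopy` is used only as a stand-in: whether it reaches the same stub code as the change under review is an assumption.

```java
import java.util.Arrays;

public class CopySizeSweep {

    // Returns the sizes immediately below, at, and above a threshold.
    // The threshold is a hypothetical value chosen by the caller; it is
    // not taken from the stub generator.
    static int[] sizesAround(int threshold) {
        return new int[] { threshold - 1, threshold, threshold + 1 };
    }

    // Naive timing loop: average nanoseconds per copy of `size` bytes.
    // It only hints at a trend; JMH handles warmup and dead-code
    // elimination far more carefully.
    static long timeCopy(int size, int iterations) {
        byte[] src = new byte[size];
        byte[] dst = new byte[size];
        Arrays.fill(src, (byte) 1);

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            System.arraycopy(src, 0, dst, 0, size);
        }
        long elapsed = System.nanoTime() - start;

        // Sanity check that the copy actually happened.
        if (!Arrays.equals(src, dst)) {
            throw new AssertionError("copy mismatch for size " + size);
        }
        return elapsed / iterations;
    }

    public static void main(String[] args) {
        int hypotheticalThreshold = 4096; // assumption, for illustration only
        for (int size : sizesAround(hypotheticalThreshold)) {
            timeCopy(size, 100_000); // crude warmup pass
            System.out.println(size + " bytes: "
                    + timeCopy(size, 1_000_000) + " ns/op");
        }
    }
}
```

If the extra compare-and-branch matters, the sizes just past a special case would be expected to show a higher ns/op than the baseline build for the same size. The JMH runs remain the numbers to trust.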
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1395174203