RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy

Jatin Bhateja jbhateja at openjdk.org
Tue Nov 14 08:08:31 UTC 2023


On Wed, 8 Nov 2023 23:23:48 GMT, Steve Dohrmann <duke at openjdk.org> wrote:

> Below is baseline data collected using a modified version of the java.lang.foreign.xor micro benchmark referenced by @mcimadamore in the bug report. I collected data on an Ubuntu 22.04 laptop with a Tiger Lake i7-1185G7, which does support AVX512.
> 
> Baseline data
> Benchmark     (arrayKind)  (sizeKind)  Mode  Cnt           Score          Error  Units
> --------------------------------------------------------------------------------------
> XorTest.copy     ELEMENTS       SMALL  avgt   30   584737355.767 ± 60414308.540  ns/op
> XorTest.copy     ELEMENTS      MEDIUM  avgt   30   272248995.683 ±  2924954.498  ns/op
> XorTest.copy     ELEMENTS       LARGE  avgt   30  1019200210.900 ± 28334453.652  ns/op
> XorTest.copy       REGION       SMALL  avgt   30     7399944.164 ±   216821.819  ns/op
> XorTest.copy       REGION      MEDIUM  avgt   30    20591454.558 ±   147398.572  ns/op
> XorTest.copy       REGION       LARGE  avgt   30    21649266.051 ±   179263.875  ns/op
> XorTest.copy     CRITICAL       SMALL  avgt   30       51079.357 ±      542.482  ns/op
> XorTest.copy     CRITICAL      MEDIUM  avgt   30        2496.961 ±       11.375  ns/op
> XorTest.copy     CRITICAL       LARGE  avgt   30         515.454 ±        5.831  ns/op
> XorTest.copy      FOREIGN       SMALL  avgt   30     7558432.075 ±    79489.276  ns/op
> XorTest.copy      FOREIGN      MEDIUM  avgt   30    19730666.341 ±   500505.099  ns/op
> XorTest.copy      FOREIGN       LARGE  avgt   30    34616758.085 ±   340300.726  ns/op
> XorTest.xor      ELEMENTS       SMALL  avgt   30   219832692.489 ±  2329417.319  ns/op
> XorTest.xor      ELEMENTS      MEDIUM  avgt   30   505138197.167 ±  3818334.424  ns/op
> XorTest.xor      ELEMENTS       LARGE  avgt   30  1189608474.667 ±  5877981.900  ns/op
> XorTest.xor        REGION       SMALL  avgt   30    64093872.804 ±   599704.491  ns/op
> XorTest.xor        REGION      MEDIUM  avgt   30    81544576.454 ±  1406342.118  ns/op
> XorTest.xor        REGION       LARGE  avgt   30    90091424.883 ±   775577.613  ns/op
> XorTest.xor      CRITICAL       SMALL  avgt   30    57231375.744 ±   438223.342  ns/op
> XorTest.xor      CRITICAL      MEDIUM  avgt   30    58583884.930 ±   375355.215  ns/op
> XorTest.xor      CRITICAL       LARGE  avgt   30    60644832.949 ±   588120.738  ns/op
> XorTest.xor       FOREIGN       SMALL  avgt   30    73868679.405 ±   819965.524  ns/op
> XorTest.xor       FOREIGN      MEDIUM  avgt   30    88156275.944 ±  1051257.152  ns/op
> XorTest.xor       FOREIGN       LARGE  avgt   30   123115513...

src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 585:

> 583:     __ shlq(temp2, shift);
> 584:     __ cmpq(temp2, large_threshold);
> 585:     __ jcc(Assembler::greaterEqual, L_copy_large);

Hi @steveatgh, can you please share the performance numbers for the other array copy JMH micros in the following directory: https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/lang

src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 585:

> 583:     __ shlq(temp2, shift);
> 584:     __ cmpq(temp2, large_threshold);
> 585:     __ jcc(Assembler::greaterEqual, L_copy_large);

I suspect the additional checks for the 2.5MB array size may hurt performance for other, more general sizes.
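
To make the concern concrete, the quoted shlq/cmpq/jcc sequence can be read roughly as the C++ sketch below. This is only an illustration, not the stub's actual code: the names kLargeThreshold, CopyPath and select_copy_path are hypothetical, and the threshold value assumes 2.5MB means 2.5 * 1024 * 1024 bytes. The point is that the shift and compare run on every call, including small copies that never take the large path.

```cpp
#include <cstdint>

// Assumed value: 2.5MB expressed in bytes (2.5 * 1024 * 1024).
// The real large_threshold in the patch may be defined differently.
constexpr std::int64_t kLargeThreshold = 2621440;

// Hypothetical labels for the two code paths the branch selects between.
enum class CopyPath { Regular, Large };

// Rough equivalent of:
//   shlq(temp2, shift);                      // element count -> byte count
//   cmpq(temp2, large_threshold);            // compare with the threshold
//   jcc(Assembler::greaterEqual, L_copy_large);
// 'shift' is log2 of the element size (0 for bytes, 3 for longs).
// Assumes a non-negative element_count that does not overflow when shifted.
CopyPath select_copy_path(std::int64_t element_count, unsigned shift) {
    const std::int64_t byte_count = element_count << shift;
    return byte_count >= kLargeThreshold ? CopyPath::Large : CopyPath::Regular;
}
```

With these assumptions, a copy of 1024 longs (8192 bytes) stays on the regular path but still pays for the shift and the compare, which is why numbers from the other array copy micros would help.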

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1392137605
PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1392138600


More information about the hotspot-compiler-dev mailing list