RFR: 8302850: Consider implementing a C1 clone intrinsic that uses ArrayCopyNode for primitive arrays [v4]

Galder Zamarreño galder at openjdk.org
Mon Feb 12 13:21:05 UTC 2024


On Mon, 12 Feb 2024 13:17:11 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> Adding a C1 intrinsic for primitive array clone invocations on the aarch64 and x86 architectures.
>> 
>> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>> 
>> 
>> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
>> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
>> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
>> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
>> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
>> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
>> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
>> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
>> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
>> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>> 
>> 
>> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>> 
>> I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>> 
>> 
>> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> ...
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg:hotspot_compiler          1234  1234     0     0
>> 
>> 
>> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>> 
>>...
>
> Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - 8302850: C1 primitive array clone intrinsic in graph
>    
>    * Combine array length, new type array and arraycopy for clone in c1 graph.
>    * Add OmitCheckFlags to skip arraycopy checks.
>    * Instantiate ArrayCopyStub only if necessary.
>    * Avoid zeroing newly created arrays for clone.
>    * Add array null after c1 clone compilation test.
>    * Pass force reexecute to intrinsic via value stack.
>    This is needed to be able to deoptimize correctly this intrinsic.
>    * When new type array or array copy are used for the clone intrinsic,
>    their state needs to be based on the state before for deoptimization
>    to work as expected.
>  - Revert "8302850: Primitive array copy C1 intrinsic for aarch64 and x86"
>    
>    This reverts commit fe5d916724614391a685bbef58ea939c84197d07.

I've just pushed a new solution that is implemented, as much as possible, at the C1 graph level. I've reverted the previous solution to keep history linear.
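
For readers who want a Java-level picture of what the graph combines, here is a minimal sketch. It is not the C1 code: the class and method names are made up for illustration, and it only mirrors the three steps listed in the commit message (array length, new type array, arraycopy). In plain Java the freshly allocated array is always zeroed before arraycopy overwrites it; that redundant zeroing is what the intrinsic is meant to skip.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the patch.
final class CloneSketch {
    // Java-level equivalent of the steps combined for a primitive array clone.
    static int[] cloneViaArraycopy(int[] src) {
        int length = src.length;                    // array length (NullPointerException if src is null)
        int[] copy = new int[length];               // new type array; zeroed from Java's point of view
        System.arraycopy(src, 0, copy, 0, length);  // arraycopy overwrites every element
        return copy;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4};
        int[] viaClone = src.clone();
        int[] viaCopy = cloneViaArraycopy(src);
        // Both approaches should produce equal contents in distinct arrays.
        System.out.println(Arrays.equals(viaClone, viaCopy));
        System.out.println(viaClone != src && viaCopy != src);
    }
}
```

Because every element is overwritten by the copy, the initial zeroing is never observable, which is why it can be omitted safely for the clone case.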

I've run the hotspot compiler tests successfully on darwin/aarch64 and linux/x86_64:


==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg:hotspot_compiler          1242  1242     0     0
==============================
TEST SUCCESS

Finished building target 'test' in configuration 'release-darwin-arm64'



==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg:hotspot_compiler          1235  1235     0     0
==============================
TEST SUCCESS

Finished building target 'test' in configuration 'release-linux-x86_64'


The array clone microbenchmark performance numbers show the expected improvements in both setups:


Benchmark                 (size)  Mode  Cnt    Score   Error  Units
ArrayClone.byteArraycopy       0  avgt   15    3.467 ± 0.006  ns/op
ArrayClone.byteArraycopy      10  avgt   15    3.673 ± 0.047  ns/op
ArrayClone.byteArraycopy     100  avgt   15    7.255 ± 0.025  ns/op
ArrayClone.byteArraycopy    1000  avgt   15   39.119 ± 0.595  ns/op
ArrayClone.byteClone           0  avgt   15    2.996 ± 0.008  ns/op
ArrayClone.byteClone          10  avgt   15    3.117 ± 0.020  ns/op
ArrayClone.byteClone         100  avgt   15    5.487 ± 0.063  ns/op
ArrayClone.byteClone        1000  avgt   15   25.232 ± 0.445  ns/op
ArrayClone.intArraycopy        0  avgt   15    3.198 ± 0.029  ns/op
ArrayClone.intArraycopy       10  avgt   15    4.428 ± 0.065  ns/op
ArrayClone.intArraycopy      100  avgt   15   17.015 ± 0.150  ns/op
ArrayClone.intArraycopy     1000  avgt   15  176.464 ± 1.820  ns/op
ArrayClone.intClone            0  avgt   15    2.891 ± 0.019  ns/op
ArrayClone.intClone           10  avgt   15    3.794 ± 0.008  ns/op
ArrayClone.intClone          100  avgt   15   12.289 ± 0.084  ns/op
ArrayClone.intClone         1000  avgt   15  112.032 ± 3.787  ns/op
Finished running test 'micro:java.lang.ArrayClone'
Test report is stored in build/release-darwin-arm64/test-results/micro_java_lang_ArrayClone



Benchmark                 (size)  Mode  Cnt    Score    Error  Units
ArrayClone.byteArraycopy       0  avgt   15    7.209 ±  0.168  ns/op
ArrayClone.byteArraycopy      10  avgt   15    8.433 ±  0.154  ns/op
ArrayClone.byteArraycopy     100  avgt   15   13.175 ±  1.063  ns/op
ArrayClone.byteArraycopy    1000  avgt   15  143.002 ±  2.946  ns/op
ArrayClone.byteClone           0  avgt   15    6.909 ±  0.120  ns/op
ArrayClone.byteClone          10  avgt   15    8.129 ±  0.744  ns/op
ArrayClone.byteClone         100  avgt   15    9.196 ±  0.201  ns/op
ArrayClone.byteClone        1000  avgt   15   60.831 ±  1.147  ns/op
ArrayClone.intArraycopy        0  avgt   15    6.674 ±  0.137  ns/op
ArrayClone.intArraycopy       10  avgt   15    8.815 ±  0.176  ns/op
ArrayClone.intArraycopy      100  avgt   15   42.191 ±  1.376  ns/op
ArrayClone.intArraycopy     1000  avgt   15  553.157 ± 51.817  ns/op
ArrayClone.intClone            0  avgt   15    6.376 ±  0.130  ns/op
ArrayClone.intClone           10  avgt   15    7.690 ±  0.652  ns/op
ArrayClone.intClone          100  avgt   15   25.115 ±  0.483  ns/op
ArrayClone.intClone         1000  avgt   15  245.194 ± 17.418  ns/op
Finished running test 'micro:java.lang.ArrayClone'
Test report is stored in build/release-linux-x86_64/test-results/micro_java_lang_ArrayClone
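
To put the size-1000 rows above in relative terms, the following sketch divides the arraycopy score by the clone score for each setup. The class and method names are hypothetical, the inputs are the mean scores copied from the two tables, and the reported error margins are ignored, so the ratios are only a rough guide.

```java
// Hypothetical helper for reading the benchmark tables; not part of the patch.
final class CloneSpeedup {
    // Ratio of arraycopy time to clone time; values above 1.0 mean clone is faster.
    static double speedup(double arraycopyNsPerOp, double cloneNsPerOp) {
        return arraycopyNsPerOp / cloneNsPerOp;
    }

    public static void main(String[] args) {
        // Mean scores in ns/op for size 1000, taken from the tables above.
        System.out.printf("darwin/aarch64 byte: %.2fx%n", speedup(39.119, 25.232));
        System.out.printf("darwin/aarch64 int:  %.2fx%n", speedup(176.464, 112.032));
        System.out.printf("linux/x86_64 byte:   %.2fx%n", speedup(143.002, 60.831));
        System.out.printf("linux/x86_64 int:    %.2fx%n", speedup(553.157, 245.194));
    }
}
```

On these means, clone comes out somewhat over 1.5x faster than arraycopy on darwin/aarch64 and over 2x faster on linux/x86_64 at size 1000.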

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-1938665855


More information about the hotspot-compiler-dev mailing list