RFR: 8302850: Consider implementing a C1 clone intrinsic that uses ArrayCopyNode for primitive arrays [v4]
Galder Zamarreño
galder at openjdk.org
Mon Feb 12 13:21:05 UTC 2024
On Mon, 12 Feb 2024 13:17:11 GMT, Galder Zamarreño <galder at openjdk.org> wrote:
>> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
>>
>> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>>
>>
>> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
>> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
>> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
>> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
>> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
>> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
>> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
>> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
>> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
>> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>>
>>
>> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>>
>> I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>>
>>
>> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> ...
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0
>>
>>
>> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>>
>>...
>
> Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision:
>
> - 8302850: C1 primitive array clone intrinsic in graph
>
> * Combine array length, new type array and arraycopy for clone in c1 graph.
> * Add OmitCheckFlags to skip arraycopy checks.
> * Instantiate ArrayCopyStub only if necessary.
> * Avoid zeroing newly created arrays for clone.
> * Add array null after c1 clone compilation test.
> * Pass force reexecute to intrinsic via value stack.
> This is needed to be able to deoptimize this intrinsic correctly.
> * When new type array or array copy are used for the clone intrinsic,
> their state needs to be based on the state before for deoptimization
> to work as expected.
> - Revert "8302850: Primitive array copy C1 intrinsic for aarch64 and x86"
>
> This reverts commit fe5d916724614391a685bbef58ea939c84197d07.
I've just pushed a new solution implemented, as much as possible, at the graph level. I've reverted the previous solution to keep history linear.
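As a rough Java-level illustration only (this is not the C1 code), the three steps the commit message combines for clone, namely array length, new type array and arraycopy, look like the sketch below. The class and method names are hypothetical. Note that at the Java level `new int[len]` always zero-fills the array; skipping that zeroing is something only the compiler can do, because it knows the arraycopy overwrites every element.

```java
import java.util.Arrays;

public class CloneSketch {
    // Hypothetical helper: a hand-written analogue of a primitive array clone.
    // 1. read the source length, 2. allocate a new array, 3. copy the contents.
    static int[] cloneByHand(int[] src) {
        int len = src.length;                    // array length
        int[] dst = new int[len];                // new type array (zeroed at the Java level)
        System.arraycopy(src, 0, dst, 0, len);   // arraycopy over the whole range
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4};
        int[] viaClone = src.clone();
        int[] viaHand = cloneByHand(src);
        // Both copies hold the same elements and are distinct objects from src.
        System.out.println(Arrays.equals(viaClone, viaHand));
        System.out.println(viaClone != src && viaHand != src);
    }
}
```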
I've run the hotspot compiler tests successfully on darwin/aarch64 and linux/x86_64:
==============================
Test summary
==============================
TEST TOTAL PASS FAIL ERROR
jtreg:test/hotspot/jtreg:hotspot_compiler 1242 1242 0 0
==============================
TEST SUCCESS
Finished building target 'test' in configuration 'release-darwin-arm64'
==============================
Test summary
==============================
TEST TOTAL PASS FAIL ERROR
jtreg:test/hotspot/jtreg:hotspot_compiler 1235 1235 0 0
==============================
TEST SUCCESS
Finished building target 'test' in configuration 'release-linux-x86_64'
The array clone microbenchmark performance numbers show the expected improvements in both setups:
Benchmark                 (size)  Mode  Cnt    Score    Error  Units
ArrayClone.byteArraycopy       0  avgt   15    3.467 ±  0.006  ns/op
ArrayClone.byteArraycopy      10  avgt   15    3.673 ±  0.047  ns/op
ArrayClone.byteArraycopy     100  avgt   15    7.255 ±  0.025  ns/op
ArrayClone.byteArraycopy    1000  avgt   15   39.119 ±  0.595  ns/op
ArrayClone.byteClone           0  avgt   15    2.996 ±  0.008  ns/op
ArrayClone.byteClone          10  avgt   15    3.117 ±  0.020  ns/op
ArrayClone.byteClone         100  avgt   15    5.487 ±  0.063  ns/op
ArrayClone.byteClone        1000  avgt   15   25.232 ±  0.445  ns/op
ArrayClone.intArraycopy        0  avgt   15    3.198 ±  0.029  ns/op
ArrayClone.intArraycopy       10  avgt   15    4.428 ±  0.065  ns/op
ArrayClone.intArraycopy      100  avgt   15   17.015 ±  0.150  ns/op
ArrayClone.intArraycopy     1000  avgt   15  176.464 ±  1.820  ns/op
ArrayClone.intClone            0  avgt   15    2.891 ±  0.019  ns/op
ArrayClone.intClone           10  avgt   15    3.794 ±  0.008  ns/op
ArrayClone.intClone          100  avgt   15   12.289 ±  0.084  ns/op
ArrayClone.intClone         1000  avgt   15  112.032 ±  3.787  ns/op
Finished running test 'micro:java.lang.ArrayClone'
Test report is stored in build/release-darwin-arm64/test-results/micro_java_lang_ArrayClone
Benchmark                 (size)  Mode  Cnt    Score    Error  Units
ArrayClone.byteArraycopy       0  avgt   15    7.209 ±  0.168  ns/op
ArrayClone.byteArraycopy      10  avgt   15    8.433 ±  0.154  ns/op
ArrayClone.byteArraycopy     100  avgt   15   13.175 ±  1.063  ns/op
ArrayClone.byteArraycopy    1000  avgt   15  143.002 ±  2.946  ns/op
ArrayClone.byteClone           0  avgt   15    6.909 ±  0.120  ns/op
ArrayClone.byteClone          10  avgt   15    8.129 ±  0.744  ns/op
ArrayClone.byteClone         100  avgt   15    9.196 ±  0.201  ns/op
ArrayClone.byteClone        1000  avgt   15   60.831 ±  1.147  ns/op
ArrayClone.intArraycopy        0  avgt   15    6.674 ±  0.137  ns/op
ArrayClone.intArraycopy       10  avgt   15    8.815 ±  0.176  ns/op
ArrayClone.intArraycopy      100  avgt   15   42.191 ±  1.376  ns/op
ArrayClone.intArraycopy     1000  avgt   15  553.157 ± 51.817  ns/op
ArrayClone.intClone            0  avgt   15    6.376 ±  0.130  ns/op
ArrayClone.intClone           10  avgt   15    7.690 ±  0.652  ns/op
ArrayClone.intClone          100  avgt   15   25.115 ±  0.483  ns/op
ArrayClone.intClone         1000  avgt   15  245.194 ± 17.418  ns/op
Finished running test 'micro:java.lang.ArrayClone'
Test report is stored in build/release-linux-x86_64/test-results/micro_java_lang_ArrayClone
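To put the size 1000 rows in perspective, here is a small sketch that computes the arraycopy/clone ratio from the scores reported above. The class name, method name and labels are made up for illustration, and the reported error margins are ignored, so the ratios are only indicative.

```java
import java.util.Locale;

public class CloneSpeedup {
    // Ratio of arraycopy time to clone time; a value above 1 means clone is faster.
    static double speedup(double arraycopyNs, double cloneNs) {
        return arraycopyNs / cloneNs;
    }

    public static void main(String[] args) {
        // Scores in ns/op for size = 1000, copied from the two result tables above.
        String[] labels = {
            "darwin/aarch64 byte[]", "darwin/aarch64 int[]",
            "linux/x86_64 byte[]", "linux/x86_64 int[]"
        };
        double[][] scores = {
            {39.119, 25.232}, {176.464, 112.032},
            {143.002, 60.831}, {553.157, 245.194}
        };
        for (int i = 0; i < labels.length; i++) {
            double ratio = speedup(scores[i][0], scores[i][1]);
            System.out.println(String.format(Locale.ROOT, "%-22s %.2fx", labels[i], ratio));
        }
    }
}
```

On these numbers the ratio comes out at roughly 1.55x to 1.58x on darwin/aarch64 and roughly 2.26x to 2.35x on linux/x86_64.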
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-1938665855