RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6]
David Holmes
david.holmes at oracle.com
Thu Dec 2 02:51:46 UTC 2021
On 2/12/2021 12:32 pm, Jie Fu wrote:
> On Wed, 1 Dec 2021 23:19:47 GMT, Jie Fu <jiefu at openjdk.org> wrote:
>
>>> Yes, the patch doesn't change behavior on AVX2 and older AVX512 systems.
>>
>> Thanks for your clarification. But it still remains unknown why the 64-byte instructions shouldn't be used on CPUs which don't support `serialize`.
>>
>> I will test the 64-byte instructions on older AVX512 systems today and report back here.
>
>
> Here is the performance data on our older AVX512 platform which doesn't support `serialize`.
>
> Even without `serialize`, the performance has been improved with 64-byte instructions.
> E.g., for `ArrayCopy.arrayCopyObjectNonConst`, it has been improved by ~15%.
>
> So it seems unfair to enable 64-byte instructions only for the latest Intel AVX512 platforms.
>
> Still, I would like to know why we don't use 64-byte instructions on platforms without `serialize` support.
Because, as previously stated, there is no actual way to identify those
CPUs. We know that if they support serialize then they also support
the faster 64-byte ops. But that doesn't mean that if they don't support
serialize then they don't support the faster 64-byte ops. So all that is
available for choosing whether to use them or not is whether serialize
is supported.
David
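
The one-way implication described above can be sketched in Java. This is
only an illustration of the reasoning, not HotSpot code: the class name,
the method names, and the feature strings "avx512f" and "serialize" are
assumptions made for the example.

```java
import java.util.Set;

/** Hypothetical model of the feature check discussed above; not HotSpot code. */
final class WideCopyHeuristic {
    /** The feature flags can confirm fast 64-byte ops, but cannot rule them out. */
    enum Answer { YES, UNKNOWN }

    /** What the feature flags alone say about fast 64-byte operations. */
    static Answer fast64ByteOpsKnown(Set<String> cpuFeatures) {
        // serialize present => fast 64-byte ops are present.
        // serialize absent  => nothing can be concluded either way.
        return cpuFeatures.contains("serialize") ? Answer.YES : Answer.UNKNOWN;
    }

    /** Conservative policy: use 64-byte instructions only when known to be fast. */
    static boolean use64ByteInstructions(Set<String> cpuFeatures) {
        return cpuFeatures.contains("avx512f")
                && fast64ByteOpsKnown(cpuFeatures) == Answer.YES;
    }

    public static void main(String[] args) {
        // Newer AVX512 CPU that reports serialize.
        System.out.println(use64ByteInstructions(Set.of("avx512f", "serialize"))); // true
        // Older AVX512 CPU: might be fast, but the flags cannot tell us.
        System.out.println(use64ByteInstructions(Set.of("avx512f")));              // false
    }
}
```

The UNKNOWN case is the one under discussion: the older AVX512 platform
measured in the quoted message falls into it, so a policy of this shape
would leave it on 32-byte instructions even though it benefits from the
wider ones.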
> Thanks.
>
> ---------------------------------------------------
>
> Results with 32-byte instructions.
>
> ==> perf32-1.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  24.070 ± 0.013  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.517 ± 0.023  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  21.127 ± 0.008  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.934 ± 0.009  ns/op
>
> ==> perf32-2.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  24.511 ± 0.027  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.240 ± 0.034  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  21.065 ± 0.013  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.956 ± 0.161  ns/op
>
> ==> perf32-3.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  25.357 ± 0.006  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.513 ± 1.468  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.984 ± 0.024  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  20.945 ± 1.346  ns/op
>
>
>
> Results with 64-byte instructions.
>
> ==> perf64-1.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  23.425 ± 0.003  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  23.530 ± 0.002  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.174 ± 0.074  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  19.942 ± 0.134  ns/op
>
> ==> perf64-2.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  22.429 ± 0.012  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  25.189 ± 0.031  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.093 ± 0.004  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  20.400 ± 1.213  ns/op
>
> ==> perf64-3.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  23.472 ± 0.002  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  23.534 ± 0.031  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.232 ± 0.150  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.921 ± 0.008  ns/op
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/6512
>
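
The improvement quoted for ArrayCopy.arrayCopyObjectNonConst can be checked
against the posted scores with a few lines of Java. This is a rough sketch:
the class and method names are invented for the example, the scores are
copied from the six logs above, and averaging the three runs is my own
choice of summary rather than anything stated in the thread.

```java
import java.util.Arrays;

/** Hypothetical helper for comparing the posted JMH scores (ns/op, lower is better). */
final class ArrayCopySpeedup {
    // ArrayCopy.arrayCopyObjectNonConst scores from perf32-1..3.log and perf64-1..3.log.
    static final double[] WITH_32_BYTE = {27.517, 27.240, 27.513};
    static final double[] WITH_64_BYTE = {23.530, 25.189, 23.534};

    /** Arithmetic mean of the given scores. */
    static double mean(double... scores) {
        return Arrays.stream(scores).average().orElseThrow();
    }

    /** Percentage of time per operation saved when going from before to after. */
    static double percentReduction(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        double mean32 = mean(WITH_32_BYTE);
        double mean64 = mean(WITH_64_BYTE);
        System.out.printf("mean of runs: %.3f -> %.3f ns/op (%.1f%% less time per op)%n",
                mean32, mean64, percentReduction(mean32, mean64));
        System.out.printf("first runs:   %.3f -> %.3f ns/op (%.1f%% less time per op)%n",
                WITH_32_BYTE[0], WITH_64_BYTE[0],
                percentReduction(WITH_32_BYTE[0], WITH_64_BYTE[0]));
    }
}
```

Comparing means gives a reduction of roughly 12%, while comparing the first
run of each configuration gives roughly 14.5%, which is closer to the quoted
~15%. The second 64-byte run (25.189 ns/op) is what pulls the mean-based
figure down.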