RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6]

David Holmes david.holmes at oracle.com
Thu Dec 2 02:51:46 UTC 2021


On 2/12/2021 12:32 pm, Jie Fu wrote:
> On Wed, 1 Dec 2021 23:19:47 GMT, Jie Fu <jiefu at openjdk.org> wrote:
> 
>>> Yes, the patch doesn't change behavior on AVX2 and older AVX512 systems.
>>
>> Thanks for your clarification. But it still remains unknown why the 64-byte instructions shouldn't be used on CPUs which don't support `serialize`.
>>
>> I will test the 64-byte instructions on older AVX512 systems today and report back here.
> 
> 
> Here is the performance data on our older AVX512 platform which doesn't support `serialize`.
> 
> Even without `serialize`, performance improves with 64-byte instructions.
> E.g., for `ArrayCopy.arrayCopyObjectNonConst`, it has been improved by ~15%.
> 
> So it seems unfair to enable 64-byte instructions only for the latest Intel AVX512 platforms.
> 
> Still, I would like to know why we don't use 64-byte instructions on platforms without `serialize` support.

Because, as previously stated, there is no actual way to identify the
CPUs that have the faster 64-byte ops. We know that if a CPU supports
serialize then it also supports the faster 64-byte ops. But that doesn't
mean that a CPU without serialize lacks the faster 64-byte ops. So all
that is available for choosing whether to use them or not is whether
serialize is supported.
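
The one-way implication above can be sketched as a small decision function. This is only an illustration, not the HotSpot code: the class name, the method name and the two boolean inputs are hypothetical stand-ins for whatever CPU feature queries the VM really performs.

```java
// Illustrative sketch only; none of these names exist in HotSpot.
class CopyStubWidth {

    /**
     * Picks the vector width (in bytes) for copy/fill stubs.
     *
     * Known:   serialize supported  => the faster 64-byte ops are available.
     * Unknown: serialize missing    => nothing can be concluded either way.
     * So the conservative choice without serialize is the 32-byte path.
     */
    static int chooseVectorBytes(boolean hasAvx512, boolean hasSerialize) {
        if (!hasAvx512) {
            return 32; // no 64-byte instructions at all
        }
        return hasSerialize ? 64 : 32;
    }

    public static void main(String[] args) {
        System.out.println("AVX512 + serialize: " + chooseVectorBytes(true, true));
        System.out.println("AVX512 only:        " + chooseVectorBytes(true, false));
        System.out.println("no AVX512:          " + chooseVectorBytes(false, false));
    }
}
```

The middle case is the one under discussion: an older AVX512 CPU may well run the 64-byte path faster, but this check has no way to tell it apart from one that does not.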

David

> Thanks.
> 
> ---------------------------------------------------
> 
> Results with 32-byte instructions.
> 
> ==> perf32-1.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  24.070 ± 0.013  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.517 ± 0.023  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  21.127 ± 0.008  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.934 ± 0.009  ns/op
> 
> ==> perf32-2.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  24.511 ± 0.027  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.240 ± 0.034  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  21.065 ± 0.013  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.956 ± 0.161  ns/op
> 
> ==> perf32-3.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  25.357 ± 0.006  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.513 ± 1.468  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.984 ± 0.024  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  20.945 ± 1.346  ns/op
> 
> 
> 
> Results with 64-byte instructions.
> 
> ==> perf64-1.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  23.425 ± 0.003  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  23.530 ± 0.002  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.174 ± 0.074  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  19.942 ± 0.134  ns/op
> 
> ==> perf64-2.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  22.429 ± 0.012  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  25.189 ± 0.031  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.093 ± 0.004  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  20.400 ± 1.213  ns/op
> 
> ==> perf64-3.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  23.472 ± 0.002  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  23.534 ± 0.031  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.232 ± 0.150  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.921 ± 0.008  ns/op
> 
> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/6512
> 


More information about the hotspot-compiler-dev mailing list