RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4]
Vladimir Kozlov
kvn at openjdk.org
Wed Apr 3 21:37:10 UTC 2024
On Thu, 28 Mar 2024 00:45:33 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:
>> The goal of this small PR is to improve the performance of convert instructions and address the slowdown when AVX>0 is used.
>>
>> The performance data using the ComputePI.java benchmark (part of this PR) is as follows:
>>
>>
>> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup
>> -- | -- | -- | --
>> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0
>> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9
>> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5
>> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8
>> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8
>> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4
>>
>>
>>
>> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup
>> -- | -- | -- | --
>> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0
>> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0
>> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0
>> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0
>> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0
>> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
> fix L2F cvtsi2ssq
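For readers without the PR checked out, the kind of conversion-heavy loop that the benchmark names above suggest can be sketched as follows. This is only an illustration under assumptions: the class name `ComputePiSketch`, the method `computePiIntDbl`, and the choice of the Leibniz series are hypothetical and are not taken from ComputePI.java in the PR.

```java
// Hypothetical sketch; NOT the ComputePI.java benchmark from the PR.
class ComputePiSketch {
    // Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
    // Every iteration converts the int loop counter to double, so the
    // loop body is dominated by an int -> double convert plus a divide.
    static double computePiIntDbl(int terms) {
        double sum = 0.0;
        for (int k = 0; k < terms; k++) {
            double denom = 2.0 * (double) k + 1.0; // int -> double conversion
            double sign = ((k & 1) == 0) ? 1.0 : -1.0;
            sum += sign / denom;
        }
        return 4.0 * sum;
    }

    public static void main(String[] args) {
        // Prints an approximation of pi; accuracy improves with more terms.
        System.out.println(computePiIntDbl(1_000_000));
    }
}
```

A loop of this shape keeps a convert instruction on the critical path of every iteration, which is why convert performance would show up directly in ns/op figures like those in the tables.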
I have one question about the changes in the assembler code.
I see you avoided `xor` for the instructions with a memory operand by executing those instructions only when AVX is not used.
I will run our performance testing to see if this change affects performance. Eric did run it, but I don't know which version.
And I will run regular testing too.
src/hotspot/cpu/x86/assembler_x86.cpp line 2034:
> 2032: InstructionAttr attributes(AVX_128bit, /* rex_w */ VM_Version::supports_evex(), /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
> 2033: attributes.set_rex_vex_w_reverted();
> 2034: int encode = simd_prefix_and_encode(dst, src, src, VEX_SIMD_F2, VEX_OPCODE_0F, &attributes);
Can you explain this change?
-------------
PR Review: https://git.openjdk.org/jdk/pull/18503#pullrequestreview-1978069711
PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2035636844
PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1550510499