RFR: 8264054: Bad XMM performance on java.lang.MathBench.sqrtDouble
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Tue Mar 30 20:33:22 UTC 2021
On Tue, 30 Mar 2021 17:53:18 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> For the j.l.Math JMH microbenchmark at https://github.com/openjdk/jmh-jdk-microbenchmarks/blob/master/micros-jdk11/src/main/java/org/openjdk/bench/java/lang/MathBench.java, the performance of the sqrt benchmark could be improved. Thanks a lot to Eric Caspole for finding this issue.
>>
>> Benchmark:
>>     @Benchmark
>>     public double sqrtDouble() {
>>         return Math.sqrt(double4Dot1);
>>     }
>>
>> Current code generated (linux format) by c2 JIT is:
>> vsqrtsd 0x50(%r10),%xmm0,%xmm0
>>
>> The vsqrtsd instruction operation is specified as below:
>> VSQRTSD (VEX.128 encoded version)
>> DEST[63:0] := SQRT(SRC2[63:0])
>> DEST[127:64] := SRC1[127:64]
>> DEST[MAXVL-1:128] := 0
>>
>> Bits 127:64 of the destination are copied from the previous contents of xmm0. Because the c2 JIT did not initialize the destination register xmm0 before use, the instruction carries a false dependency on the register's old value, which causes a stall and lowers performance.
>>
>> By adding xmm0 initialization prior to use, the performance of the above benchmark improves significantly.
>>
>> Code generated after patch:
>> vxorpd %xmm0,%xmm0,%xmm0
>> vsqrtsd 0x50(%r10),%xmm0,%xmm0
>>
>> Performance before patch:
>> Benchmark             Mode   Cnt  Score       Error      Units
>> MathBench.sqrtDouble  thrpt  8    193612.396  ± 95.807   ops/ms
>>
>> Performance after patch:
>> Benchmark             Mode   Cnt  Score       Error      Units
>> MathBench.sqrtDouble  thrpt  8    276388.024  ± 846.372  ops/ms
>>
>> Best Regards,
>> Sandhya
>
> What is faster: `xor(xmm) + sqrt(xmm, mem)` or `mov(xmm, mem) + sqrt(xmm,xmm)` ?
>
> Do we have other instructions which may suffer stall too if `dst` register is not zeroed?
@vnkozlov Both `xor(xmm) + sqrt(xmm, mem)` and `mov(xmm, mem) + sqrt(xmm, xmm)` have the same performance. I could look into simplifying the patch by keeping only the register version of these rules.
I looked through x86.ad for other uses of unary instructions but didn't find any other instances. We implement negate and abs using binary instructions.
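
For readers who want to see why both sequences leave the same architectural state, here is a minimal Java sketch of the VSQRTSD semantics quoted above. It is only an illustration: the class and method names are hypothetical, the two-lane `double[]` register model is an assumption, and it says nothing about timing or about how the patch itself is written.

```java
import java.util.Arrays;

// Illustrative model only: names are hypothetical, not HotSpot code.
// A 128-bit XMM register is modeled as two double lanes:
// index 0 = bits 63:0, index 1 = bits 127:64.
final class VsqrtsdModel {

    // VSQRTSD (VEX.128): low lane is the square root of SRC2's low lane,
    // upper lane is merged in from SRC1.
    static double[] vsqrtsd(double[] src1, double src2Low) {
        return new double[] {
            Math.sqrt(src2Low), // DEST[63:0]   := SQRT(SRC2[63:0])
            src1[1]             // DEST[127:64] := SRC1[127:64]
        };
    }

    // VXORPD: bitwise XOR per lane; XORing a register with itself gives all zero bits.
    static double[] vxorpd(double[] a, double[] b) {
        double[] dest = new double[2];
        for (int i = 0; i < 2; i++) {
            long bits = Double.doubleToRawLongBits(a[i]) ^ Double.doubleToRawLongBits(b[i]);
            dest[i] = Double.longBitsToDouble(bits);
        }
        return dest;
    }

    // Scalar load from memory (VMOVSD xmm, m64): low lane loaded, bits 127:64 zeroed.
    static double[] vmovsdLoad(double mem) {
        return new double[] { mem, 0.0 };
    }

    public static void main(String[] args) {
        double mem = 4.1;                    // stands in for the double4Dot1 field
        double[] stale = { 7.0, 42.0 };      // arbitrary previous contents of xmm0

        // Unpatched: the result's upper lane depends on the stale register value.
        double[] unpatched = vsqrtsd(stale, mem);

        // xor(xmm) + sqrt(xmm, mem)
        double[] xorThenSqrt = vsqrtsd(vxorpd(stale, stale), mem);

        // mov(xmm, mem) + sqrt(xmm, xmm)
        double[] loaded = vmovsdLoad(mem);
        double[] movThenSqrt = vsqrtsd(loaded, loaded[0]);

        System.out.println("unpatched:   " + Arrays.toString(unpatched));
        System.out.println("xor + sqrt:  " + Arrays.toString(xorThenSqrt));
        System.out.println("mov + sqrt:  " + Arrays.toString(movThenSqrt));
    }
}
```

In this model the unpatched form carries the stale upper lane through to the result, while both alternatives produce a result whose upper lane no longer depends on the old value of xmm0.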
-------------
PR: https://git.openjdk.java.net/jdk/pull/3256