RFR: 8264054: Bad XMM performance on java.lang.MathBench.sqrtDouble
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Tue Mar 30 04:19:03 UTC 2021
For the j.l.Math JMH benchmark at https://github.com/openjdk/jmh-jdk-microbenchmarks/blob/master/micros-jdk11/src/main/java/org/openjdk/bench/java/lang/MathBench.java, the performance of the sqrt benchmark could be improved. Thanks a lot to Eric Caspole for finding this issue.
Benchmark:
    @Benchmark
    public double sqrtDouble() {
        return Math.sqrt(double4Dot1);
    }
The code currently generated by the C2 JIT (Linux format, i.e. AT&T syntax) is:
vsqrtsd 0x50(%r10),%xmm0,%xmm0
The operation of the vsqrtsd instruction is specified as follows:
VSQRTSD (VEX.128 encoded version)
DEST[63:0] := SQRT(SRC2[63:0])
DEST[127:64] := SRC1[127:64]
DEST[MAXVL-1:128] := 0
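Bits 127:64 of the destination are copied from SRC1, which in the generated code is xmm0 itself, so the instruction depends on the previous contents of xmm0 (a false dependency, since only the low 64 bits are used). Because the C2 JIT does not initialize xmm0 before using it as the destination, this dependency causes stalls and lowers performance.
By initializing xmm0 before the vsqrtsd, the throughput of the above benchmark improves by roughly 43% in the measurements below.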
Code generated after patch:
vxorpd %xmm0,%xmm0,%xmm0
vsqrtsd 0x50(%r10),%xmm0,%xmm0
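The data flow described above can be sketched as a small software model. This is only an illustration, not HotSpot code: the class VsqrtsdModel, its method names, and the "stale" register contents are hypothetical, and an XMM register is modeled as two 64-bit lanes. The model shows which inputs the result is computed from; it says nothing about timing, since the stall itself is a microarchitectural effect.

```java
// Illustrative model of the VSQRTSD (VEX.128) data flow quoted above.
// A register is modeled as long[2]: index 0 = bits 63:0, index 1 = bits 127:64.
final class VsqrtsdModel {

    // DEST[63:0] := SQRT(SRC2[63:0]); DEST[127:64] := SRC1[127:64]
    static long[] vsqrtsd(long[] src1, double src2Low) {
        long low = Double.doubleToRawLongBits(Math.sqrt(src2Low));
        long high = src1[1]; // upper lane is read from SRC1: the input dependency
        return new long[] { low, high };
    }

    // Models vxorpd a,b,dest on the low 128 bits. With a == b the result is
    // all zeros, whatever the register held before.
    static long[] vxorpd(long[] a, long[] b) {
        return new long[] { a[0] ^ b[0], a[1] ^ b[1] };
    }

    public static void main(String[] args) {
        // Hypothetical leftover contents of xmm0 from earlier code.
        long[] stale = { 0x1111L, 0xDEADBEEFL };

        // Before the patch: the stale upper lane flows into the result.
        long[] before = vsqrtsd(stale, 4.1);
        System.out.println(Long.toHexString(before[1]));

        // After the patch: xmm0 is zeroed first, so the stale value is gone.
        long[] zeroed = vxorpd(stale, stale);
        long[] after = vsqrtsd(zeroed, 4.1);
        System.out.println(Long.toHexString(after[1]));

        // The low lane, the value the benchmark actually returns, is the same.
        System.out.println(before[0] == after[0]);
    }
}
```

In both cases the low 64 bits hold the same square root, so the added vxorpd does not change the benchmark's result. It only replaces the read of the old xmm0 contents with a known zero value.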
Performance before patch:
Benchmark              Mode  Cnt       Score     Error   Units
MathBench.sqrtDouble  thrpt    8  193612.396 ±  95.807  ops/ms
Performance after patch:
Benchmark              Mode  Cnt       Score     Error   Units
MathBench.sqrtDouble  thrpt    8  276388.024 ± 846.372  ops/ms
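For reference, the relative change between the two scores above can be computed as follows. This is a minimal sketch: the class and method names are hypothetical, and the only inputs are the two reported Score values, with the error terms ignored.

```java
// Computes the throughput ratio between the two JMH scores reported above.
final class SqrtSpeedup {

    // Ratio of the new throughput to the old throughput.
    static double speedup(double beforeOpsPerMs, double afterOpsPerMs) {
        return afterOpsPerMs / beforeOpsPerMs;
    }

    public static void main(String[] args) {
        double before = 193612.396; // ops/ms, before the patch
        double after = 276388.024;  // ops/ms, after the patch
        double ratio = speedup(before, after);
        System.out.printf("speedup: %.2fx (%.0f%% more throughput)%n",
                ratio, (ratio - 1.0) * 100.0);
    }
}
```

The ratio comes out at roughly 1.43, i.e. about 43% more operations per millisecond.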
Best Regards,
Sandhya
-------------
Commit messages:
- No xor needed when src and dest are same
- 8264054: Bad XMM performance on java.lang.MathBench.sqrtDouble
Changes: https://git.openjdk.java.net/jdk/pull/3256/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3256&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8264054
Stats: 16 lines in 1 file changed: 11 ins; 2 del; 3 mod
Patch: https://git.openjdk.java.net/jdk/pull/3256.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3256/head:pull/3256
PR: https://git.openjdk.java.net/jdk/pull/3256