RFR: 8264054: Bad XMM performance on java.lang.MathBench.sqrtDouble

Sandhya Viswanathan sviswanathan at openjdk.java.net
Tue Mar 30 20:33:22 UTC 2021


On Tue, 30 Mar 2021 18:22:02 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> The sqrtDouble benchmark in the j.l.Math JMH suite at https://github.com/openjdk/jmh-jdk-microbenchmarks/blob/master/micros-jdk11/src/main/java/org/openjdk/bench/java/lang/MathBench.java runs slower than it should. Thanks a lot to Eric Caspole for finding this issue.
>> 
>> Benchmark:
>>     @Benchmark
>>     public double  sqrtDouble() {
>>         return  Math.sqrt(double4Dot1);
>>     }
>> 
>> Code currently generated by the C2 JIT (AT&T syntax, as printed on Linux):
>>      vsqrtsd 0x50(%r10),%xmm0,%xmm0
>> 
>> The operation of the vsqrtsd instruction is specified as follows:
>>      VSQRTSD (VEX.128 encoded version)
>>      DEST[63:0] := SQRT(SRC2[63:0])
>>      DEST[127:64] := SRC1[127:64]
>>      DEST[MAXVL-1:128] := 0
>> 
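>> Bits 127:64 of the destination are copied from the previous contents of xmm0 (SRC1). Because the C2 JIT did not initialize xmm0 before using it as the destination, the instruction carries a false dependency on whatever last wrote xmm0, which causes a stall and lowers performance.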
>> 
>> Initializing xmm0 before use improves the throughput of the above benchmark by about 43% (see the scores below).
>> 
>> Code generated after patch:
>>      vxorpd %xmm0,%xmm0,%xmm0
>>      vsqrtsd 0x50(%r10),%xmm0,%xmm0
>> 
>> Performance before patch:
>> Benchmark                    Mode  Cnt       Score      Error   Units
>> MathBench.sqrtDouble        thrpt    8  193612.396 ±  95.807  ops/ms
>> 
>> Performance after patch:
>> Benchmark                    Mode  Cnt       Score      Error   Units
>> MathBench.sqrtDouble        thrpt    8  276388.024 ± 846.372  ops/ms
>> 
>> Best Regards,
>> Sandhya
>
> src/hotspot/cpu/x86/x86.ad line 3252:
> 
>> 3250:   predicate(UseSSE>=1);
>> 3251:   match(Set dst (SqrtF (LoadF src)));
>> 3252:   effect(TEMP dst);
> 
> Why do you declare `dst` as `TEMP` (here and in other places)?

Good point. I was being extra cautious so that a register used in the address would not be overwritten by the xorps. But of course the address cannot contain an XMM register here, so `TEMP dst` is not needed. I will update the patch accordingly.
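
For readers following along, here is a minimal Java sketch of the lane semantics quoted above. It is only an illustration, not HotSpot code: the `Xmm` class and the `vxorpd`/`vsqrtsd` helper methods are hypothetical names that model a 128-bit register as two 64-bit lanes. It shows that without the zeroing step the upper lane of the result is inherited from the stale register contents (the source of the dependency), while after zeroing the result no longer depends on them.

```java
// Hypothetical model of the VEX.128 VSQRTSD semantics; not HotSpot code.
class VsqrtsdModel {
    // A 128-bit XMM register modelled as two 64-bit lanes (assumption for illustration).
    static final class Xmm {
        long lo; // bits 63:0
        long hi; // bits 127:64
        Xmm(long lo, long hi) { this.lo = lo; this.hi = hi; }
    }

    // vxorpd dst,dst,dst: zeroes the register regardless of its prior contents.
    static void vxorpd(Xmm dst) {
        dst.lo = 0L;
        dst.hi = 0L;
    }

    // vsqrtsd mem,src1,dst:
    //   DEST[63:0]   := SQRT(SRC2[63:0])   (SRC2 is the memory operand here)
    //   DEST[127:64] := SRC1[127:64]
    static void vsqrtsd(double mem, Xmm src1, Xmm dst) {
        long upper = src1.hi; // read SRC1 first so dst == src1 works
        dst.lo = Double.doubleToRawLongBits(Math.sqrt(mem));
        dst.hi = upper;
    }

    static double lowLane(Xmm r) {
        return Double.longBitsToDouble(r.lo);
    }

    public static void main(String[] args) {
        // Before the patch: xmm0 holds stale data, and the result inherits its upper lane.
        Xmm xmm0 = new Xmm(0x1234L, 0xDEADBEEFL);
        vsqrtsd(4.0, xmm0, xmm0);
        System.out.println(lowLane(xmm0) + " hi=" + Long.toHexString(xmm0.hi)); // 2.0 hi=deadbeef

        // After the patch: zero xmm0 first, so nothing from the stale value survives.
        xmm0 = new Xmm(0x1234L, 0xDEADBEEFL);
        vxorpd(xmm0);
        vsqrtsd(4.0, xmm0, xmm0);
        System.out.println(lowLane(xmm0) + " hi=" + Long.toHexString(xmm0.hi)); // 2.0 hi=0
    }
}
```

The memory operand is passed as a plain `double` because, as noted above, the address is formed from general-purpose registers only, so zeroing `dst` first cannot clobber anything the load needs.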

-------------

PR: https://git.openjdk.java.net/jdk/pull/3256

