RFR: 8327147: Improve performance of Math ceil, floor, and rint for x86 [v2]

Jatin Bhateja jbhateja at openjdk.org
Fri Mar 8 01:14:58 UTC 2024


On Tue, 5 Mar 2024 22:37:49 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   unify the implementation
>
> So if we can still generate the non-AVX encoding of
> 
> `roundsd dst, src, mode`
> 
> isn't there still a false dependency problem with `dst`?

> @dean-long You bring up a very good point. The SSE instruction (roundsd dst, src, mode) also has a false dependency problem. This can be demonstrated by adding the following benchmark to MathBench.java:
> 
> ```
> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> index c7dde019154..feb472bba3d 100644
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -141,6 +141,11 @@ public double  ceilDouble() {
>          return  Math.ceil(double4Dot1);
>      }
> 
> +    @Benchmark
> +    public double  useAfterCeilDouble() {
> +        return  Math.ceil(double4Dot1) + Math.floor(double4Dot1);
> +    }
> +
>      @Benchmark
>      public double  copySignDouble() {
>          return  Math.copySign(double81, doubleNegative12);
> ```
> 
> The fix would be to do a pxor on dst before the SSE roundsd instruction, something like below:
> 
> ```
> diff --git a/src/hotspot/cpu/x86/x86.ad b/src/hotspot/cpu/x86/x86.ad
> index cf4aef83df2..eb6701f82a7 100644
> --- a/src/hotspot/cpu/x86/x86.ad
> +++ b/src/hotspot/cpu/x86/x86.ad
> @@ -3874,6 +3874,9 @@ instruct roundD_reg(legRegD dst, legRegD src, immU8 rmode) %{
>    ins_cost(150);
>    ins_encode %{
>      assert(UseSSE >= 4, "required");
> +    if ((UseAVX == 0) && ($dst$$XMMRegister != $src$$XMMRegister)) {
> +      __ pxor($dst$$XMMRegister, $dst$$XMMRegister);
> +    }
>      __ roundsd($dst$$XMMRegister, $src$$XMMRegister, $rmode$$constant);
>    %}
>    ins_pipe(pipe_slow);
> ```

For the record, see the following link for more details on the above issue:
https://github.com/openjdk/jdk/pull/16701#issuecomment-1815645570
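
For anyone who wants to see what the quoted useAfterCeilDouble() benchmark computes without a JMH setup, here is a minimal standalone sketch. The class name UseAfterCeilSketch is hypothetical, and the value 4.1 for double4Dot1 is an assumption inferred from the field name in MathBench.java. It only reproduces the arithmetic: two rounding operations are applied to the same input and their results combined, which is the pattern the quoted comment uses to expose the dependency on dst. It does not measure performance and says nothing about which registers the JIT will pick.

```java
// Standalone sketch; not part of MathBench.java or the JDK sources.
final class UseAfterCeilSketch {
    // Assumption: stands in for the MathBench field double4Dot1,
    // taken to be 4.1 based on its name.
    static final double DOUBLE_4_DOT_1 = 4.1;

    // Same expression as the body of the quoted useAfterCeilDouble() benchmark:
    // two rounding operations on one input, then an add that uses both results.
    static double useAfterCeilDouble(double x) {
        return Math.ceil(x) + Math.floor(x);
    }

    public static void main(String[] args) {
        // ceil(4.1) = 5.0 and floor(4.1) = 4.0, so this prints 9.0
        System.out.println(useAfterCeilDouble(DOUBLE_4_DOT_1));
    }
}
```

To compare timings with and without the proposed pxor, the benchmark from the quoted diff would still need to be run under JMH on a build with UseAVX=0.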

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18089#issuecomment-1984873081


More information about the hotspot-compiler-dev mailing list