RFR: 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite [v8]
Srinivas Vamsi Parasa
duke at openjdk.java.net
Thu May 19 20:28:48 UTC 2022
On Thu, 19 May 2022 05:22:34 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:
> It sounds strange, please show the asm of your patch with respects to the benchmark. Also, please try cmoving with other arbitrary values such as 19 and 7 instead of `false` and `true`. The latter may be recognised as simple boolean not operation, remove the real comparison part, which defeats the purpose of Vladimir's suggestion.
>
> Regarding vectorisation, `isNaN` is a simple comparison and can be easily auto-vectorised without help from intrinsics.
>
> My speculation:
>
> A native comparison such as `x != x` can be parsed directly by the compiler. As a result, the graph of the expression `if (x != x)` is simply
>
> ```
> CmpF
> |
> Bool
> |
> If
> ```
>
> Your intrinsics, on the other hand, do not return the results on the flags, which leads to an extra comparison when using in conditions, `if(isNaN(x))` becomes
>
> ```
> IsNaN 0
> \ /
> CmpI
> |
> Bool
> |
> If
> ```
>
> In your benchmark, however, using this comparison to cmoving between 0 and 1 (`false` and `true`), the compiler recognised the pattern `x != 0 ? 0 : 1` with `x` having the type of `TypeInt::BOOL`. As a result, it reduces the graph into
>
> ```
> IsNaN 1
> \ /
> XorI
> ```
>
> Personally, I'm not into this implementation of intrinsics. FYI, gcc and clang both use sequences similar to `x != x` for `std::isnan`, `Math.abs(x) <= MAX_VALUE` for `std::isfinite` and `Math.abs(x) > MAX_VALUE` for `std::isinf`. The first one reduces to a single instruction `ucomiss x, x` so there is no reason to optimise further. The others are compiled down to 2 instructions each `vandpd t, x, [SIGN_ELIMINATE]; ucomiss t, [MAX_VALUE]`, so to optimise these further requires careful assessments.
>
> If you feel comfortable I would suggest you build the graph for these intrinsics as
>
> ```
> X
> |
> Bool 1 0
> \ | /
> CMove
> ```
>
> Then we can add ideal rules to `BoolNode` to recognise the patterns
>
> ```
> X
> |
> Bool 1 0
> \ | /
> CMove 0
> \ /
> CmpI
> |
> Bool
> ```
>
> And reduce them to
>
> ```
> X
> |
> Bool
> ```
>
> With this, we can have the `Double::isInfinite` intrinsics compiled down to `vfpclass k, x; ktest k`, which is much more preferable. For non-AVX512DQ though I would prefer implementing them in Java similar to described above. Both abs and comparison nodes are not hard to be vectorised so it would not be a problem.
>
> Thanks a lot.
As per the performance data,
- for `isInfinite()`, the intrinsic is giving 60% speedup for `Float` (and 40% for `Double`) compared to the Java implementation.
- for` isFinite()`, the intrinsic is giving 20% speedup for `Float` (and 6% slowdown for `Double`) instead of the Java implementation (`return Math.abs(f) <= Float.MAX_VALUE`).
- for `isNan()`, the flags fixup is skewing the results in favor of the intrinsic (5x for` Float `and 4x for `Double`) but you're saying that your patch will make this intrinsic approach unnecessary.
I agree with your logic, but we also need to back it up with performance data. Could you please run my benchmark for `isNaN() `with your patch and provide the performance data? That way we can quantify the performance of your proposed optimization of `isNaN()`.
-------------
PR: https://git.openjdk.java.net/jdk/pull/8459
More information about the hotspot-compiler-dev
mailing list