RFR: 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite [v8]

Thu May 19 04:31:57 UTC 2022

On Thu, 19 May 2022 02:19:09 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

> Please remove the `isNaN` intrinsics in favour of #8525.
> 
> Also, you should not use `andl(dst, 0xff)` to zero out the upper bits of `dst`, since it is a 32-bit read following an 8-bit write, which constitutes a partial register stall.
> 
> Refer to section 3.5.2.4, Partial register stalls from Intel® 64 and IA-32 Architectures Optimization Reference Manual:
> 
> > A partial register stall happens when an instruction refers to a register, portions of which were previously modified by other instructions.
> 
> There are two options worth considering here:
> 
> * Zeroing the register before the `setb` instruction, referring to the same section:
>   > For optimal performance, use of zero idioms, before the use of the register, eliminates the need for partial register
>   > merge micro-ops
>   
>   This is preferable since it does not contribute an execution uop in the backend (but still consumes a slot in the
>   decoder and uop cache).
> * Zero extending the register after the `setb` instruction. This is less optimal since it adds the extra latency of the zero extension and a real executed uop in the backend.
> 
> Thanks.

#8525 seems to eliminate the flags register fixup for `isNaN()`. These intrinsics can show a speedup over `ucomiss` instructions. Also, having the intrinsic would help with future vectorization. So we can keep the `isNaN()` intrinsic along with your improvement; the two changes are orthogonal, not mutually exclusive.
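
For reference, here is a minimal Java sketch of the bit-level checks that these three predicates reduce to for `float`. It does not use a floating-point compare. The class and method names (`FpClassSketch`, `isNaNBits`, and so on) are made up for illustration, and this is not the code the intrinsic emits; it only shows the results the intrinsic has to reproduce.

```java
public class FpClassSketch {
    // Mask that clears the sign bit, leaving exponent and mantissa.
    private static final int ABS_MASK = 0x7fffffff;
    // Bit pattern of +Infinity: exponent all ones, mantissa zero.
    private static final int INF_BITS = 0x7f800000;

    // NaN: exponent all ones and a non-zero mantissa.
    public static boolean isNaNBits(float f) {
        return (Float.floatToRawIntBits(f) & ABS_MASK) > INF_BITS;
    }

    // Infinite: exponent all ones and a zero mantissa.
    public static boolean isInfiniteBits(float f) {
        return (Float.floatToRawIntBits(f) & ABS_MASK) == INF_BITS;
    }

    // Finite: exponent not all ones.
    public static boolean isFiniteBits(float f) {
        return (Float.floatToRawIntBits(f) & ABS_MASK) < INF_BITS;
    }

    public static void main(String[] args) {
        float[] samples = {0.0f, -1.5f, Float.MAX_VALUE, Float.NaN,
                           Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        for (float f : samples) {
            // Each sketch method should agree with the library method.
            boolean ok = isNaNBits(f) == Float.isNaN(f)
                      && isInfiniteBits(f) == Float.isInfinite(f)
                      && isFiniteBits(f) == Float.isFinite(f);
            System.out.println(f + " agrees with java.lang.Float: " + ok);
        }
    }
}
```

The same integer-only formulation applies per lane, which is why it is a plausible starting point for a vectorized version.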

Actually, `andl(dst, 0xff)` gives a speedup over zeroing out the register before `setb`. Also, would a 32-bit logical `and` over all the bits cause the problem you mentioned?
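
To make sure we are comparing the same thing, here is a small Java model of the register contents under the options discussed. It models only the architectural result, not timing or uop counts, so it says nothing about which sequence is faster. The names (`PartialRegisterSketch`, `setLowByte`, and so on) are hypothetical and are not HotSpot code.

```java
public class PartialRegisterSketch {
    // Models setb: writes only the low 8 bits, upper 24 bits keep their old value.
    public static int setLowByte(int reg, boolean flag) {
        return (reg & 0xffffff00) | (flag ? 1 : 0);
    }

    // Option 1: zero the whole register first, then write the low byte.
    public static int zeroThenSet(int staleReg, boolean flag) {
        int reg = 0; // stands in for the zero idiom; staleReg is discarded
        return setLowByte(reg, flag);
    }

    // Option 2: write the low byte, then zero-extend it to 32 bits.
    public static int setThenZeroExtend(int staleReg, boolean flag) {
        int reg = setLowByte(staleReg, flag);
        return reg & 0xff; // same architectural effect as andl(dst, 0xff)
    }

    public static void main(String[] args) {
        int stale = 0xdeadbeef; // arbitrary leftover register contents
        for (boolean flag : new boolean[] {false, true}) {
            System.out.println(flag + ": " + zeroThenSet(stale, flag)
                    + " / " + setThenZeroExtend(stale, flag));
        }
    }
}
```

All variants leave 0 or 1 in the full register, so the question is purely about performance.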

-------------

PR: https://git.openjdk.java.net/jdk/pull/8459