RFR: 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite [v8]
Srinivas Vamsi Parasa
duke at openjdk.java.net
Thu May 19 04:31:57 UTC 2022
On Thu, 19 May 2022 02:19:09 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:
> Please remove `isNaN` intrinsics in favour of #8525 .
>
> Also, you should not use `andl(dst, 0xff)` to zero out the upper bits of `dst`, since it is a 32-bit read following an 8-bit write, which constitutes a partial register stall.
>
> Refer to section 3.5.2.4, Partial register stalls from Intel® 64 and IA-32 Architectures Optimization Reference Manual:
>
> > A partial register stall happens when an instruction refers to a register, portions of which were previously modified by other instructions.
>
> There are 2 options worth considering here:
>
> * Zeroing the register before the `setb` instruction. From the same section:
> > For optimal performance, use of zero idioms, before the use of the register, eliminates the need for partial register merge micro-ops
>
> This is preferable since it does not contribute an execution uop in the backend (but still consumes a slot in the decoder and uop cache).
> * Zero extending the register after the `setb` instruction. This is less optimal since it adds the latency of the zero extension and a real executed uop in the backend.
>
> Thanks.
#8525 seems to eliminate the flags register fixup for `isNaN()`. These intrinsics can show a speedup over `ucomiss` instructions. Also, the intrinsic can be used for future vectorization. So, we can keep the `isNaN()` intrinsic along with your improvement. The two changes are orthogonal, not mutually exclusive.
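For reference, here is a minimal Java sketch of what the three methods compute at the bit level for `float`, assuming the IEEE 754 binary32 layout. The class name `FpClassSketch` and its helpers are hypothetical and are not the intrinsic code from this PR; they only illustrate the classification that any intrinsic has to reproduce.

```java
public class FpClassSketch {
    // IEEE 754 binary32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits.
    private static final int EXP_MASK  = 0x7F800000;
    private static final int MANT_MASK = 0x007FFFFF;

    // NaN: exponent all ones and a non-zero mantissa.
    static boolean isNaN(float f) {
        int bits = Float.floatToRawIntBits(f);
        return (bits & EXP_MASK) == EXP_MASK && (bits & MANT_MASK) != 0;
    }

    // Infinity: exponent all ones and a zero mantissa (either sign).
    static boolean isInfinite(float f) {
        int bits = Float.floatToRawIntBits(f);
        return (bits & EXP_MASK) == EXP_MASK && (bits & MANT_MASK) == 0;
    }

    // Finite: exponent is anything other than all ones.
    static boolean isFinite(float f) {
        int bits = Float.floatToRawIntBits(f);
        return (bits & EXP_MASK) != EXP_MASK;
    }

    public static void main(String[] args) {
        float[] samples = {
            0.0f, -1.5f, Float.MIN_VALUE, Float.MAX_VALUE,
            Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NaN
        };
        for (float f : samples) {
            System.out.println(f + ": isNaN=" + isNaN(f)
                + " isInfinite=" + isInfinite(f)
                + " isFinite=" + isFinite(f));
        }
    }
}
```

Each helper should agree with `Float.isNaN`, `Float.isInfinite` and `Float.isFinite` on every input, which is an easy way to sanity-check an alternative implementation.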
Actually, `andl(dst, 0xff)` gives a speedup over zeroing out the register before `setb`. Also, would a 32-bit logical `and` over all the bits cause the problem you mentioned?
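To make the three variants under discussion concrete, here is a small Java model of their architectural result. It assumes that `setb` writes only the low 8 bits of the destination and leaves the upper 24 bits unchanged; the class `SetbZeroingSketch` and its method names are hypothetical. The sketch only shows that all three sequences leave the same value in `dst`. It says nothing about partial register stalls or uop counts, which have to be measured on real hardware.

```java
public class SetbZeroingSketch {
    // Models `setb` on the low byte: only bits 7..0 of the 32-bit register change.
    static int setLowByte(int reg, boolean flag) {
        return (reg & 0xFFFFFF00) | (flag ? 1 : 0);
    }

    // Variant A: zero the register first, then `setb`.
    static int zeroThenSet(int stale, boolean flag) {
        int dst = 0; // zero idiom: the stale contents are discarded
        return setLowByte(dst, flag);
    }

    // Variant B: `setb`, then zero-extend the low byte into the full register.
    static int setThenZeroExtend(int stale, boolean flag) {
        int dst = setLowByte(stale, flag);
        return Byte.toUnsignedInt((byte) dst);
    }

    // Variant C: `setb`, then `andl(dst, 0xff)` to clear the upper bits.
    static int setThenAnd(int stale, boolean flag) {
        int dst = setLowByte(stale, flag);
        return dst & 0xFF;
    }

    public static void main(String[] args) {
        int stale = 0xDEADBEEF; // arbitrary leftover register contents
        for (boolean flag : new boolean[] {false, true}) {
            System.out.println("flag=" + flag
                + " A=" + zeroThenSet(stale, flag)
                + " B=" + setThenZeroExtend(stale, flag)
                + " C=" + setThenAnd(stale, flag));
        }
    }
}
```

Since the results are identical, the choice between the variants comes down to their cost on the target microarchitecture.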
-------------
PR: https://git.openjdk.java.net/jdk/pull/8459