RFR: 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite [v8]

Sat May 21 07:46:51 UTC 2022

On Fri, 29 Apr 2022 00:36:18 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>> 
>>  - add comment for vfpclasss/d for isFinite()
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into float
>>  - zero out the upper bits not written by setb
>>  - use 0x1 to be simpler
>>  - remove the redundant temp register
>>  - Split the macros using predicate
>>  - update jmh tests
>>  - Merge branch 'master' into float
>>  - 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite
>
> Impressive. Few comments.
> 
> You are testing the performance of storing `boolean` results into an array, but usually these Java methods are used in conditions. Measuring that would be a more real-world case. Do it for both cases: with `avx512dq` on and off.
> 
> And you need to post your perf results, at least in the RFE. Please also show what instructions are currently generated vs. your changes. I don't get how you made `isNaN()` faster; you generate more instructions, it seems.
> 
> Instead of 3 new Ideal nodes per type you can use one and store the intrinsic id (or other enum) in its field, which you can read in the `.ad` file instructions. Instead, I suggest splitting those mach instructions based on `avx512dq` support to avoid killing unused registers.
> 
> Why Double type support is limited to LP64? Why there is no `x86_32.ad` changes?
> 
> You can reuse `tmp1` in `double_class_check()`.

Hi Vladimir (@vnkozlov)

For 32-bit, in the double case, we see a performance improvement when using the `vfpclasssd` instruction, but **without** `vfpclasssd` we see a **40% decrease** in performance for `isFinite()` compared to the original Java code. Below is the code that implements the intrinsic using SSE.

Is it OK to skip support for the **non**-`vfpclasssd` case on 32-bit?

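void C2_MacroAssembler::double_class_check_sse(int opcode, XMMRegister src, Register dst, Register temp, Register temp1) {
  int32_t POS_INF_HI = 0x7ff00000; // hi 32 bits of +Infinity
  int32_t KILL_SIGN_MASK_HI = 0x7fffffff; // hi 32 bits, clears the sign bit

  pshuflw(src, src, 0x4e); // switch hi to lo (note: this overwrites src)
  movdl(temp, src);        // temp = hi 32 bits of the double
  movl(temp1, KILL_SIGN_MASK_HI);
  andl(temp, temp1);       // drop the sign bit
  movl(temp1, POS_INF_HI);
  cmpl(temp, temp1);       // compare against the hi word of +Infinity
  // Only the hi word is compared: a NaN whose hi word is exactly 0x7ff00000
  // (nonzero lo word) looks the same as infinity to this check.
  switch (opcode) {
    case Op_IsFiniteD:
      setb(Assembler::below, dst);
      break;
    case Op_IsInfiniteD:
      setb(Assembler::equal, dst);
      break;
    case Op_IsNaND:
      setb(Assembler::above, dst);
      break;
    default:
      assert(false, "%s", NodeClassNames[opcode]);
  }
  andl(dst, 0xff); // zero out the upper bits not written by setb
}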
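
For readers who want to follow the comparison logic without the assembler, here is a portable C++ model of what the sequence above appears to compute. It is only a sketch under stated assumptions: `FpCheck`, `from_bits`, `high_word` and `high_word_check` are hypothetical names introduced for illustration, not HotSpot code, and the model assumes IEEE 754 binary64 doubles.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the three opcodes handled by the switch.
enum class FpCheck { IsFinite, IsInfinite, IsNaN };

// Build a double from its raw 64-bit encoding (illustration helper).
static double from_bits(uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Upper 32 bits of the binary64 encoding: sign, exponent, top of mantissa.
static uint32_t high_word(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return static_cast<uint32_t>(bits >> 32);
}

// Models the posted sequence: clear the sign bit of the high word, then
// compare it (unsigned) with the high word of +Infinity.
static bool high_word_check(FpCheck op, double d) {
  const uint32_t POS_INF_HI = 0x7ff00000u;
  const uint32_t KILL_SIGN_MASK_HI = 0x7fffffffu;
  const uint32_t hi = high_word(d) & KILL_SIGN_MASK_HI;
  switch (op) {
    case FpCheck::IsFinite:   return hi <  POS_INF_HI;  // setb(below)
    case FpCheck::IsInfinite: return hi == POS_INF_HI;  // setb(equal)
    case FpCheck::IsNaN:      return hi >  POS_INF_HI;  // setb(above)
  }
  return false;
}
```

In this model the `IsFinite` case depends only on the exponent bits, so the high word is sufficient. The `IsInfinite` and `IsNaN` cases cannot tell infinity (`0x7ff0000000000000`) apart from a NaN such as `from_bits(0x7ff0000000000001)`, because both share the same high word; an exact check would also have to look at the low 32 bits.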

-------------

PR: https://git.openjdk.java.net/jdk/pull/8459