RFR: 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite [v8]
Srinivas Vamsi Parasa
duke at openjdk.java.net
Sat May 21 07:46:51 UTC 2022
On Fri, 29 Apr 2022 00:36:18 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>>
>> - add comment for vfpclasss/d for isFinite()
>> - Merge branch 'master' of https://git.openjdk.java.net/jdk into float
>> - zero out the upper bits not written by setb
>> - use 0x1 to be simpler
>> - remove the redundant temp register
>> - Split the macros using predicate
>> - update jmh tests
>> - Merge branch 'master' into float
>> - 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite
>
> Impressive. A few comments.
>
> You are testing the performance of storing `boolean` results into an array, but usually these Java methods are used in conditions. Measuring that would be a more real-world case. Do it for both cases: with `avx512dq` on and off.
>
> And you need to post your perf results, at least in the RFE. Please also show what instructions are currently generated vs. with your changes. I don't get how you made `isNaN()` faster; you generate more instructions, it seems.
>
> Instead of 3 new Ideal nodes per type, you can use one and store the intrinsic id (or another enum) in a field that you can read in the `.ad` file instructions. Instead, I suggest splitting those mach instructions based on `avx512dq` support to avoid killing unused registers.
>
> Why is Double type support limited to LP64? Why are there no `x86_32.ad` changes?
>
> You can reuse `tmp1` in `double_class_check()`.
Hi Vladimir (@vnkozlov),
For 32-bit, in the case of double, we see a performance improvement when using the `vfpclasssd` instruction, but **without** `vfpclasssd` we see a **40% decrease** in performance for `isFinite()` compared to the original Java code. Below is the code that implements the intrinsic using SSE.
Is it OK to skip support for the **non**-`vfpclasssd` case on 32-bit?
void C2_MacroAssembler::double_class_check_sse(int opcode, XMMRegister src, Register dst, Register temp, Register temp1) {
  int32_t POS_INF_HI = 0x7ff00000;        // hi 32 bits of +Infinity
  int32_t KILL_SIGN_MASK_HI = 0x7fffffff; // hi 32 bits, sign bit cleared
  pshuflw(src, src, 0x4e);                // swap the hi and lo 32-bit halves (modifies src)
  movdl(temp, src);                       // temp = original hi 32 bits
  movl(temp1, KILL_SIGN_MASK_HI);
  andl(temp, temp1);                      // drop the sign bit
  movl(temp1, POS_INF_HI);
  cmpl(temp, temp1);                      // unsigned compare against the +Infinity hi word
  switch (opcode) {
    case Op_IsFiniteD:
      setb(Assembler::below, dst);
      break;
    case Op_IsInfiniteD:
      setb(Assembler::equal, dst);
      break;
    case Op_IsNaND:
      setb(Assembler::above, dst);
      break;
    default:
      assert(false, "%s", NodeClassNames[opcode]);
  }
  andl(dst, 0xff);                        // zero the upper bits not written by setb
}
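
For illustration only, here is a minimal portable C++ sketch of the comparison the routine above performs. It is not HotSpot code: `FpClass`, `raw_bits`, `from_bits`, `classify_hi_only`, `classify_full` and `check_examples` are hypothetical names introduced here. `classify_hi_only` mirrors the high-word comparison, and `classify_full` compares all 64 bits for reference, assuming IEEE 754 binary64 doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical names for illustration; none of these exist in HotSpot.
enum class FpClass { Finite, Infinite, NaN };

// Reinterpret a double as its raw 64-bit pattern (assumes IEEE 754 binary64).
static uint64_t raw_bits(double d) {
    static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Build a double from a raw 64-bit pattern.
static double from_bits(uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Mirrors the SSE routine: only the high 32 bits are examined.
static FpClass classify_hi_only(double d) {
    const uint32_t POS_INF_HI = 0x7ff00000u;
    const uint32_t KILL_SIGN_MASK_HI = 0x7fffffffu;
    uint32_t hi = static_cast<uint32_t>(raw_bits(d) >> 32) & KILL_SIGN_MASK_HI;
    if (hi < POS_INF_HI) return FpClass::Finite;     // setb(below)
    if (hi == POS_INF_HI) return FpClass::Infinite;  // setb(equal)
    return FpClass::NaN;                             // setb(above)
}

// Reference classification that compares the full 64-bit magnitude.
static FpClass classify_full(double d) {
    const uint64_t POS_INF = 0x7ff0000000000000ull;
    uint64_t mag = raw_bits(d) & 0x7fffffffffffffffull;
    if (mag < POS_INF) return FpClass::Finite;
    if (mag == POS_INF) return FpClass::Infinite;
    return FpClass::NaN;
}

// A few spot checks on ordinary values and on one edge case.
static void check_examples() {
    const double inf = std::numeric_limits<double>::infinity();
    assert(classify_hi_only(1.0) == FpClass::Finite);
    assert(classify_hi_only(-inf) == FpClass::Infinite);
    assert(classify_hi_only(std::numeric_limits<double>::quiet_NaN()) == FpClass::NaN);

    // NaN whose payload sits entirely in the low 32 bits.
    const double low_payload_nan = from_bits(0x7ff0000000000001ull);
    assert(classify_full(low_payload_nan) == FpClass::NaN);
    assert(classify_hi_only(low_payload_nan) == FpClass::Infinite);
}
```

In this sketch the finite test depends only on the high word, so both functions always agree on `isFinite()`. Telling infinity apart from NaN is different: a NaN whose high word equals `0x7ff00000` and whose low word is nonzero is reported as infinite by the high-word-only version, which suggests the `Op_IsInfiniteD` and `Op_IsNaND` cases would also need to look at the low 32 bits.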
-------------
PR: https://git.openjdk.java.net/jdk/pull/8459