[aarch64-port-dev ] [PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics, approval request

Mon Mar 4 10:06:16 UTC 2019

Hi Andrew Dinn,

Thanks for your work on testing my pending patch. But I have to say the 3rd implementation (vecplus) is buggy. The main reason I didn't use FMAXV/FMINV for 2 doubles is the architecture doesn't support that.

> instruct reduce_max4F(vRegF dst, vRegF src1, vecX src2) %{
>   . . .
>   ins_encode %{
>     __ fmaxv(as_FloatRegister($dst$$reg), __ T4S,
> as_FloatRegister($src2$$reg));
>     __ fmaxs(as_FloatRegister($dst$$reg), as_FloatRegister($dst$$reg),
> as_FloatRegister($src1$$reg));
>   %}
>   . . .

Above matching rule for the float reduction node is correct.

> instruct reduce_max2D(vRegD dst, vRegD src1, vecX src2, vecX tmp) %{
>   . . .
>   ins_encode %{
>     __ fmaxv(as_FloatRegister($dst$$reg), __ T2D,
> as_FloatRegister($src2$$reg));
>     __ fmaxd(as_FloatRegister($dst$$reg), as_FloatRegister($dst$$reg),
> as_FloatRegister($src1$$reg));
>   %}
>   . . .

But this one for the double reduction node is *not* correct.

From page 1502 of the Arm Architecture Reference Manual, we could see that the arrangement specifier for the floating-point min/max reduction instructions can only be 4S. It cannot be anything else. So below is the code I added in assembler_aarch64.hpp in my patch.

+#define INSN(NAME, opc) \
+  void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn) {                  \
+    starti;                                                                            \
+    assert(T == T4S, "arrangement must be T4S");                                       \
+    f(0b01101110, 31, 24), f(opc, 23), f(0b0110000111110, 22, 10);                     \
+    rf(Vn, 5), rf(Vd, 0);                                                              \
+  }
+
+  INSN(fmaxv, 0);
+  INSN(fminv, 1);
+
+#undef INSN
+

I hard-coded the arrangement bits in the encoding and added an assert (T == T4S) before that.

So if you use "__ fmaxv(as_FloatRegister($dst$$reg), __ T2D, as_FloatRegister($src2$$reg));" in the ad file. The code generated will still be something like "fmaxv     s21, v16.4s" (with 4S arrangement). And if you run the test with fastdebug build JDK, the assertion failure will be hit.

Maybe I shouldn't hard-code the arrangement bits in the encoding as it may hide problems. I will modify my previous webrev and post it in a new thread if you think so.

--
Thanks,
Pengfei