RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative floats

Tue Sep 2 03:04:46 UTC 2025

On Mon, 1 Sep 2025 14:35:40 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> The sve_cpy instruction is not correctly implemented for negative floating-point values. The issues include:
>> 
>> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>> - `pack(-1.0)` returns an unsigned int with the 7th bit set, i.e., `0xf0`.
>> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
>> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>> 
>> 2. Additionally, the encoding of the negative floating-point number is incorrect:
>> - The imm8 field can fall outside the valid range of **[-128, 127]**.
>> - Bit **13** should be encoded as **0** for floating-point numbers.
>> 
>> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>> 
>> Some test cases are added to aarch64-asmtest.py, and all tests passed.
>
> Thanks.
> 
> I'm not convinced that the refactoring is necessary. Why not write a replacement for `checked_cast<int8_t>(pack(d))` that does the right thing and fix the first `sve_cpy()` so that it does the right thing for float args?

Thanks @theRealAph .

I've indeed considered and implemented your idea. The code diff:

diff --git a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
index 11d302e9026..841d24f516b 100644
--- a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
+++ b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
@@ -3813,8 +3813,9 @@ template<typename R, typename... Rx>
                bool isMerge, bool isFloat) {
     starti;
     assert(T != Q, "invalid size");
+    assert((!isFloat) || (isFloat && T != B), "invalid size");
     int sh = 0;
-    if (imm8 <= 127 && imm8 >= -128) {
+    if ((imm8 <= 127 && imm8 >= -128) || (isFloat && (imm8 >> 8) == 0)) {
       sh = 0;
     } else if (T != B && imm8 <= 32512 && imm8 >= -32768 && (imm8 & 0xff) == 0) {
       sh = 1;
@@ -3824,7 +3825,7 @@ template<typename R, typename... Rx>
     }
     int m = isMerge ? 1 : 0;
     f(0b00000101, 31, 24), f(T, 23, 22), f(0b01, 21, 20);
-    prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), sf(imm8, 12, 5), rf(Zd, 0);
+    prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), f(imm8&0xff, 12, 5), rf(Zd, 0);
   }

 public:
@@ -3834,7 +3835,7 @@ template<typename R, typename... Rx>
   }
   // SVE copy floating-point immediate to vector elements (predicated)
   void sve_cpy(FloatRegister Zd, SIMD_RegVariant T, PRegister Pg, double d) {
-    sve_cpy(Zd, T, Pg, checked_cast<int8_t>(pack(d)), /*isMerge*/true, /*isFloat*/true);
+    sve_cpy(Zd, T, Pg, checked_cast<uint8_t>(pack(d)), /*isMerge*/true, /*isFloat*/true);
   }

   // SVE conditionally select elements from two vectors


However, some of my colleagues have differing opinions:
1. sve `cpy` and `fcpy` are actually two different instructions, and distinguishing them might be clearer.
2. sve `cpy` 's imm8 is an **int** , while `fcpy` 's imm8 is an **fp8** . While some encoding code can be reused, separating the encodings makes the code clearer.

I think both implementations are fine. If you think it's better to not refactor, I'll revert.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3243633607