RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative floats
erifan
duke at openjdk.org
Tue Sep 2 03:04:46 UTC 2025
On Mon, 1 Sep 2025 14:35:40 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> The sve_cpy instruction is not correctly implemented for negative floating-point values. The issues include:
>>
>> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>> - `pack(-1.0)` returns an unsigned int with the 7th bit set, i.e., `0xf0`.
>> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
>> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>>
>> 2. Additionally, the encoding of the negative floating-point number is incorrect:
>> - The imm8 field can fall outside the valid range of **[-128, 127]**.
>> - Bit **13** should be encoded as **0** for floating-point numbers.
>>
>> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>>
>> Some test cases are added to aarch64-asmtest.py, and all tests passed.
>
> Thanks.
>
> I'm not convinced that the refactoring is necessary. Why not write a replacement for `checked_cast<int8_t>(pack(d))` that does the right thing and fix the first `sve_cpy()` so that it does the right thing for float args?
Thanks @theRealAph .
I've indeed considered and implemented your idea. The code diff:
diff --git a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
index 11d302e9026..841d24f516b 100644
--- a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
+++ b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
@@ -3813,8 +3813,9 @@ template<typename R, typename... Rx>
bool isMerge, bool isFloat) {
starti;
assert(T != Q, "invalid size");
+ assert((!isFloat) || (isFloat && T != B), "invalid size");
int sh = 0;
- if (imm8 <= 127 && imm8 >= -128) {
+ if ((imm8 <= 127 && imm8 >= -128) || (isFloat && (imm8 >> 8) == 0)) {
sh = 0;
} else if (T != B && imm8 <= 32512 && imm8 >= -32768 && (imm8 & 0xff) == 0) {
sh = 1;
@@ -3824,7 +3825,7 @@ template<typename R, typename... Rx>
}
int m = isMerge ? 1 : 0;
f(0b00000101, 31, 24), f(T, 23, 22), f(0b01, 21, 20);
- prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), sf(imm8, 12, 5), rf(Zd, 0);
+ prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), f(imm8&0xff, 12, 5), rf(Zd, 0);
}
public:
@@ -3834,7 +3835,7 @@ template<typename R, typename... Rx>
}
// SVE copy floating-point immediate to vector elements (predicated)
void sve_cpy(FloatRegister Zd, SIMD_RegVariant T, PRegister Pg, double d) {
- sve_cpy(Zd, T, Pg, checked_cast<int8_t>(pack(d)), /*isMerge*/true, /*isFloat*/true);
+ sve_cpy(Zd, T, Pg, checked_cast<uint8_t>(pack(d)), /*isMerge*/true, /*isFloat*/true);
}
// SVE conditionally select elements from two vectors
However, some of my colleagues have differing opinions:
1. sve `cpy` and `fcpy` are actually two different instructions, and distinguishing them might be clearer.
2. sve `cpy` 's imm8 is an **int** , while `fcpy` 's imm8 is an **fp8** . While some encoding code can be reused, separating the encodings makes the code clearer.
I think both implementations are fine. If you think it's better to not refactor, I'll revert.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3243633607
More information about the hotspot-compiler-dev
mailing list