RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE

Bhavana Kilambi bkilambi at openjdk.org
Fri Aug 1 12:07:54 UTC 2025


On Fri, 1 Aug 2025 09:31:40 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

> After commit https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test
> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the test cases that use constant operands, such as:
> 
> 
> public void vectorAddConstInputFloat16() {
>     for (int i = 0; i < LEN; ++i) {
>         output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST));
>     }
> }
> 
> 
> 
> <The full failure log is present in the JBS ticket, thus not reproducing it here>
> 
> The current code in the JDK generates an sve_dup instruction for every 16-bit immediate, but the instruction can only encode a signed 8-bit immediate in [-128, 127], or such a value shifted left by 8, i.e. a multiple of 256 in [-128 << 8, 127 << 8].
> 
> This patch generates the sve_dup instruction only for those 16-bit values that are within the limits specified above. For out-of-range values, the immediate half-float value is loaded from the constant pool into a register ("loadConH" mach node) and then replicated (broadcast) to an SVE register ("replicateHF" mach node).
> 
> Both tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` - pass on a 256-bit SVE machine. The JTREG suites hotspot (hotspot_all), langtools (tier1) and jdk (tier1-3) also pass on the same machine.

> I am still a bit confused what matches `Replicate` with `immH` that does _not_ fit `immH8_shift8` when `Matcher::vector_length_in_bytes(n) > 16`?

Hi, thanks for your review. If the immediate value does not fit `immH8_shift8` when `Matcher::vector_length_in_bytes(n) > 16`, the compiler generates the `loadConH` [1] -> `replicateHF` [2] backend nodes instead. The constant is loaded from the constant pool and then broadcast to every lane of an SVE register.

[1] https://github.com/openjdk/jdk/blob/8ac4a88f3c5ad57824dd192cb3f0af5e71cbceeb/src/hotspot/cpu/aarch64/aarch64.ad#L6963

[2] https://github.com/openjdk/jdk/blob/8ac4a88f3c5ad57824dd192cb3f0af5e71cbceeb/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4806
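
To make the selection concrete, here is a rough Java model of the decision described above. It is only a sketch and not the HotSpot code: the class and method names are hypothetical, and it assumes the sve_dup immediate is a signed 8-bit value, optionally shifted left by 8, applied to the raw 16-bit pattern of the half-float constant.

```java
// Illustrative sketch only; SveDupImmediateSketch and its methods are
// hypothetical names and do not exist in the JDK sources.
final class SveDupImmediateSketch {

    // Assumed encoding rule: a signed 8-bit immediate, or a signed 8-bit
    // immediate shifted left by 8 (a multiple of 256).
    static boolean fitsSveDupImmediate(short rawBits) {
        int v = rawBits; // sign-extend the raw 16-bit pattern
        boolean fitsSimm8 = v >= -128 && v <= 127;
        boolean fitsSimm8Shift8 =
                (v & 0xFF) == 0 && v >= (-128 << 8) && v <= (127 << 8);
        return fitsSimm8 || fitsSimm8Shift8;
    }

    // Names the backend path the reply describes for each case.
    static String lowering(short rawBits) {
        return fitsSveDupImmediate(rawBits)
                ? "sve_dup immediate"
                : "loadConH -> replicateHF";
    }

    public static void main(String[] args) {
        short one = (short) 0x3C00;      // FP16 1.0, low byte is zero
        short approxPi = (short) 0x4248; // FP16 3.140625, low byte is non-zero
        System.out.println(lowering(one));      // prints "sve_dup immediate"
        System.out.println(lowering(approxPi)); // prints "loadConH -> replicateHF"
    }
}
```

Under these assumptions, a constant such as 0x4248 cannot be encoded directly, so it takes the constant pool path.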

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3144344435

