RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v6]

Bhavana Kilambi bkilambi at openjdk.org
Thu Aug 14 10:27:14 UTC 2025


On Thu, 14 Aug 2025 09:09:32 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

>> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e - the JTREG test
>> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of its tests that use constant values, such as -
>> 
>> 
>> public void vectorAddConstInputFloat16() {
>>     for (int i = 0; i < LEN; ++i) {
>>         output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST));
>>     }
>> }
>> 
>> 
>> 
>> <The full failure log is present in the JBS ticket, thus not reproducing it here>
>> 
>> The current code in the JDK generates an sve_dup instruction for every 16-bit immediate, whereas the instruction only accepts a signed 8-bit immediate in the range [-128, 127] or, for 16-bit elements, a multiple of 256 in the range [-128 << 8, 127 << 8].
>> 
>> This patch generates the sve_dup instruction only for those 16-bit values that are within the limits specified above. For out-of-range values, the immediate half-float value is loaded from the constant pool into a register ("loadConH" mach node) and then replicated (broadcast) to an SVE register ("replicateHF" mach node).
>> 
>> Both tests - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java` - pass on a 256-bit SVE machine. The JTREG tests for hotspot (hotspot_all), langtools (tier1) and jdk (tier1-3) pass on the same machine.
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Modify loadConH to use a mov and fmov instead

Tested the latest patch on Graviton3 and both the JTREG tests pass - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java`

The code now generates the following for `loadConH` (registers `s11` and `z10` taken as examples) -

mov rscratch1, #imm
fmov s11, rscratch1


This loaded value might be used by any scalar iterations following the `fmov`.


For the vectorized loop, if the dup is legal (`replicateHF_imm8_gt128b` machnode) -

dup z10.h, #imm

and for illegal immediates (`replicateHF` machnode) -

dup z10.h, h11
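
To illustrate which half-float constants take which path, here is a minimal Java sketch of the legality rule. It assumes the rule is "a signed 8-bit immediate, optionally shifted left by 8" applied to the raw 16-bit pattern of the constant. The class and method names (`SveDupImmediateSketch`, `isLegalDupImm16`) are hypothetical and are not the HotSpot implementation.

```java
public class SveDupImmediateSketch {

    // Hypothetical helper: returns true if the raw 16-bit pattern could be
    // encoded directly as the immediate of an SVE "dup z.h, #imm".
    // Assumption: legal values are a signed 8-bit immediate, or a signed
    // 8-bit immediate shifted left by 8 (i.e. a multiple of 256).
    static boolean isLegalDupImm16(short bits) {
        int imm = bits; // sign-extend the raw 16-bit pattern

        // Unshifted form: [-128, 127].
        boolean fitsImm8 = imm >= -128 && imm <= 127;

        // Shifted form: multiple of 256 in [-128 << 8, 127 << 8].
        // The range check is always true for a 16-bit value whose low byte
        // is zero; it is kept only to mirror the stated rule.
        boolean fitsShiftedImm8 =
            (imm & 0xFF) == 0 && imm >= (-128 << 8) && imm <= (127 << 8);

        return fitsImm8 || fitsShiftedImm8;
    }

    public static void main(String[] args) {
        // Raw FP16 bit patterns: 1.0, the next value after 1.0, -1.0,
        // and two small patterns on either side of the 8-bit limit.
        short[] samples = {
            (short) 0x3C00, (short) 0x3C01, (short) 0xBC00,
            (short) 0x0040, (short) 0x0081
        };
        for (short s : samples) {
            String path = isLegalDupImm16(s)
                ? "dup with immediate"
                : "load constant, then dup from register";
            System.out.printf("0x%04X -> %s%n", s & 0xFFFF, path);
        }
    }
}
```

Under this assumption, a constant such as 1.0 (raw bits `0x3C00`) has a zero low byte and can use the immediate form, while a neighbouring value such as `0x3C01` cannot and would go through the `loadConH` plus `replicateHF` sequence.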

@theRealAph could I please ask for another round of review? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3187936230

