RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4]

Aleksey Shipilev shade at openjdk.org
Mon Aug 11 09:14:25 UTC 2025


On Mon, 11 Aug 2025 07:54:53 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

>> After commit https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test
>> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the test cases that use constant values, such as:
>> 
>> 
>>     public void vectorAddConstInputFloat16() {
>>         for (int i = 0; i < LEN; ++i) {
>>             output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST));
>>         }
>>     }
>> 
>> 
>> 
>> <The full failure log is present in the JBS ticket, thus not reproducing it here>
>> 
>> The current code in the JDK generates an sve_dup instruction for every 16-bit immediate, whereas the acceptable range is [-128, 127] for 8-bit immediates and [-128 << 8, 127 << 8], in multiples of 256, for 16-bit signed immediates.
>> 
>> This patch generates the sve_dup instruction only for those 16-bit values that are within the limits specified above. For values that are out of range, the immediate half-float value is loaded from the constant pool into a register ("loadConH" mach node) and then replicated (broadcast) to an SVE register ("replicateHF" mach node).
>> 
>> Both tests, `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java`, pass on a 256-bit SVE machine. The JTREG tests for hotspot (hotspot_all), langtools (tier1) and jdk (tier1-3) pass on the same machine.
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Addressed review comments and modified some comments

This looks reasonable to me, thanks. Some nits in the test remain, but they are non-blocking.

test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 45:

> 43: public class TestFloat16Replicate {
> 44:     private static short[] input;
> 45:     private static short[] output;

This might give things even more chance to vectorize? Not sure, feel free to ignore.

Suggestion:

    private static final short[] INPUT;
    private static final short[] OUTPUT;

test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java line 47:

> 45:     private static short[] output;
> 46: 
> 47:    // Choose FP16_IN_RANGE which is within the range of [-128 << 8, 127 << 8] and a multiple of 256

Suggestion:

    // Choose FP16_IN_RANGE which is within the range of [-128 << 8, 127 << 8] and a multiple of 256
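
For readers following along, here is a rough sketch of the range rule that comment refers to: a 16-bit pattern is treated as a valid dup immediate if it fits in a signed 8-bit value, or if it is a multiple of 256 within [-128 << 8, 127 << 8]. The class and method names below are hypothetical and this is not the HotSpot implementation; it only restates the rule from the PR description in plain Java.

```java
public class SveDupImmediateCheck {

    // Hypothetical helper (an assumption for illustration, not HotSpot code):
    // returns true if the raw 16-bit pattern could be used directly as an
    // sve_dup immediate under the rule quoted in the PR description.
    static boolean fitsSveDupImm16(short bits) {
        int v = bits; // sign-extend the raw 16-bit pattern

        // Case 1: unshifted signed 8-bit immediate.
        if (v >= -128 && v <= 127) {
            return true;
        }

        // Case 2: signed 8-bit immediate shifted left by 8,
        // i.e. a multiple of 256 within [-128 << 8, 127 << 8].
        return (v & 0xFF) == 0 && v >= (-128 << 8) && v <= (127 << 8);
    }

    public static void main(String[] args) {
        // Sample raw FP16 bit patterns; 0x3C00 is the encoding of 1.0 in half precision.
        short[] samples = {
            (short) 0x3C00, (short) 0x3C01, (short) 0x0040, (short) 0x7F00
        };
        for (short s : samples) {
            String path = fitsSveDupImm16(s)
                    ? "dup immediate"
                    : "constant pool load, then replicate";
            System.out.printf("0x%04X -> %s%n", s & 0xFFFF, path);
        }
    }
}
```

Values that fail the check are the ones the patch routes through the constant pool ("loadConH") followed by a broadcast ("replicateHF"), so a test wants at least one constant from each side of the check.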

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26589#pullrequestreview-3104792951
PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2266101795
PR Review Comment: https://git.openjdk.org/jdk/pull/26589#discussion_r2266100000

