RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5]
Emanuel Peter
epeter at openjdk.org
Mon Jan 12 08:50:46 UTC 2026
On Mon, 12 Jan 2026 08:13:04 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
>> test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 459:
>>
>>> 457: short result = (short) 0;
>>> 458: for (int i = 0; i < LEN; i++) {
>>> 459: result = float16ToRawShortBits(add(shortBitsToFloat16(result), shortBitsToFloat16(input1[i])));
>>
>> Why all the conversions from and to `short` / `Float16`?
>> Is there any benefit to use `short` for the intermediate results? Why not make `result` a `Float16`?
>
> If I remember correctly, I tried that initially, but the loop did not get vectorized. The Ideal graph showed many nodes related to object creation (probably for the intermediate `Float16` result), which bloated the loop so much that it was not unrolled, and therefore not vectorized. I also tried a standalone loop that does not return the intermediate result, hoping that escape analysis would eliminate the object creation, but that did not help either.
Hmm, I see. That sounds like a deficiency in the auto unboxing of Float16.
Suggestion: create IR tests for both variants, then file an RFE for the one that does not vectorize yet because of the boxing issue.
As things stand now, it's not a huge win, to be honest. Which user is supposed to write their code in such a convoluted way, converting back and forth between `short` and `Float16`? Would they not expect to just use `Float16` all the way through?
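The two loop shapes being compared can be sketched as a standalone Java program. To keep it runnable on a stock JDK 20+ without `--add-modules jdk.incubator.vector`, it uses a hypothetical record `Half` as a stand-in for `jdk.incubator.vector.Float16` and converts with `Float.floatToFloat16` / `Float.float16ToFloat`. The class and method names are illustrative assumptions, not taken from the PR or its test, and the sketch only shows the source shapes; it says nothing about whether C2 vectorizes either one.

```java
public class Float16ReductionSketch {

    // Hypothetical stand-in for jdk.incubator.vector.Float16 (an assumption made
    // only so this compiles without --add-modules jdk.incubator.vector).
    // Holds raw IEEE 754 binary16 bits; the conversions below need JDK 20+.
    record Half(short bits) {
        static Half valueOf(float f) {
            return new Half(Float.floatToFloat16(f)); // round to nearest even
        }

        float floatValue() {
            return Float.float16ToFloat(bits);
        }

        // Add in float, then round the result once to binary16.
        static Half add(Half a, Half b) {
            return valueOf(a.floatValue() + b.floatValue());
        }
    }

    // Variant 1, the shape used in the test under review: only the raw bits in a
    // short are carried from one iteration to the next; the Half temporaries
    // created inside an iteration do not escape it.
    static short addReductionBits(short[] input) {
        short result = (short) 0; // bit pattern of +0.0 in binary16
        for (int i = 0; i < input.length; i++) {
            result = Half.add(new Half(result), new Half(input[i])).bits();
        }
        return result;
    }

    // Variant 2, what a user would naturally write: the accumulator is the
    // boxed type itself, so an object reference is carried across iterations.
    static Half addReductionBoxed(Half[] input) {
        Half result = Half.valueOf(0.0f);
        for (int i = 0; i < input.length; i++) {
            result = Half.add(result, input[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        float[] values = {1.5f, 2.25f, -0.75f, 4.0f};
        short[] bits = new short[values.length];
        Half[] boxed = new Half[values.length];
        for (int i = 0; i < values.length; i++) {
            bits[i] = Float.floatToFloat16(values[i]);
            boxed[i] = Half.valueOf(values[i]);
        }
        float viaBits = Float.float16ToFloat(addReductionBits(bits));
        float viaBoxed = addReductionBoxed(boxed).floatValue();
        System.out.println(viaBits + " " + viaBoxed); // prints "7.0 7.0"
    }
}
```

Both variants produce the same binary16 result for the same inputs; the difference discussed in this thread is only in what the JIT can do with them. With the real `Float16`, the second shape reportedly was not unrolled or vectorized at the time, which is why an IR test for each shape would make the gap visible.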
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2681318247
More information about the hotspot-compiler-dev mailing list