RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4]

Fri Apr 5 06:18:08 UTC 2024

On Fri, 5 Apr 2024 05:57:27 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> This is a downcast from a double-precision to a single-precision value, so only the lower 32 bits of the destination hold the result of the conversion; for the VEX-encoded instruction, the upper bits 127:32 are copied from the non-destructive source operand.
>> 
>> VCVTSD2SS (VEX.128 Encoded Version)
>> DEST[31:0] := Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC2[63:0]);
>> DEST[127:32] := SRC1[127:32]
>> DEST[MAXVL-1:128] := 0
>> 
>> The user is only interested in the lower 32 bits of the destination. Passing the source as the NDS operand avoids a false dependency on AVX targets: dispatch is no longer held back waiting for the previous value of the destination, and the instruction is issued to the out-of-order (OOO) backend as soon as the source is ready.
>
> This change modifies the defined behaviour of `cvtss2sd`. Without AVX, it retains bits 64-127 of `dst`, while with AVX those bits are copied from `src`. I would suggest separating the matching rules instead.

It's a clever trick to dodge the false dependency without compromising correctness.
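To make both positions concrete, here is a small Python sketch of the two scalar conversions. It is a model, not HotSpot code: it assumes an XMM register can be treated as 16 little-endian bytes, follows the quoted VEX.128 pseudocode and the legacy SSE behaviour (upper bits of `dst` left unchanged), and ignores MXCSR rounding modes other than round-to-nearest, FP exceptions, and the zeroing of bits above 128. The function names are hypothetical and do not correspond to assembler or matcher APIs.

```python
import struct

# Hypothetical model: an XMM register is 16 little-endian bytes (bits 127:0).
# Not modeled: DEST[MAXVL-1:128] zeroing, MXCSR rounding modes, FP exceptions.
# Assumes finite, in-range inputs (struct.pack("<f") raises on float overflow).

def _convert(src, from_fmt, to_fmt):
    """Convert the scalar in the low lane of src from from_fmt to to_fmt."""
    value = struct.unpack_from(from_fmt, src, 0)[0]
    return struct.pack(to_fmt, value)

def cvtsd2ss_sse(dst, src):
    # Legacy SSE: DEST[31:0] = convert(SRC[63:0]); DEST[127:32] keeps old dst bits.
    return _convert(src, "<d", "<f") + dst[4:]

def vcvtsd2ss(src1, src2):
    # VEX.128: DEST[31:0] = convert(SRC2[63:0]); DEST[127:32] = SRC1[127:32].
    return _convert(src2, "<d", "<f") + src1[4:]

def cvtss2sd_sse(dst, src):
    # Legacy SSE: DEST[63:0] = convert(SRC[31:0]); DEST[127:64] keeps old dst bits.
    return _convert(src, "<f", "<d") + dst[8:]

def vcvtss2sd(src1, src2):
    # VEX.128: DEST[63:0] = convert(SRC2[31:0]); DEST[127:64] = SRC1[127:64].
    return _convert(src2, "<f", "<d") + src1[8:]

stale_dst = bytes([0xAA]) * 16                        # whatever dst held before
src_d = struct.pack("<d", 1.5) + bytes([0x55]) * 8    # double 1.5 in the low lane
src_f = struct.pack("<f", 2.25) + bytes([0x55]) * 12  # float 2.25 in the low lane

# Pass src as the NDS operand (src1 == src2 == src), as in the proposed change.
d2f_legacy, d2f_vex = cvtsd2ss_sse(stale_dst, src_d), vcvtsd2ss(src_d, src_d)
f2d_legacy, f2d_vex = cvtss2sd_sse(stale_dst, src_f), vcvtss2sd(src_f, src_f)

# The converted scalar is identical in both encodings...
print(struct.unpack_from("<f", d2f_vex)[0], d2f_legacy[:4] == d2f_vex[:4])
print(struct.unpack_from("<d", f2d_vex)[0], f2d_legacy[:8] == f2d_vex[:8])
# ...but the bits above it come from dst (legacy SSE) versus src (VEX with NDS = src).
print(d2f_legacy[4:] == d2f_vex[4:], f2d_legacy[8:] == f2d_vex[8:])
```

Under these assumptions the scalar lane matches, which is what the correctness argument relies on, while the upper bits differ, which is the behavioural change raised in the review. Whether any consumer of the result ever reads those upper bits is the question that separate matching rules would settle.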

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1552980667