RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4]

Fri Apr 5 20:18:10 UTC 2024

On Fri, 5 Apr 2024 15:55:01 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:

>> @jatin-bhateja I get it but IMO it shouldn't be the responsibility of the assembler to do that, the assembler should emit machine code in a manner that respects what is being written.
>
>> This is a downcast from double precision to single precision value, thus only lower 32 bits of destination hold the actual results for conversion, upper 127:32 bits are copied from non destructive source operand for vex encoded instruction.
> 
> Please see the updated description incorporating the correction dst[63:0] -> dst[31:0] for `cvtss2sd`

@vamsi-parasa

> This change modifies the defined behaviour of cvtss2sd. Without AVX, it would retain bits 64-127 of dst, while with AVX those bits would be copied from src. I would suggest separating the matching rules instead.

Please address this. FYI, in similar cases we created separate methods in the `MacroAssembler`, such as `movflt` or `movdbl`. Feel free to disagree, but I think the assembler should not behave differently from the corresponding assembly instruction.
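
To make the concern concrete, here is a minimal Java sketch that models the upper-bit behaviour of the two encodings. It is only an illustration, not HotSpot code: the class `CvtUpperBits`, both method names, and the modelling of a 128-bit register as a `long[2]` (low lane first) are assumptions made for this example. It also assumes the AVX path would be emitted with src used for both source operands, which is how I read the quoted description.

```java
public class CvtUpperBits {
    // Hypothetical model: a 128-bit XMM register is {bits 63:0, bits 127:64}.

    // Legacy SSE form, cvtss2sd dst, src:
    // bits 63:0 receive the converted value, bits 127:64 of dst are kept.
    static long[] legacyCvtss2sd(long[] dst, long[] src) {
        float f = Float.intBitsToFloat((int) src[0]); // low 32 bits of src
        return new long[] { Double.doubleToRawLongBits((double) f), dst[1] };
    }

    // VEX form, vcvtss2sd dst, src1, src2:
    // bits 63:0 receive the converted low float of src2,
    // bits 127:64 are copied from src1 (dst is not an input).
    static long[] vexCvtss2sd(long[] src1, long[] src2) {
        float f = Float.intBitsToFloat((int) src2[0]);
        return new long[] { Double.doubleToRawLongBits((double) f), src1[1] };
    }

    public static void main(String[] args) {
        long[] dst = { 0L, 0xDDDDDDDDDDDDDDDDL };
        long[] src = { Float.floatToRawIntBits(1.5f) & 0xFFFFFFFFL,
                       0x5555555555555555L };

        long[] legacy = legacyCvtss2sd(dst, src);
        // Assumed AVX emission: src passed as both source operands.
        long[] vex = vexCvtss2sd(src, src);

        System.out.println("low lanes equal:   " + (legacy[0] == vex[0]));
        System.out.println("upper lanes equal: " + (legacy[1] == vex[1]));
        System.out.println("legacy upper = " + Long.toHexString(legacy[1]));
        System.out.println("vex upper    = " + Long.toHexString(vex[1]));
    }
}
```

The converted value in the low lane is the same either way; only bits 127:64 differ. That difference is why a separate `MacroAssembler` method or separate matching rules seem preferable to changing what the assembler's `cvtss2sd` emits.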

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1554255271