RFR: 8318562: Computational test more than 2x slower when AVX instructions are used

Fri Nov 17 02:21:29 UTC 2023

On Fri, 17 Nov 2023 02:16:58 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> I confirmed that this change solved performance issue on machines I tested (old Broadwell and Cascade Lake CPUs).
>> I am submitting our regular testing for approval.
>
> @vnkozlov Thanks a lot!

Hi @sviswa7,

Thanks for addressing this.

For the SSE versions, bits (MAXVL-1:32) of the corresponding destination register remain unchanged.

For the AVX versions, bits (127:32) of the XMM destination register are copied from the corresponding bits in the first source operand.
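
To make the two behaviours concrete, here is a rough model of them, not real instruction semantics taken from the patch. It assumes a 256-bit register (MAXVL = 256) represented as eight 32-bit lanes, and the function names are made up for illustration. The zeroing of bits above 127 in the VEX form is the usual VEX-encoding behaviour and is included for completeness.

```python
LANES = 8  # assumption: model a 256-bit register as eight 32-bit lanes


def sse_scalar_write(dst, result):
    """Legacy SSE form: lane 0 gets the result, bits MAXVL-1:32 of dst are kept."""
    out = list(dst)
    out[0] = result
    return out


def avx_scalar_write(src1, result):
    """VEX form: lane 0 gets the result, bits 127:32 come from src1,
    and bits above 127 are zeroed."""
    return [result] + list(src1[1:4]) + [0] * (LANES - 4)


old_dst = [10, 11, 12, 13, 14, 15, 16, 17]  # previous contents of the destination
src1 = [20, 21, 22, 23, 24, 25, 26, 27]     # first source operand (AVX form only)

sse = sse_scalar_write(old_dst, 99)
avx = avx_scalar_write(src1, 99)
print(sse)
print(avx)
```

In this model the SSE result depends on the old value of the destination, while the AVX result depends on the first source operand; in both cases the upper lanes have to come from somewhere other than the computation itself.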

All computation in the backend initially happens in logical registers, and the backend copies a logical register to the architectural register only at retirement. From a microarchitectural standpoint, both cases therefore result in the emission of an extra micro-op: for merging in the first case and for copying in the second. This is why we inject vzeroupper to avoid merging penalties on transitions between AVX2/AVX-512 and SSE.

Fix looks ok to me otherwise.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16701#issuecomment-1815645570