RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5]
Sandhya Viswanathan
sviswanathan at openjdk.org
Fri Apr 5 22:33:19 UTC 2024
On Fri, 5 Apr 2024 18:17:00 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> My new testing passed.
>> But I want to hear a response to @merykitty's suggestion about using xmm15.
>
> @vnkozlov If I understand the proposal from @merykitty correctly, the suggestion is to reserve xmm15 as non-allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable, say, in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts.
> @sviswa7
>
> Yes, with my proposal we lose 1 of the 16 registers, which is a cost. But emitting an additional instruction for every conversion from an integer to a floating-point value is also a cost. A more conservative solution is to use the last register in the allocation chunk, which will often be unused; when it is used, the function is likely crowded with other instructions, so this particular dependency will not have a profound effect.
>
> > From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts.
>
> You cannot reach that conclusion; we are making a trade-off here, and this benchmark was chosen because it is bottlenecked by that particular dependency. The situation may not be the same in other cases.
>
> Cheers, Quan Anh
@merykitty I would like to disagree; the decision to reserve a register for the entire duration of a program cannot be taken lightly.
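A minimal sketch of the dependency being debated, assuming the usual x86 semantics of scalar int-to-double conversion: `cvtsi2sd` (and the AVX form `vcvtsi2sd`) writes only the low 64 bits of the destination and carries the upper bits over from a source register, so the result depends on that register's previous contents. Zeroing the merge source first (the pxor/xorps idiom) removes that dependency. The C below uses the standard SSE2 intrinsics `_mm_cvtsi32_sd`, `_mm_setzero_pd` and `_mm_storeu_pd`; the helper names `cvt_merge` and `cvt_zeroed` are hypothetical, and this only illustrates the semantics at the C level. It is not what HotSpot's C2 backend emits, and a C compiler is free to choose its own instruction sequence.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */

/* Hypothetical helper: scalar int -> double conversion whose upper lane is
   taken from `merge_src`, mirroring cvtsi2sd, which writes only the low
   64 bits and keeps the rest of the old register contents. */
static void cvt_merge(__m128d merge_src, int x, double out[2]) {
    __m128d r = _mm_cvtsi32_sd(merge_src, x);
    _mm_storeu_pd(out, r); /* out[0] = (double)x, out[1] = upper lane of merge_src */
}

/* Hypothetical helper: same conversion, but merging into a freshly zeroed
   register (the pxor/xorps zeroing idiom), so the result no longer depends
   on whatever value was previously held in the register. */
static void cvt_zeroed(int x, double out[2]) {
    cvt_merge(_mm_setzero_pd(), x, out);
}
```

Both variants produce the same low lane; only the upper lane, and therefore which earlier value the conversion has to wait for, differs. The trade-off in this thread is between paying for that zeroing instruction at every conversion and permanently giving up a register to serve as the merge source.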
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040711784
More information about the hotspot-compiler-dev mailing list