RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5]

Quan Anh Mai qamai at openjdk.org
Sat Apr 6 01:47:24 UTC 2024


On Fri, 5 Apr 2024 22:30:35 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> @vnkozlov If I understand the proposal from @merykitty correctly, the suggestion is to reserve xmm15 as non-allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable, say in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts.
>
>> @sviswa7
>> 
>> Yes, with my proposal we lose 1 of the 16 registers, which is a cost. But emitting an additional instruction for every integer-to-floating-point conversion is also a cost. A more conservative solution is to use the last register in the allocation chunk, which will often be unused; when it is used, the function is likely crowded with other instructions, so this particular dependency should not have a profound effect.
>> 
>> > From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts.
>> 
>> You cannot reach that conclusion; we are making a trade-off here, and this benchmark was chosen because it is bottlenecked by that particular dependency. The situation may not be the same in other cases.
>> 
>> Cheers, Quan Anh
> 
> @merykitty I would like to disagree. A decision to reserve a register for the entire duration of the program cannot be taken lightly.

@sviswa7 I didn't disagree with you; I just made a more conservative proposal that uses `xmm15` here without reserving it. What do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040850196

