RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5]

Quan Anh Mai qamai at openjdk.org
Fri Apr 5 20:14:10 UTC 2024


On Fri, 5 Apr 2024 18:17:00 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> My new testing passed.
>> But I want to hear an answer to @merykitty's suggestion about using xmm15.
>
> @vnkozlov If I understand the proposal from @merykitty correctly, the suggestion is to reserve xmm15 as non-allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable, say in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts.

@sviswa7

Yes, with my proposal we lose 1 out of 16 registers, which is a cost. But emitting an additional instruction for every conversion from an integer to a floating-point value is also a cost. A more conservative solution is to use the last register in the allocation chunk, which will often be unused. When it is used, the function is likely to be crowded with other instructions, so this particular dependency should not have a profound effect.

> From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts.

You cannot reach that conclusion from this benchmark alone. We are making a trade-off here, and this benchmark was chosen because it is bottlenecked by that particular dependency. The situation may not be the same in other cases.
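
To make the trade-off concrete, here is a rough sketch of the two kinds of code I have in mind. The class and method names are hypothetical and are not taken from the PR or from Vamsi's benchmark, and whether the dependency actually shows up depends on what the JIT emits and which registers it picks. The first loop does little besides int-to-double conversion, so the cost of the convert is directly visible. The second loop surrounds the conversion with other floating-point work, so the convert is a small fraction of the total.

```java
// Hypothetical kernels for illustration only; not part of the PR.
final class ConvertKernels {

    // Conversion-heavy: nearly every operation is an int -> double convert,
    // so extra latency on the convert is likely to dominate.
    static double convertBound(int[] src) {
        double sum = 0.0;
        for (int v : src) {
            sum += (double) v;
        }
        return sum;
    }

    // Crowded: the convert sits among multiplies, adds and a square root,
    // so it is only a small part of the work per iteration.
    static double crowded(int[] src, double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < src.length; i++) {
            double x = a[i] * b[i] + a[i];
            double y = Math.sqrt(x * x + b[i]);
            sum += y + (double) src[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1 << 16;
        int[] src = new int[n];
        double[] a = new double[n];
        double[] b = new double[n];
        for (int i = 0; i < n; i++) {
            src[i] = i;
            a[i] = i * 0.5;
            b[i] = i + 1.0;
        }
        System.out.println(convertBound(src));
        System.out.println(crowded(src, a, b));
    }
}
```

A benchmark of the first shape says little about the second, which is why I would like to see numbers for both before deciding.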

Cheers,
Quan Anh

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040556663


More information about the hotspot-compiler-dev mailing list