RFR: 8283232: x86: Improve vector broadcast operations [v8]
Quan Anh Mai
duke at openjdk.org
Tue Jul 26 12:32:16 UTC 2022
On Tue, 26 Jul 2022 08:04:55 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1662:
>>
>>> 1660: case 64: vmovups(dst, src, Assembler::AVX_512bit); break;
>>> 1661: default: ShouldNotReachHere();
>>> 1662: }
>>
>> Vector loads/stores from memory are issued on dedicated ports; can you elaborate on why this change is beneficial?
>
>> Vector loads/stores from memory are issued on dedicated ports; can you elaborate on why this change is beneficial?
>
> Section 3.5.5.2, referenced above, also states that FP loads add another cycle of latency, but the saving of the bypass penalty between the FP and SIMD domains still holds. So maybe for loads there is no pressing need, and the existing vector load handling can be kept as it is.
>
> Overall savings from constant table size reductions are very impressive. Thanks.
Thanks for sharing. I have reverted the change here.
-------------
PR: https://git.openjdk.org/jdk/pull/7832