RFR: 8283232: x86: Improve vector broadcast operations [v8]

Tue Jul 26 12:32:16 UTC 2022

On Tue, 26 Jul 2022 08:04:55 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1662:
>> 
>>> 1660:       case 64: vmovups(dst, src, Assembler::AVX_512bit); break;
>>> 1661:       default: ShouldNotReachHere();
>>> 1662:     }
>> 
>> Vector loads/stores from memory are issued on dedicated ports; can you elaborate on why this change would be beneficial?
>
>> Vector loads/stores from memory are issued on dedicated ports; can you elaborate on why this change would be beneficial?
> 
> The reference above to section 3.5.5.2 also states that FP loads add another cycle of latency, but the savings from avoiding the bypass penalty between the FP and SIMD domains still hold. So maybe for loads there is no pressing need, and the existing vector load handling can be kept as it is.
> 
> The overall savings from the constant table size reduction are very impressive. Thanks.

Thanks for sharing. I have reverted the change here.

-------------

PR: https://git.openjdk.org/jdk/pull/7832