RFR: 8283232: x86: Improve vector broadcast operations [v2]
Jatin Bhateja
jbhateja at openjdk.java.net
Wed Mar 16 13:20:45 UTC 2022
On Wed, 16 Mar 2022 05:55:18 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:
>> Hi,
>>
>> This patch improves the generation of broadcasting a scalar in several ways:
>>
>> - Avoid potential data bypass delay which can be observed on some platforms by using the correct type of instruction if it does not require extra instructions.
>> - As it has been pointed out, dumping the whole vector into the constant table is costly in terms of code size, this patch minimises this overhead for vector replicate of constants. Also, options are available for constants to be generated with more alignment so that vector load can be made efficiently without crossing cache lines.
>> - Vector broadcasting should prefer rematerialising to spilling when register pressure is high.
>>
>> This patch also removes some redundant code paths and rename some incorrectly named instructions.
>>
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
> fix crash in sse
It would be helpful if a JMH benchmark could be created around this. The following is an excerpt from the x86 optimization manual, Appendix E, Section E.1.3:
"Forwarding the result within the same bypass domain from a producer micro-op to a consumer micro-op is done efficiently in hardware without delay."
-------------
PR: https://git.openjdk.java.net/jdk/pull/7832
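
The benchmark requested above could be built around kernels like the following. This is only a minimal sketch in plain Java, not code from the PR: the class name `BroadcastKernels`, the method names, and the constant are hypothetical, and it assumes C2 auto-vectorizes these loops so that the loop-invariant scalar (or constant) is broadcast into a vector register. Whether that happens depends on the JVM version, flags, and CPU. A real measurement would wrap each kernel in a JMH `@Benchmark` method rather than use the hand-rolled loop in `main`.

```java
public class BroadcastKernels {
    // Hypothetical constant; a vectorized loop would need it replicated across lanes.
    static final long MASK = 0x0123456789ABCDEFL;

    // Integer kernel: `c` is loop-invariant, so it is a candidate for an
    // integer-domain broadcast. Assumes src.length >= dst.length.
    static void addScalar(int[] dst, int[] src, int c) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = src[i] + c;
        }
    }

    // Floating-point kernel: same shape, but the broadcast value is consumed
    // by a floating-point multiply. Assumes src.length >= dst.length.
    static void mulScalar(float[] dst, float[] src, float c) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = src[i] * c;
        }
    }

    // Constant kernel: the replicated value is a compile-time constant.
    // Assumes src.length >= dst.length.
    static void xorConstant(long[] dst, long[] src) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = src[i] ^ MASK;
        }
    }

    public static void main(String[] args) {
        int n = 1024;
        int[] si = new int[n], di = new int[n];
        float[] sf = new float[n], df = new float[n];
        long[] sl = new long[n], dl = new long[n];
        for (int i = 0; i < n; i++) {
            si[i] = i;
            sf[i] = i;
            sl[i] = i;
        }
        // Repeat enough times for the JIT to compile the kernels.
        for (int iter = 0; iter < 20_000; iter++) {
            addScalar(di, si, iter);
            mulScalar(df, sf, 1.5f);
            xorConstant(dl, sl);
        }
        // Print one element of each result so the work is not dead code.
        System.out.println(di[n - 1] + " " + df[n - 1] + " " + dl[n - 1]);
    }
}
```

Having both an integer and a floating-point variant is what would let a benchmark show whether the broadcast instruction chosen matches the domain of its consumer, which is the bypass-delay concern raised in the manual excerpt.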