RFR: 8287697: Limit auto vectorization to 32-byte vector on Cascade Lake

Jatin Bhateja jbhateja at openjdk.java.net
Wed Jun 1 23:36:03 UTC 2022


On Wed, 25 May 2022 01:48:16 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

> We observe ~20% regression in SPECjvm2008 mpegaudio sub benchmark on Cascade Lake with Default vs -XX:UseAVX=2.
> The performance of all the other non-startup sub benchmarks of SPECjvm2008 is within +/- 5%. 
> The performance regression is due to auto-vectorization of small loops. 
> We don’t have AVX3Threshold consideration in auto-vectorization. 
> The performance regression in mpegaudio can be recovered by limiting auto-vectorization to 32-byte vectors.
> 
> This PR limits auto-vectorization to 32-byte vectors by default on Cascade Lake. Users can override this by either setting -XX:UseAVX=3 or -XX:SuperWordMaxVectorSize=64 on JVM command line.
> 
> Please review.
> 
> Best Regards,
> Sandhya

Vectorization through SLP can already be controlled by constraining MaxVectorSize, and Vector API code can be limited by using a narrower SPECIES.
Could you share more details on why a separate SuperWordMaxVectorSize is needed here? Users already have all the necessary controls to limit C2 vector length, and it will rarely happen that someone wants to emit 512-bit vector code through the Vector API while limiting the auto-vectorizer to 256-bit vector operations, or vice versa. Maybe we should instead pessimistically constrain the vector size of only those loops that may result in heavy AVX-512 instructions, using a target-specific analysis pass.
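
For reference, here is a minimal sketch of the kind of small loop under discussion, which can be used to try the existing MaxVectorSize control. The class name, array sizes and iteration count are hypothetical choices for illustration and are not taken from the PR or from mpegaudio.

```java
// Hypothetical demo class; not part of the PR or of SPECjvm2008.
class SmallLoopDemo {
    // A short element-wise loop of the shape that C2's SuperWord (SLP)
    // pass can auto-vectorize once the method is hot.
    // Assumes b.length >= a.length.
    static int[] add(int[] a, int[] b) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = new int[64];
        int[] b = new int[64];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        long sum = 0;
        // Call add() often enough for C2 to compile it; the count is arbitrary.
        for (int iter = 0; iter < 200_000; iter++) {
            int[] r = add(a, b);
            sum += r[r.length - 1];
        }
        // Print a value derived from the results so the work is not dead code.
        System.out.println(sum);
    }
}
```

Running this once with default flags and once with `java -XX:MaxVectorSize=32 SmallLoopDemo.java` should, on an AVX-512 machine, let one compare 64-byte and 32-byte code generation for the same loop. As far as I understand, that flag caps C2 vector length in general rather than only the auto-vectorizer, which is the distinction being asked about here.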

-------------

PR: https://git.openjdk.java.net/jdk/pull/8877

