RFR: 8287697: Limit auto vectorization to 32-byte vector on Cascade Lake [v4]

Mon Jun 6 20:55:13 UTC 2022

On Thu, 2 Jun 2022 17:49:04 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> We observe a ~20% regression in the SPECjvm2008 mpegaudio sub-benchmark on Cascade Lake with the default settings vs -XX:UseAVX=2.
>> The performance of all the other non-startup sub-benchmarks of SPECjvm2008 is within +/- 5%.
>> The performance regression is due to auto-vectorization of small loops.
>> Auto-vectorization does not take AVX3Threshold into account.
>> The performance regression in mpegaudio can be recovered by limiting auto-vectorization to 32-byte vectors.
>> 
>> This PR limits auto-vectorization to 32-byte vectors by default on Cascade Lake. Users can override this by either setting -XX:UseAVX=3 or -XX:SuperWordMaxVectorSize=64 on JVM command line.
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review comment resolution

Only the jbb2015 runs are still left in the performance-testing queue. They may take a while, and I don't expect much variation in them.

Testing also included `MaxVectorSize=32` for comparison with the current changes. It shows slightly (1-3%) better results in some `Crypto-AESBench_decrypt/encrypt` sub-benchmarks, but that could be due to the run-to-run variation we have observed in them. On the other hand, `SuperWordMaxVectorSize=32` shows better results in some Renaissance sub-benchmarks: it keeps their scores similar to the current code, whereas `MaxVectorSize=32` regresses them. Based on this, I agree with the current changes rather than setting `MaxVectorSize=32`.
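When comparing runs like these, it can help to confirm which vector-size limits the JVM actually ended up with on the test machine. The following is a minimal sketch, not part of the PR, assuming a HotSpot JVM: it reads `UseAVX`, `MaxVectorSize` and `SuperWordMaxVectorSize` through the standard `HotSpotDiagnosticMXBean.getVMOption` API and prints each value with its origin. The class name `VectorFlags` is hypothetical, and a flag that the running JDK does not define (for example `SuperWordMaxVectorSize` on a JDK without it, or `UseAVX` on a non-x86 build) is reported as unavailable rather than guessed.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of the PR under review.
public class VectorFlags {
    // Returns "value (origin)" for a HotSpot flag, or null if the running JVM
    // does not expose a flag with that name (e.g. a JDK without the flag,
    // a non-x86 build for UseAVX, or possibly a locked diagnostic flag).
    static String flagValue(HotSpotDiagnosticMXBean bean, String name) {
        try {
            VMOption opt = bean.getVMOption(name);
            return opt.getValue() + " (" + opt.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The three flags discussed in this thread.
        String[] names = {"UseAVX", "MaxVectorSize", "SuperWordMaxVectorSize"};
        for (String name : names) {
            String value = flagValue(bean, name);
            System.out.println(name + " = "
                    + (value != null ? value : "<not available in this JVM>"));
        }
    }
}
```

One way to use it is to run it once per configuration being compared, e.g. `java -XX:MaxVectorSize=32 VectorFlags.java`, and check the reported values and origins (`DEFAULT`, `ERGONOMIC`, `VM_CREATION`, ...) before attributing a score difference to the flag.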

Both changes give a 4-5% improvement in `SPECjvm2008-MPEG`.

However, I also observed a 2.7% regression in `SPECjvm2008-SOR.small` with ParallelGC, with both changes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8877