RFR: 8286823: Default to UseAVX=2 on all Skylake/Cascade Lake CPUs

Volker Simonis simonis at openjdk.java.net
Fri May 20 11:55:52 UTC 2022


On Fri, 20 May 2022 05:30:54 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> The current code already does this for 'older' Skylake processors,
>> namely those with _stepping < 5. My testing indicates this is a
>> problem for later processors in this family too, so I have removed the
>> max stepping condition.
>> 
>> The original exclusion was added in https://bugs.openjdk.java.net/browse/JDK-8221092.
>> 
>> A general description of the overall issue is given at
>> https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Downclocking.
>> 
>> According to https://en.wikichip.org/wiki/intel/microarchitectures/cascade_lake#CPUID,
>> stepping values 5..7 indicate Cascade Lake. I have tested on a CPU with stepping=7,
>> and I see CPU frequency reduction from 3.1GHz down to 2.7GHz (~13%) when using
>> -XX:UseAVX=3, along with a corresponding performance reduction.
>> 
>> I first saw this issue in a real production workload, where the main AVX3 instructions
>> being executed were those generated for various flavours of disjoint_arraycopy.
>> 
>> I can reproduce a similar effect using SPECjvm2008's xml.transform benchmark.
>> 
>> 
>> java --add-opens=java.xml/com.sun.org.apache.xerces.internal.parsers=ALL-UNNAMED \
>> --add-opens=java.xml/com.sun.org.apache.xerces.internal.util=ALL-UNNAMED \
>> -jar SPECjvm2008.jar -ikv -ict xml.transform
>> 
>> 
>> Before the change, or with -XX:UseAVX=3:
>> 
>> 
>> Valid run!
>> Score on xml.transform: 776.00 ops/m
>> 
>> 
>> After the change, or with -XX:UseAVX=2:
>> 
>> 
>> Valid run!
>> Score on xml.transform: 894.07 ops/m
>> 
>> 
>> So, a 15% improvement in this benchmark. It's possible some benchmarks will be negatively
>> affected by this change, but I contend that this is still the right move given the stark
>> difference in this benchmark combined with the fact that use of AVX3 instructions can
>> affect *all* processes/code on the host due to the downclocking, and the fact that this
>> effect is very hard to root-cause, for example CPU profiles look very similar before and
>> after since all code is equally slowed.
>
> I think I have spent enough time on this already. Performance tracking is a "rabbit hole" :(
> 
> As I said before, results are mixed compared to running on Skylake, where results were all positive.
> Even running on more recent Intel CPUs I see some mixed results.
> 
> Yes, AVX512 is not for all applications. I agree with your observations. Even so, I can't support this change.
> For such cases we have an official solution: the ability to choose the instruction set with the `UseAVX` flag.

@vnkozlov I think I agree with @olivergillespie. Looking at the graphs you've posted, AVX2 seems superior to AVX3 even when only 1/4 of the threads are used, and clearly so when the machine is fully loaded. As @olivergillespie said, it would be hard to justify enabling AVX3 on these CPUs today, given these results.

I would argue we should disable it by default and, as you said, let the few use cases which benefit from it, like AES on non-loaded machines, enable it manually with `-XX:UseAVX=3`.
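
To make the proposal concrete, here is a minimal sketch of the default selection being discussed. It is a simplified model, not the HotSpot code: the function name `default_use_avx` and all of its parameters are hypothetical, and clamping an explicit value to what the CPU supports is an assumption of this sketch. The `cap_all_steppings` switch only exists to contrast the current behaviour (cap for `_stepping < 5`) with the proposed one (cap for every stepping).

```python
from typing import Optional


def default_use_avx(cpu_max_avx: int,
                    is_skylake_family: bool,
                    stepping: int,
                    user_value: Optional[int] = None,
                    cap_all_steppings: bool = True) -> int:
    """Return the effective UseAVX level in this simplified model.

    All names here are illustrative; this is not HotSpot's implementation.
    """
    if user_value is not None:
        # An explicit -XX:UseAVX=N wins over any default.
        # Assumption: the value is clamped to what the CPU supports.
        return min(user_value, cpu_max_avx)

    if cpu_max_avx > 2 and is_skylake_family:
        # Current behaviour: only steppings below 5 are capped at AVX2.
        # Proposed behaviour: every stepping in the family is capped.
        if cap_all_steppings or stepping < 5:
            return 2

    # Everything else keeps the highest level the CPU offers.
    return cpu_max_avx


if __name__ == "__main__":
    # Cascade Lake (stepping 7) before and after the proposed change,
    # and with an explicit opt-in to AVX3.
    print(default_use_avx(3, True, 7, cap_all_steppings=False))
    print(default_use_avx(3, True, 7))
    print(default_use_avx(3, True, 7, user_value=3))
```

In this model an explicit `-XX:UseAVX=3` still selects AVX3, so workloads that benefit from it would only need to opt in.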

-------------

PR: https://git.openjdk.java.net/jdk/pull/8731

