RFR: 8286823: Default to UseAVX=2 on all Skylake/Cascade Lake CPUs
olivergillespie
duke at openjdk.java.net
Tue May 17 17:36:48 UTC 2022
On Tue, 17 May 2022 17:18:01 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:
>> The current code already does this for 'older' Skylake processors,
>> namely those with _stepping < 5. My testing indicates this is a
>> problem for later processors in this family too, so I have removed the
>> max stepping condition.
>>
>> The original exclusion was added in https://bugs.openjdk.java.net/browse/JDK-8221092.
>>
>> A general description of the overall issue is given at
>> https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Downclocking.
>>
>> According to https://en.wikichip.org/wiki/intel/microarchitectures/cascade_lake#CPUID,
>> stepping values 5..7 indicate Cascade Lake. I have tested on a CPU with stepping=7,
>> and I see CPU frequency reduction from 3.1GHz down to 2.4GHz (~23%) when using
>> -XX:UseAVX=3, along with a corresponding performance reduction.
>>
>> I first saw this issue in a real production workload, where the main AVX3 instructions
>> being executed were those generated for various flavours of disjoint_arraycopy.
>>
>> I can reproduce a similar effect using SPECjvm2008's xml.transform benchmark.
>>
>>
>> java --add-opens=java.xml/com.sun.org.apache.xerces.internal.parsers=ALL-UNNAMED \
>> --add-opens=java.xml/com.sun.org.apache.xerces.internal.util=ALL-UNNAMED \
>> -jar SPECjvm2008.jar -ikv -ict xml.transform
>>
>>
>> Before the change, or with -XX:UseAVX=3:
>>
>>
>> Valid run!
>> Score on xml.transform: 776.00 ops/m
>>
>>
>> After the change, or with -XX:UseAVX=2:
>>
>>
>> Valid run!
>> Score on xml.transform: 894.07 ops/m
>>
>>
>> So, a 15% improvement in this benchmark. It's possible some benchmarks will be negatively
>> affected by this change, but I contend that this is still the right move, for three reasons:
>> the difference in this benchmark is stark; use of AVX3 instructions can affect *all*
>> processes/code on the host due to the downclocking; and the effect is very hard to
>> root-cause. For example, CPU profiles look very similar before and after, since all code
>> is equally slowed.
>
> src/hotspot/cpu/x86/vm_version_x86.cpp line 900:
>
>> 898: // Don't use AVX-512 on Skylake (or the related Cascade Lake) CPUs unless explicitly
>> 899: // requested - these instructions can cause performance issues on these processors.
>> 900: if (use_avx_limit > 2 && is_intel_skylake()) {
>
> Maybe `is_intel_skylake` should be renamed to `is_cpu_model_intel_skylake`? That would make it clear that all CPUs based on the Skylake model are excluded.
I agree it's not necessarily a perfect name, but I haven't changed its behaviour, so I figured I could avoid making my change any bigger than necessary. The intention of this particular usage is clarified in my comment. It's used in [another place](https://github.com/openjdk/jdk/blob/d77e5680af382f8215b5b1f9cd4754056bccc9e5/src/hotspot/cpu/x86/vm_version_x86.hpp#L1083) too, where it's evidently understood to include Cascade Lake, since the comment there mentions Ice Lake+ (the Cascade Lake successor).
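
To make the intended behaviour concrete, here is a minimal standalone sketch of the decision the check is meant to encode. It is not the HotSpot code: `CpuInfo`, `choose_use_avx` and their parameters are hypothetical names invented for illustration, and it assumes C++17 for `std::optional`. The point it shows is that the stepping no longer influences the default, while an explicit `-XX:UseAVX=3` is still honoured.

```cpp
#include <optional>

// Hypothetical stand-in for the CPU facts the real check consults.
struct CpuInfo {
    bool intel_skylake_family;  // true for Skylake and the related Cascade Lake parts
    int stepping;               // CPUID stepping; deliberately unused below
};

// Returns the AVX level to use (illustrative only).
//   highest_supported: highest AVX level the hardware supports (0..3)
//   requested:         value of -XX:UseAVX=N if the user passed one
int choose_use_avx(const CpuInfo& cpu, int highest_supported,
                   std::optional<int> requested) {
    if (requested.has_value()) {
        // An explicit request wins, capped by what the hardware supports.
        return *requested < highest_supported ? *requested : highest_supported;
    }
    // Default: avoid AVX-512 on the whole Skylake family, regardless of
    // stepping (previously only stepping < 5 was capped).
    if (highest_supported > 2 && cpu.intel_skylake_family) {
        return 2;
    }
    return highest_supported;
}
```

With this sketch, `choose_use_avx(CpuInfo{true, 7}, 3, std::nullopt)` yields 2 for a stepping=7 part, whereas passing `3` as the explicit request still yields 3.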
-------------
PR: https://git.openjdk.java.net/jdk/pull/8731