RFR: 8286823: Default to UseAVX=2 on all Skylake/Cascade Lake CPUs
olivergillespie
duke at openjdk.java.net
Fri May 20 09:55:57 UTC 2022
On Mon, 16 May 2022 15:52:22 GMT, olivergillespie <duke at openjdk.java.net> wrote:
> The current code already does this for 'older' Skylake processors,
> namely those with _stepping < 5. My testing indicates this is a
> problem for later processors in this family too, so I have removed the
> max stepping condition.
>
> The original exclusion was added in https://bugs.openjdk.java.net/browse/JDK-8221092.
>
> A general description of the overall issue is given at
> https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Downclocking.
>
> According to https://en.wikichip.org/wiki/intel/microarchitectures/cascade_lake#CPUID,
> stepping values 5..7 indicate Cascade Lake. I have tested on a CPU with stepping=7,
> and I see CPU frequency reduction from 3.1GHz down to 2.4GHz (~23%) when using
> -XX:UseAVX=3, along with a corresponding performance reduction.
>
> I first saw this issue in a real production workload, where the main AVX3 instructions
> being executed were those generated for various flavours of disjoint_arraycopy.
>
> I can reproduce a similar effect using SPECjvm2008's xml.transform benchmark.
>
>
> java --add-opens=java.xml/com.sun.org.apache.xerces.internal.parsers=ALL-UNNAMED \
> --add-opens=java.xml/com.sun.org.apache.xerces.internal.util=ALL-UNNAMED \
> -jar SPECjvm2008.jar -ikv -ict xml.transform
>
>
> Before the change, or with -XX:UseAVX=3:
>
>
> Valid run!
> Score on xml.transform: 776.00 ops/m
>
>
> After the change, or with -XX:UseAVX=2:
>
>
> Valid run!
> Score on xml.transform: 894.07 ops/m
>
>
> So, a 15% improvement in this benchmark. It's possible some benchmarks will be negatively
> affected by this change, but I contend that this is still the right move given the stark
> difference in this benchmark combined with the fact that use of AVX3 instructions can
> affect *all* processes/code on the host due to the downclocking, and the fact that this
> effect is very hard to root-cause, for example CPU profiles look very similar before and
> after since all code is equally slowed.
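For anyone checking the numbers quoted above, the 15% figure follows directly from the two xml.transform scores. A minimal sketch in Python (not part of the PR; the helper name percent_change is just for illustration):

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Scores quoted in the PR description (ops/m).
score_avx3 = 776.00   # before the change, or with -XX:UseAVX=3
score_avx2 = 894.07   # after the change, or with -XX:UseAVX=2

improvement = percent_change(score_avx3, score_avx2)
print(f"xml.transform improvement: {improvement:.1f}%")  # 15.2%
```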
Thanks for running the tests and sharing the results!
> Based on your comment, you are using JDK 17. Which particular version do you have?
My tests were run on the latest JDK 19 tip build, but the real applications where I observed the problems use 17.0.3.6.
I agree that the results are mixed (though the biggest changes are in favour of AVX2). Given that we know for sure that AVX512 affects other processes and other code, mixed results are definitely not a good enough reason to *enable* AVX512 on these CPUs. I don't believe we would ever decide to *enable* AVX512 based on these results, hence my suggested change: don't enable AVX512 by default on these CPUs. There is no benefit on average to having AVX512 enabled on these models, yet it comes with huge downsides, so I think the risk profile leans heavily in favour of not enabling it by default for these models.
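To make the proposed default concrete, here is a rough sketch of the selection logic before and after the change. It is written in Python purely for illustration and is not the HotSpot code: the function name, parameters, and the all_steppings switch are hypothetical, and an explicit -XX:UseAVX=3 on the command line is not modelled.

```python
# Illustrative only: hypothetical names, not the HotSpot implementation.
OLD_SKYLAKE_STEPPING_LIMIT = 5  # steppings below 5 are the 'older' Skylake parts


def default_use_avx(is_skylake_family: bool, stepping: int,
                    supports_avx512: bool, all_steppings: bool) -> int:
    """Return the default UseAVX level this sketch would pick."""
    if not supports_avx512:
        return 2  # assumption for this sketch: AVX2 is the best level available
    if is_skylake_family and (all_steppings or stepping < OLD_SKYLAKE_STEPPING_LIMIT):
        return 2  # avoid AVX-512 by default because of downclocking
    return 3


# Existing behaviour: stepping 7 (Cascade Lake) still defaults to AVX-512.
print(default_use_avx(True, 7, True, all_steppings=False))  # 3
# Proposed behaviour: the max stepping condition is removed.
print(default_use_avx(True, 7, True, all_steppings=True))   # 2
```

The only difference between the two calls is whether the stepping limit applies, which mirrors the removal of the max stepping condition described in the PR.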
-------------
PR: https://git.openjdk.java.net/jdk/pull/8731