RFR: 8286823: Default to UseAVX=2 on all Skylake/Cascade Lake CPUs

olivergillespie duke at openjdk.java.net
Thu May 19 18:20:00 UTC 2022


On Mon, 16 May 2022 15:52:22 GMT, olivergillespie <duke at openjdk.java.net> wrote:

> The current code already does this for 'older' Skylake processors,
> namely those with _stepping < 5. My testing indicates this is a
> problem for later processors in this family too, so I have removed the
> max stepping condition.
> 
> The original exclusion was added in https://bugs.openjdk.java.net/browse/JDK-8221092.
> 
> A general description of the overall issue is given at
> https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Downclocking.
> 
> According to https://en.wikichip.org/wiki/intel/microarchitectures/cascade_lake#CPUID,
> stepping values 5..7 indicate Cascade Lake. I have tested on a CPU with stepping=7,
> and I see CPU frequency reduction from 3.1GHz down to 2.7GHz (~13%) when using
> -XX:UseAVX=3, along with a corresponding performance reduction.
> 
> I first saw this issue in a real production workload, where the main AVX3 instructions
> being executed were those generated for various flavours of disjoint_arraycopy.
> 
> I can reproduce a similar effect using SPECjvm2008's xml.transform benchmark.
> 
> 
> java --add-opens=java.xml/com.sun.org.apache.xerces.internal.parsers=ALL-UNNAMED \
> --add-opens=java.xml/com.sun.org.apache.xerces.internal.util=ALL-UNNAMED \
> -jar SPECjvm2008.jar -ikv -ict xml.transform
> 
> 
> Before the change, or with -XX:UseAVX=3:
> 
> 
> Valid run!
> Score on xml.transform: 776.00 ops/m
> 
> 
> After the change, or with -XX:UseAVX=2:
> 
> 
> Valid run!
> Score on xml.transform: 894.07 ops/m
> 
> 
> So, a 15% improvement in this benchmark. It's possible some benchmarks will be negatively
> affected by this change, but I contend that this is still the right move given the stark
> difference in this benchmark combined with the fact that use of AVX3 instructions can
> affect *all* processes/code on the host due to the downclocking, and the fact that this
> effect is very hard to root-cause, for example CPU profiles look very similar before and
> after since all code is equally slowed.

Thanks for the comments.

> From what I understand, only the core which is executing 512 bit vector instructions will observe this lower frequency and not the entire processor.

Yes, but in a typical multithreaded Java service (say a web app), many threads end up using these instructions via common operations like StringBuilder, and the threads are not pinned to cores. In practice this can mean that all cores occasionally hit AVX3 instructions. The slowdown on these processors does not last only while the instruction is executing; it persists well beyond that, so even a few short uses of AVX3 can end up perma-throttling all CPUs.

> From https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/
> Downclocking, when it happens, is per core and for a short time after you have used particular instructions (e.g., ~2ms).

2 milliseconds is a huge amount of time to run at the lower frequency. Each core only needs to hit these instructions once every 2 milliseconds to have permanent throttling, and that's what I see on my applications.
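
As a rough illustration of that arithmetic, here is a minimal sketch in Java. It is a simplified model, not a measurement: it assumes AVX3 instructions arrive at a regular interval on one core and that each use keeps the core at the lower frequency for a fixed hold time (the ~2ms figure quoted above). The class and method names are made up for this example.

```java
// Simplified model, not a measurement: each AVX-512 use is assumed to keep the
// core at the reduced frequency for a fixed hold time afterwards.
final class ThrottleModel {

    /**
     * Fraction of wall time (0.0 to 1.0) a core spends throttled when it executes
     * an AVX-512 instruction once every intervalMs and stays throttled for holdMs
     * after each use.
     */
    static double throttledFraction(double intervalMs, double holdMs) {
        if (intervalMs <= 0 || holdMs < 0) {
            throw new IllegalArgumentException("intervalMs must be > 0 and holdMs >= 0");
        }
        // If the next use arrives before the hold time expires, the core never recovers.
        return Math.min(1.0, holdMs / intervalMs);
    }

    public static void main(String[] args) {
        double holdMs = 2.0; // assumption: the ~2ms figure from the quoted blog post
        for (double intervalMs : new double[] {0.5, 2.0, 20.0, 200.0}) {
            System.out.printf("AVX-512 use every %.1f ms -> throttled %.0f%% of the time%n",
                    intervalMs, 100.0 * throttledFraction(intervalMs, holdMs));
        }
    }
}
```

Under this model, any core that reaches an AVX3 instruction at least once per hold period is throttled all of the time, no matter how little of its work is actually vectorized.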

More info: https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/

I'm not an expert on the mechanics of this; I only know that I observe this same behaviour in every real-world JDK17 application I've looked at that runs on Cascade Lake (4 so far), namely a 15% global slowdown with very little AVX3 usage. The AVX3 speedup definitely doesn't make up for it, because when I disable AVX3 my application's performance improves significantly (reduced latency, increased throughput).

Do you know what measurements were used to justify the original exception for model 85 stepping < 5? I could re-run those tests to compare against my hardware. Is there any other benchmark or measurement you'd like to see that might justify this change?

Maybe we can look at the change this way: are model 85 steppings 5, 6 and 7 affected by the same issue as the earlier steppings, which are already excluded from AVX3 by default? It was evidently decided that the issue was severe enough for those CPUs to have AVX3 disabled by default; I merely find that the 3 later steppings also suffer severely from this issue and should receive the same treatment.
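
To make the proposed behaviour concrete, here is a small sketch of the default selection before and after the change. It is written in Java purely for illustration; the real check is in HotSpot's C++ CPU feature detection, and the class, method and constant names below are hypothetical. It assumes a CPU that supports AVX-512 and no explicit -XX:UseAVX setting on the command line.

```java
// Illustration only: the real check lives in HotSpot's C++ code.
// All names here are hypothetical.
final class AvxDefaultSketch {

    // "model 85" as referred to in this thread (Skylake / Cascade Lake server parts).
    static final int SKYLAKE_SERVER_MODEL = 85;

    /** Default before the change: only model 85 steppings below 5 are capped at AVX2. */
    static int defaultUseAvxBefore(int model, int stepping) {
        return (model == SKYLAKE_SERVER_MODEL && stepping < 5) ? 2 : 3;
    }

    /** Proposed default: every model 85 stepping is capped at AVX2. */
    static int defaultUseAvxAfter(int model, int stepping) {
        return (model == SKYLAKE_SERVER_MODEL) ? 2 : 3;
    }

    public static void main(String[] args) {
        for (int stepping = 4; stepping <= 7; stepping++) {
            System.out.printf("model 85 stepping %d: UseAVX default %d -> %d%n",
                    stepping,
                    defaultUseAvxBefore(SKYLAKE_SERVER_MODEL, stepping),
                    defaultUseAvxAfter(SKYLAKE_SERVER_MODEL, stepping));
        }
    }
}
```

Only the default changes in this sketch; passing -XX:UseAVX=3 explicitly would still select AVX3, as in the "before" benchmark run above.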

-------------

PR: https://git.openjdk.java.net/jdk/pull/8731