RFR: 8353558: x86: Use CLFLUSHOPT/CLWB/CPUID for ICache sync

Aleksey Shipilev shade at openjdk.org
Tue Apr 15 10:51:01 UTC 2025


For Leyden, which wants to load a lot of code as fast as it can, code cache flush costs are now a significant part of the picture. There are single-digit percent startup time opportunities in better ICache syncs.

It is not entirely clear why icache flushes are needed on x86 at all. Intel/AMD manuals say the instruction caches are fully coherent. The GCC `__builtin___clear_cache` intrinsic is empty on x86. It looks like a single serializing instruction such as `cpuid` might be enough for the entire flush; this is what our `OrderAccess::cross_modify_fence` does. Still, we can maintain the old behavior by flushing the caches smarter: CLFLUSHOPT and CLWB are available on modern x86.
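
For readers who want a picture of the "flush smarter" option, here is a minimal sketch of walking a code range one cache line at a time. It is not the code in this PR: `for_each_cache_line` is a hypothetical helper, the 64-byte line size in the comments is an assumption, and the per-line operation is passed in as a callback so the sketch stays portable. On x86 that callback would presumably wrap something like `_mm_clflushopt` from `<immintrin.h>`, with a fence after the loop.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not HotSpot code: visits every cache line that
// overlaps [start, start + bytes) and returns how many lines were visited.
// line_size is assumed to be a power of two (64 on common x86 parts).
template <typename FlushOp>
std::size_t for_each_cache_line(const void* start, std::size_t bytes,
                                std::size_t line_size, FlushOp flush_line) {
  if (bytes == 0) return 0;
  // Round the start address down to its cache line boundary.
  const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(line_size) - 1);
  std::uintptr_t p = reinterpret_cast<std::uintptr_t>(start) & mask;
  const std::uintptr_t end = reinterpret_cast<std::uintptr_t>(start) + bytes;
  std::size_t lines = 0;
  for (; p < end; p += line_size) {
    // On x86 this is where a per-line flush (assumed: CLFLUSHOPT or CLWB)
    // would be issued; the sketch only invokes the supplied callback.
    flush_line(reinterpret_cast<const void*>(p));
    ++lines;
  }
  return lines;
}
```

The single-`cpuid` alternative would skip this loop entirely and issue one serializing instruction for the whole range, which is where the potential savings come from.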

See more discussion and references in the RFE. The performance data is in the comments in this PR.

Additional testing:
 - [x] Linux x86_64 server fastdebug, `all`
 - [x] Linux x86_64 server fastdebug, `all` + `X86ICacheSync={0,1,2,3,4}`

-------------

Commit messages:
 - Fix cpuid
 - Also single-CPUID mode
 - Yank AMD fix
 - Not even fences for =0
 - Fix

Changes: https://git.openjdk.org/jdk/pull/24389/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24389&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8353558
  Stats: 208 lines in 18 files changed: 173 ins; 7 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/24389.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24389/head:pull/24389

PR: https://git.openjdk.org/jdk/pull/24389

