RFR: 8353558: x86: Use better instructions for ICache sync when available [v4]

Aleksey Shipilev shade at openjdk.org
Thu Apr 24 07:01:09 UTC 2025


On Wed, 23 Apr 2025 10:01:41 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> For Leyden, which wants to load a lot of code as fast as possible, code cache flush costs are now a significant part of the picture. Better ICache syncs offer single-digit percent improvements in startup time.
>> 
>> It is not sufficiently clear why icache flushes are needed on x86 at all. The Intel/AMD manuals say the instruction caches are fully coherent, and the GCC intrinsic `__builtin___clear_cache` is empty on x86. It looks like a single serializing instruction such as `cpuid` might be enough for the entire flush to take effect; this is what our `OrderAccess::cross_modify_fence` does. Still, we can keep the old behavior and flush the caches more efficiently, because CLFLUSHOPT and CLWB are available on modern x86.
>> 
>> See more discussion and references in the RFE. The performance data is in the comments in this PR.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `all`
>>  - [x] Linux x86_64 server fastdebug, `all` + `X86ICacheSync={0,1,2,3,4}`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
> 
>  - Simplify platform changes
>  - Move x86-specific stuff to icache_x86
>  - Comment touchups
>  - Merge branch 'master' into JDK-8353558-x86-better-icache-flush
>  - Add SERIALIZE as well
>  - Fix

Thank you! There we go.
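Below is a minimal C++ sketch of the flush-then-serialize scheme described in the quoted PR text, written with GCC/Clang x86 intrinsics. It is not the HotSpot implementation. The names `FlushKind`, `FlushConfig`, `detect_flush_config`, `flush_icache_range` and `serializing_fence` are hypothetical. The CLWB > CLFLUSHOPT > CLFLUSH preference order is an assumption, and nothing here maps to particular `X86ICacheSync` values. The sketch does four things:

- It detects features with CPUID.
- It flushes every cache line that overlaps the range.
- It orders the weakly-ordered CLFLUSHOPT/CLWB flushes with SFENCE.
- It finishes with one serializing instruction, using SERIALIZE when the CPU reports it and `cpuid` otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__)
#include <cpuid.h>
#include <immintrin.h>
#endif

// Hypothetical strategy names for illustration; NOT HotSpot's X86ICacheSync values.
enum class FlushKind { None, Clflush, Clflushopt, Clwb };

struct FlushConfig {
  FlushKind kind = FlushKind::None;
  std::size_t line_size = 64;   // bytes covered by one flush instruction
  bool has_serialize = false;   // SERIALIZE instruction reported by CPUID
};

FlushConfig detect_flush_config() {
  FlushConfig cfg;
#if defined(__x86_64__)
  unsigned a, b, c, d;
  if (__get_cpuid(1, &a, &b, &c, &d)) {
    std::size_t line = ((b >> 8) & 0xff) * 8;  // CPUID.01H:EBX[15:8], in 8-byte units
    if (line != 0) cfg.line_size = line;
    if (d & (1u << 19)) cfg.kind = FlushKind::Clflush;  // CPUID.01H:EDX[19] = CLFLUSH
  }
  if (__get_cpuid_count(7, 0, &a, &b, &c, &d)) {
    // Assumed preference order: CLWB > CLFLUSHOPT > CLFLUSH.
    if (b & (1u << 23)) cfg.kind = FlushKind::Clflushopt;  // CPUID.07H.0:EBX[23]
    if (b & (1u << 24)) cfg.kind = FlushKind::Clwb;        // CPUID.07H.0:EBX[24]
    cfg.has_serialize = (d & (1u << 14)) != 0;             // CPUID.07H.0:EDX[14]
  }
#endif
  return cfg;
}

#if defined(__x86_64__)
__attribute__((target("clflushopt"))) static void flush_line_opt(void* p) { _mm_clflushopt(p); }
__attribute__((target("clwb"))) static void flush_line_wb(void* p) { _mm_clwb(p); }

// One serializing instruction: SERIALIZE (opcode 0F 01 E8) if present, else CPUID.
static void serializing_fence(bool has_serialize) {
  if (has_serialize) {
    __asm__ __volatile__(".byte 0x0f, 0x01, 0xe8" ::: "memory");
  } else {
    unsigned a = 0, b, c = 0, d;
    __asm__ __volatile__("cpuid" : "+a"(a), "=b"(b), "+c"(c), "=d"(d) : : "memory");
  }
}
#endif

// Flushes every cache line overlapping [start, start + len), then serializes.
// Returns the number of lines visited (always 0 on non-x86-64 builds).
std::size_t flush_icache_range(const FlushConfig& cfg, void* start, std::size_t len) {
#if defined(__x86_64__)
  if (len == 0) return 0;
  assert((cfg.line_size & (cfg.line_size - 1)) == 0);  // masking needs a power of two
  const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(start);
  const std::uintptr_t end = begin + len;
  const std::uintptr_t mask = ~static_cast<std::uintptr_t>(cfg.line_size - 1);
  std::size_t lines = 0;
  for (std::uintptr_t p = begin & mask; p < end; p += cfg.line_size, ++lines) {
    void* line = reinterpret_cast<void*>(p);
    switch (cfg.kind) {
      case FlushKind::Clwb:       flush_line_wb(line); break;
      case FlushKind::Clflushopt: flush_line_opt(line); break;
      case FlushKind::Clflush:    _mm_clflush(line); break;
      case FlushKind::None:       break;
    }
  }
  // CLFLUSHOPT and CLWB are weakly ordered; SFENCE orders them before the serializing step.
  if (cfg.kind == FlushKind::Clwb || cfg.kind == FlushKind::Clflushopt) _mm_sfence();
  serializing_fence(cfg.has_serialize);
  return lines;
#else
  (void)cfg; (void)start; (void)len;
  return 0;
#endif
}
```

A caller would run `detect_flush_config()` once at startup. It would then call `flush_icache_range(cfg, code, size)` after writing or loading code. If the caches really are coherent, as the PR text suggests, only the final serializing instruction would matter. The per-line loop only keeps the old flushing behavior.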

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24389#issuecomment-2826574386


More information about the hotspot-dev mailing list