RFR: 8353558: x86: Use CLFLUSHOPT/CLWB/CPUID for ICache sync [v2]
Quan Anh Mai
qamai at openjdk.org
Tue Apr 15 11:20:42 UTC 2025
On Tue, 15 Apr 2025 10:58:36 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> For Leyden, which wants to load a lot of code as fast as it can, code cache flush costs are now a significant part of the picture. Better ICache syncs could improve startup time by a single-digit percentage.
>>
>> It is not sufficiently clear why icache flushes are needed on x86. Intel/AMD manuals say the instruction caches are fully coherent, and the GCC intrinsic `__builtin___clear_cache` is empty on x86. It looks like a single serializing instruction such as `cpuid` might be enough for the entire flush to take effect; this is what our `OrderAccess::cross_modify_fence` does. Still, we can keep the old behavior while flushing the caches more efficiently: modern x86 provides CLFLUSHOPT and CLWB.
>>
>> See more discussion and references in the RFE. The performance data is in the comments in this PR.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, `all`
>> - [x] Linux x86_64 server fastdebug, `all` + `X86ICacheSync={0,1,2,3,4}`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>
> Fix
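The flush strategy described in the PR summary (flush or write back each cache line of the new code with CLFLUSHOPT or CLWB, then execute a serializing instruction such as `cpuid`) can be sketched in user-space C++. This is an illustration under stated assumptions, not the code in the PR: it assumes x86-64, GCC or Clang, and a 64-byte cache line, and all names (`flush_code_range`, `lines_in_range`, `FlushKind`) are hypothetical. Preferring CLWB over CLFLUSHOPT is a choice of this sketch. Because both instructions are weakly ordered, the sketch fences them with `sfence` before serializing.

```cpp
// Sketch only (hypothetical names, not HotSpot's): flush a range of freshly
// written code one cache line at a time, then serialize with CPUID.
// Assumes x86-64 with GCC or Clang; the 64-byte line size is assumed, not queried.
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__)
#include <cpuid.h>
#include <immintrin.h>
#endif

constexpr std::size_t kLineSize = 64;  // assumption: 64-byte cache lines
static_assert((kLineSize & (kLineSize - 1)) == 0, "line size must be a power of two");

// Number of cache lines touched by [start, start + size).
std::size_t lines_in_range(std::uintptr_t start, std::size_t size) {
  if (size == 0) return 0;
  const std::uintptr_t mask = ~static_cast<std::uintptr_t>(kLineSize - 1);
  const std::uintptr_t first = start & mask;
  const std::uintptr_t last = (start + size - 1) & mask;
  return (last - first) / kLineSize + 1;
}

enum class FlushKind { Clwb, Clflushopt, Clflush };

#if defined(__x86_64__)
// CPUID.(EAX=7,ECX=0):EBX bit 23 = CLFLUSHOPT, bit 24 = CLWB.
static FlushKind detect_flush_kind() {
  unsigned a, b, c, d;
  if (__get_cpuid_count(7, 0, &a, &b, &c, &d)) {
    if (b & (1u << 24)) return FlushKind::Clwb;
    if (b & (1u << 23)) return FlushKind::Clflushopt;
  }
  return FlushKind::Clflush;  // CLFLUSH (SSE2) is baseline on x86-64
}

__attribute__((target("clwb")))
static void flush_lines_clwb(std::uintptr_t p, std::uintptr_t end) {
  for (; p < end; p += kLineSize) _mm_clwb(reinterpret_cast<void*>(p));
}

__attribute__((target("clflushopt")))
static void flush_lines_clflushopt(std::uintptr_t p, std::uintptr_t end) {
  for (; p < end; p += kLineSize) _mm_clflushopt(reinterpret_cast<void*>(p));
}

static void flush_lines_clflush(std::uintptr_t p, std::uintptr_t end) {
  for (; p < end; p += kLineSize) _mm_clflush(reinterpret_cast<void*>(p));
}

// CPUID is architecturally serializing; leaf 0 is always valid.
static void cpuid_serialize() {
  unsigned a = 0, b, c = 0, d;
  __asm__ __volatile__("cpuid" : "+a"(a), "=b"(b), "+c"(c), "=d"(d) : : "memory");
}
#endif

// Hypothetical entry point: make code written to [start, start + size)
// visible to instruction fetch on this CPU.
void flush_code_range(void* start, std::size_t size) {
#if defined(__x86_64__)
  if (size == 0) return;
  static const FlushKind kind = detect_flush_kind();  // detected once
  const std::uintptr_t mask = ~static_cast<std::uintptr_t>(kLineSize - 1);
  const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(start);
  const std::uintptr_t p = begin & mask;  // align down to the first line
  const std::uintptr_t end = begin + size;
  switch (kind) {
    case FlushKind::Clwb:       flush_lines_clwb(p, end);       _mm_sfence(); break;
    case FlushKind::Clflushopt: flush_lines_clflushopt(p, end); _mm_sfence(); break;
    case FlushKind::Clflush:    flush_lines_clflush(p, end);    break;
  }
  cpuid_serialize();
#else
  (void)start;  // non-x86 targets are out of scope for this sketch
  (void)size;
#endif
}
```

A JIT would call something like `flush_code_range(entry, code_size)` after emitting code and before publishing the entry point. If, as the PR description suggests, the instruction caches are coherent and only serialization is required, the flush loop could be dropped and only the final `cpuid` kept.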
src/hotspot/cpu/x86/icache_x86.cpp line 42:
> 40: break;
> 41: case 4:
> 42: __ push(rax);
x86 also has `serialize`, which, as you might guess, serializes the instruction stream. I suggest adding a routine in `MacroAssembler` that emits `serialize` when it is available and falls back to the existing sequence otherwise.
https://www.felixcloutier.com/x86/serialize
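The suggestion above could look roughly like the following. This is a hedged user-space sketch, not a `MacroAssembler` routine: a real HotSpot helper would emit these instructions into generated code, and the names `Serializer`, `choose_serializer`, and `serialize_instruction_stream` are hypothetical. Detection uses CPUID leaf 7, subleaf 0, EDX bit 14, which is where SERIALIZE support is reported. The fallback here is `cpuid`, the serializing instruction mentioned in the PR description. The sketch assumes x86-64 and GCC or Clang.

```cpp
// Sketch only: a user-space analogue of the suggested helper. It executes
// SERIALIZE when CPUID reports it and falls back to CPUID otherwise.
// Names are hypothetical; assumes x86-64 with GCC or Clang.
#include <cassert>

#if defined(__x86_64__)
#include <cpuid.h>
#endif

enum class Serializer { Serialize, Cpuid };

// SERIALIZE support is reported in CPUID.(EAX=7,ECX=0):EDX bit 14.
Serializer choose_serializer(unsigned leaf7_edx) {
  return (leaf7_edx & (1u << 14)) ? Serializer::Serialize : Serializer::Cpuid;
}

#if defined(__x86_64__)
static Serializer detect_serializer() {
  unsigned a, b, c, d;
  if (!__get_cpuid_count(7, 0, &a, &b, &c, &d)) return Serializer::Cpuid;
  return choose_serializer(d);
}
#endif

void serialize_instruction_stream() {
#if defined(__x86_64__)
  static const Serializer kind = detect_serializer();  // detected once
  if (kind == Serializer::Serialize) {
    // SERIALIZE is NP 0F 01 E8; emitted as raw bytes so older assemblers accept it.
    __asm__ __volatile__(".byte 0x0f, 0x01, 0xe8" ::: "memory");
  } else {
    // CPUID is architecturally serializing; leaf 0 is always valid.
    unsigned a = 0, b, c = 0, d;
    __asm__ __volatile__("cpuid" : "+a"(a), "=b"(b), "+c"(c), "=d"(d) : : "memory");
  }
#endif
}
```

A `MacroAssembler` version would presumably make the same choice once, when generating code, based on the CPU features the VM detected at startup, rather than branching at run time.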
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24389#discussion_r2044289934
More information about the hotspot-dev mailing list