RFR: 8339771: RISC-V: Reduce icache flushes [v2]
Robbin Ehn
rehn at openjdk.org
Tue Sep 17 07:15:06 UTC 2024
On Tue, 10 Sep 2024 12:53:18 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
>> Hey, please consider,
>>
>> All code which is offline (behind a barrier) does not need global icache flushes.
>> Instead, we can emit fence.i locally (per thread and hart) in the slow path.
>> But if we are context switched to a hart which has not had fence.i emitted, we can still fetch stale instructions.
>> To handle this case we now have kernel support:
>> https://docs.kernel.org/arch/riscv/cmodx.html
>>
>> It's not perfect, as with this patch we will be emitting fence.i on every context switch for every thread, even if that thread does not execute in the code heap (a non-attached native thread), and even if there were no changes to the code heap.
>> But in many cases this is much faster, as the global icache flush IPI is very intrusive.
>> A particular case is running a concurrent GC with little headroom.
>> In such a scenario I measured 15% increased throughput on VF2.
>> A large CPU or less headroom (faster GC cycles) will yield an even bigger performance boost.
>>
>> Note that this requires a 6.10 kernel.
>>
>> I'm running VF2 with a 6.11-rc3 kernel and this passed tier1-3 (with the setting on).
>>
>> Later we probably want this on by default, but as it's hard to test I'll leave it off by default.
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
> Comment, moved init after feature enabling
We have two major categories of cmodx:
1. Executioner is not aware of any changes.
1.1 The writer can icache_flush_all *or* the executioner can *always* emit fence.i, combined with UseCtxFencei.
If the code is not changed often, the inline fence.i is just a cost and icache_flush_all is better.
1.2 We can enhance this by signaling a thread handshake which will force all threads to emit a fence.i.
It is unclear whether this is worth it, because in some situations it can take a while to flush that out.
2. Executioner is *aware* of any changes made:
2.1 After a safepoint. No need to do *icache_flush_all* in a safepoint. Just emit fence.i when leaving, combined with UseCtxFencei.
2.2 Nmethod entry barrier, same here.
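
As a rough illustration of the categories above, here is a minimal sketch of how the choice between a global flush and a local fence.i could be expressed. All type and function names are hypothetical and exist only for this example, they are not taken from the patch. The rule that category 1 keeps the global flush is my reading of the list, not HotSpot code.

```cpp
#include <cassert>

// Hypothetical names, for illustration only; none of these exist in HotSpot.
enum class CmodxCase {
  ExecutorUnaware,      // category 1: executioner does not know code changed
  AfterSafepoint,       // category 2.1: thread is leaving a safepoint
  NmethodEntryBarrier   // category 2.2: thread passed the nmethod entry barrier
};

enum class Sync {
  GlobalIcacheFlush,    // icache_flush_all (IPI to other harts)
  LocalFenceI           // fence.i on the executing hart only
};

// Assumption: a local fence.i is only sufficient when the kernel also emits
// fence.i on context switch (UseCtxFencei). Otherwise a thread could migrate
// to a hart that never executed the fence and fetch stale instructions.
constexpr Sync choose_sync(CmodxCase c, bool use_ctx_fencei) {
  return (use_ctx_fencei && c != CmodxCase::ExecutorUnaware)
             ? Sync::LocalFenceI
             : Sync::GlobalIcacheFlush;
}

static_assert(choose_sync(CmodxCase::NmethodEntryBarrier, true) == Sync::LocalFenceI,
              "barrier case can use a local fence.i when UseCtxFencei is on");
```

With the flag off, every case falls back to the global flush, which matches leaving the setting off by default.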
The patch you are referring to was dealing with 1, which we shouldn't IMHO.
I'm dealing with 2: in a safepoint any code changes do not require *icache_flush_all*, and all entries into an nmethod are handled through the barrier. Changes to an nmethod do not need *icache_flush_all* if the barrier is used.
1: When we update oops in the code stream in Relocation::pd_set_data_value, either during a safepoint or with the nmethod barrier locked, we do not need to flush.
2: When ZGC updates the color in the code stream with the nmethod barrier locked, we do not need to flush.
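
For readers who want to see the two building blocks in isolation, here is a minimal sketch, not the code from the PR. It assumes the prctl interface described in the kernel cmodx document linked above (PR_RISCV_SET_ICACHE_FLUSH_CTX, available with 6.10 kernel headers) and a toolchain whose -march includes Zifencei. The function names are made up for the example. On other platforms or older headers the functions compile to no-ops, so the sketch builds anywhere.

```cpp
#include <cassert>
#if defined(__linux__)
#include <sys/prctl.h>
#endif

// Hypothetical helper: ask the kernel to emit fence.i when a thread of this
// process is migrated to another hart. Returns true only if the request
// was accepted; false on non-RISC-V builds or when the headers lack the macro.
inline bool enable_ctx_switch_fencei() {
#if defined(__linux__) && defined(__riscv) && defined(PR_RISCV_SET_ICACHE_FLUSH_CTX)
  return prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX,
               PR_RISCV_CTX_SW_FENCEI_ON,
               PR_RISCV_SCOPE_PER_PROCESS) == 0;
#else
  return false;
#endif
}

// Hypothetical helper: synchronize the instruction stream of the current
// hart only, e.g. in a barrier slow path or when leaving a safepoint.
inline void local_fence_i() {
#if defined(__riscv)
  __asm__ volatile("fence.i" ::: "memory");
#endif
}
```

A caller would typically try enable_ctx_switch_fencei() once at startup and only skip the global flush in the barrier and safepoint paths if it returned true.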
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20913#issuecomment-2354722731