RFR: 8339771: RISC-V: Reduce icache flushes [v2]
Robbin Ehn
rehn at openjdk.org
Wed Sep 18 08:58:10 UTC 2024
On Tue, 10 Sep 2024 12:53:18 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
>> Hey, please consider,
>>
>> Code that is offline (behind a barrier) does not need global icache flushes.
>> Instead, we can emit fence.i locally (on the current thread and hart) in the slow path.
>> But if we are context switched to a hart which has not had fence.i emitted, we can still fetch stale instructions.
>> To handle this case we now have kernel support:
>> https://docs.kernel.org/arch/riscv/cmodx.html
>>
>> It's not perfect: with this patch, fence.i is emitted on every context switch for every thread, even if that thread does not execute in the code heap (a non-attached native thread), and even if there were no changes to the code heap.
>> But this is in many cases much faster, as the global icache flush IPI is very intrusive.
>> A particular case is running a concurrent GC with small headroom.
>> In such a scenario I measured 15% increased throughput on VF2.
>> A larger CPU or less headroom (faster GC cycles) will yield an even bigger performance boost.
>>
>> Note that this requires a 6.10 kernel.
>>
>> I'm running VF2 with a 6.11-rc3 kernel, and this passed tier1-3 (with the setting on).
>>
>> Later we probably want this on by default, but as it's hard to test I'll leave it off by default.
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
> Comment, moved init after feature enabling
> Thanks for sharing the performance numbers. It's a pity that I can't try this on my big RV machine for now which runs an older customized 6.1 kernel. BTW: Should we add similar checking like `verify_cross_modify_fence_not_required()` which was called in `MacroAssembler::read_polling_page()` & `MacroAssembler::build_frame()` and removed by #9770? (-XX:+VerifyCrossModifyFence)
We have two cases:
First, if the code stream changed during a safepoint, the fence must be emitted before leaving the safepoint, which means that e.g. a thread in native that returns to Java must emit it. Polls are only disarmed by the threads themselves, and they always do a cmodx_fence after a disarm (when they update the poll word).
This case is already covered by VerifyCrossModifyFence.
The second case, writing to the code stream in methods with the nmethod barrier locked, would essentially be a verification of the barrier and the _patching_epoch.
If this needs verification, we need that regardless of this patch, since we do need both of them working; otherwise we may enter an nmethod with a bad oop. We also need the loadload fence before entering; otherwise we may load stale data.
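
To make the second case concrete, here is a toy model of an epoch-guarded entry barrier. It is only a sketch and not HotSpot's code: `global_patching_epoch`, `thread_state`, `publish_code_patch` and `entry_barrier` are hypothetical names, the acquire load stands in for the loadload fence, and on non-RISC-V hosts the instruction fence is replaced by a compiler barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global epoch, bumped by whoever patches code. */
static _Atomic unsigned global_patching_epoch = 0;

/* Hypothetical per-thread state: the last epoch this thread has fenced for. */
typedef struct {
    unsigned seen_epoch;
    unsigned local_fences;   /* counts slow-path fences, for illustration only */
} thread_state;

/* Local (this hart only) instruction fence; no IPI to other harts. */
static void local_instruction_fence(void) {
#if defined(__riscv)
    __asm__ volatile("fence.i" ::: "memory");
#else
    /* Placeholder so the sketch builds on other hosts: compiler barrier only. */
    atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Patcher side: the instructions would be written before this point. */
static void publish_code_patch(void) {
    atomic_fetch_add_explicit(&global_patching_epoch, 1, memory_order_release);
}

/* Run at method entry. Returns true if the slow path was taken. */
static bool entry_barrier(thread_state *t) {
    /* Acquire ordering models the loadload fence before entering. */
    unsigned epoch = atomic_load_explicit(&global_patching_epoch,
                                          memory_order_acquire);
    if (epoch == t->seen_epoch) {
        return false;            /* fast path: nothing changed for this thread */
    }
    local_instruction_fence();   /* slow path: fence locally, then catch up */
    t->seen_epoch = epoch;
    t->local_fences++;
    return true;
}
```

In this model a thread takes the slow path once per published patch and then returns to the fast path, which is the property a verification of the barrier and the epoch would have to check.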
So I don't believe this patch requires any additional verification, but we may need verification independently of it.
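
For reference, the kernel support mentioned in the PR description is enabled through `prctl`, as described in the linked cmodx document. Below is a minimal sketch, assuming a Linux host whose headers define `PR_RISCV_SET_ICACHE_FLUSH_CTX` and related constants (6.10 or newer headers); the function name `enable_ctx_switch_fencei` is hypothetical and this is not the code from the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#if defined(__linux__)
#include <sys/prctl.h>
#endif

/* Ask the kernel to emit fence.i when a thread of this process is
 * migrated to another hart. Returns 0 on success, -1 if the request
 * is unsupported or refused (old headers, old kernel, or not RISC-V). */
static int enable_ctx_switch_fencei(void) {
#if defined(__linux__) && defined(PR_RISCV_SET_ICACHE_FLUSH_CTX)
    if (prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX,
              (unsigned long)PR_RISCV_CTX_SW_FENCEI_ON,
              (unsigned long)PR_RISCV_SCOPE_PER_PROCESS) == 0) {
        return 0;
    }
    fprintf(stderr, "prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX) failed: %s\n",
            strerror(errno));
    return -1;
#else
    /* Assumption: without the constant in the headers we treat it as unsupported. */
    fprintf(stderr, "PR_RISCV_SET_ICACHE_FLUSH_CTX not available in headers\n");
    return -1;
#endif
}
```

A caller would fall back to global icache flushes when this returns -1, and only rely on local fence.i in the slow path when it returns 0.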
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20913#issuecomment-2357881642
More information about the hotspot-dev mailing list