<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">More on this subject.<br class="">I can see that ifence() is used in the code exactly the way isb() is used in aarch64.<br class="">Checking the documentation for fence.i and isb, I don’t see them as 1:1 equivalent.<br class=""><br class="">fence.i ( <a href="https://five-embeddev.com/riscv-isa-manual/latest/zifencei.html" class="">https://five-embeddev.com/riscv-isa-manual/latest/zifencei.html</a> ): <span class=""><br class="">FENCE.I instruction provides explicit synchronization between writes to instruction memory and instruction fetches on the same hart.<br class=""><br class="">ISB ( <a href="https://developer.arm.com/documentation/den0024/a/Memory-Ordering/Barriers/ISB-in-more-detail" class="">https://developer.arm.com/documentation/den0024/a/Memory-Ordering/Barriers/ISB-in-more-detail</a> ):<br class="">An ISB flushes the pipeline, and re-fetches the instructions from the cache or memory and ensures that the effects of any completed context-changing operation before the ISB are visible to any instruction after the ISB. It also ensures that any context-changing operations after the ISB instruction only take effect after the ISB has been executed and are not seen by instructions before the ISB. 
<br class="">And some info from the web:</span><span class=""><br class=""><br class="">To me it sound like isb ( in aarch64) does the job a bit different than fence.i ( in rv64)<br class=""><br class="">So, I think here:<br class=""><br class=""> __ la_patchable(t0, RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::fixup_callers_callsite)), offset);<br class=""> __ jalr(x1, t0, offset);<br class=""><br class=""> // Explicit fence.i required because fixup_callers_callsite may change the code<br class=""> // stream.<br class=""> __ safepoint_ifence();<br class=""><br class=""> __ pop_CPU_state();<br class=""> // restore sp<br class=""> __ leave();<br class=""> __ bind(L);<br class=""><br class=""> we still have a small chance to start executing invalid ( old) code from l1i if right after safepoint_ifence() our thread would be moved to another hart. Otherwise if fixup_callers_callsite would call icache_flush() somewhere inside, then safepoint_ifence wouldn’t be needed here</span><div class=""><span class=""><br class=""></span></div><div class=""><span class=""><br class=""></span></div><div class=""><span class="">Regards, Vladimir<br class=""><br class=""><blockquote type="cite" class="">30 июля 2022 г., в 13:29, Vladimir Kempik <<a href="mailto:vladimir.kempik@gmail.com" class="">vladimir.kempik@gmail.com</a>> написал(а):<br class=""><br class="">Hello<br class="">Thanks for explanation.<br class="">that sounds like the fence.i in userspace code is not needed at all<br class="">Regards, Vladimir<br class=""><blockquote type="cite" class="">30 июля 2022 г., в 05:41, wangyadong (E) <<a href="mailto:yadonn.wang@huawei.com" class="">yadonn.wang@huawei.com</a>> написал(а):<br class=""><br class=""><blockquote type="cite" class="">Lets say you have a thread A running on hart 1.<br class="">You've changed some code in region 0x11223300 and need fence.i before executing that code.<br class="">you execute fence.i in your thread A running on hart 1. 
<br class="">right after that your thread ( for some reason) got rescheduled ( by kernel) to hart 2.<br class="">if hart 2 had something in l1i corresponding to region 0x11223300, then you gonna have a problem: l1i on hart 2 has old code, it wasn’t refreshed, because fence.i was executed on hart 1 ( and never on hart 2). And you thread gonna execute old code, or mix of old and new code.<br class=""></blockquote><br class="">@vladimir Thanks for your explanation. I understand your concern now. We know the fence.i's scope, so the write hart does not rely solely on the fence.i in RISC-V port, but calls the icache_flush syscall in ICache::invalidate_range() every time after modifying the code.<br class=""><br class="">For example:<br class="">Hart 1<br class="">void MacroAssembler::emit_static_call_stub() {<br class="">// CompiledDirectStaticCall::set_to_interpreted knows the<br class="">// exact layout of this stub.<br class=""><br class="">ifence();<br class="">mov_metadata(xmethod, (Metadata*)NULL); <- patchable code here<br class=""><br class="">// Jump to the entry point of the i2c stub.<br class="">int32_t offset = 0;<br class="">movptr_with_offset(t0, 0, offset);<br class="">jalr(x0, t0, offset);<br class="">}<br class=""><br class="">Hart 2 (write hart)<br class="">void NativeMovConstReg::set_data(intptr_t x) {<br class="">// ...<br class=""> // Store x into the instruction stream.<br class=""> MacroAssembler::pd_patch_instruction_size(instruction_address(), (address)x); <- write code<br class=""> ICache::invalidate_range(instruction_address(), movptr_instruction_size); <- syscall here<br class="">// ...<br class="">} <br class=""><br class=""></blockquote><br class=""></blockquote><br class=""></span></div></body></html>