The usage of fence.i in openjdk

Mon Aug 8 06:32:56 UTC 2022

>Otherwise, if fixup_callers_callsite called icache_flush() somewhere inside, then safepoint_ifence wouldn’t be needed here

Yes, we do call icache_flush in fixup_callers_callsite:
SharedRuntime::fixup_callers_callsite -> NativeCall::set_destination_mt_safe -> ICache::invalidate_range -> icache_flush.
I have also opened a PR to fix the usage of fence.i in user space in the RISC-V port: https://github.com/openjdk/jdk/pull/9770.
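
As a rough illustration of the patch-then-flush step at the end of that call chain, here is a minimal C++ sketch. It is not the HotSpot code: patch_and_flush is a hypothetical helper, and it relies on the GCC/Clang builtin __builtin___clear_cache. I am assuming a toolchain where that builtin performs whatever instruction-cache maintenance the platform requires; how it does so on a given target is toolchain-dependent.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of HotSpot): overwrite `len` bytes of code at
// `code` with `new_bytes`, then ask the toolchain to make the change visible
// to instruction fetch.
// Assumption: the compiler is GCC or Clang, which provide
// __builtin___clear_cache(begin, end).
inline void patch_and_flush(std::uint8_t* code,
                            const std::uint8_t* new_bytes,
                            std::size_t len) {
    // Step 1: store the new instruction bytes (an ordinary data write).
    std::memcpy(code, new_bytes, len);

    // Step 2: flush the instruction cache for the patched range [code, code + len).
    __builtin___clear_cache(reinterpret_cast<char*>(code),
                            reinterpret_cast<char*>(code + len));
}
```

The point of the sketch is the ordering: the write comes first, and the flush covers exactly the patched range, so the writer does not depend on readers having executed their own local fence at the right moment.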

The ISBs used in the AArch64 port follow the AArch64 Reference Manual:

Ensuring the visibility of updates to instructions for a multiprocessor
The ARMv8 architecture requires a PE that performs an instruction cache maintenance operation to execute a DSB
instruction to ensure completion of the maintenance operation. This ensures that the cache maintenance operation
is complete on all PEs in the Inner Shareable shareability domain.
An ISB is not broadcast, and so does not affect other PEs. This means that any other PE must perform its own ISB
synchronization after it knows that the update is visible, if it is necessary to ensure its synchronization with the
update. The following example shows how this might be done:

AArch64
P1
STR X11, [X1]    ; X11 contains a new instruction to be stored in program memory
DC CVAU, X1      ; clean to PoU makes visible to instruction cache
DSB ISH          ; ensure completion of the clean on all processors
IC IVAU, X1      ; ensure instruction cache/branch predictor discard stale data
DSB ISH          ; ensure completion of the ICache and branch predictor
                 ; invalidation on all processors
STR W0, [X2]     ; set flag to signal completion
ISB              ; synchronize context on this processor
BR X1            ; branch to new code

P2-Px
WAIT ([X2] == 1) ; wait for flag signalling completion
ISB              ; synchronize context on this processor
BR X1            ; branch to new code

From: Vladimir Kempik [mailto:vladimir.kempik at gmail.com]
Sent: Saturday, August 6, 2022 6:07 AM
To: wangyadong (E) <yadonn.wang at huawei.com>
Cc: Palmer Dabbelt <palmer at dabbelt.com>; riscv-port-dev at openjdk.org
Subject: Re: The usage of fence.i in openjdk

More on this subject.
I can see that the use of ifence() in the code is identical to the use of isb() in aarch64.
Checking the documentation for fence.i and isb, I don’t see them as exactly equivalent.

fence.i ( https://five-embeddev.com/riscv-isa-manual/latest/zifencei.html ):
FENCE.I instruction provides explicit synchronization between writes to instruction memory and instruction fetches on the same hart.

ISB ( https://developer.arm.com/documentation/den0024/a/Memory-Ordering/Barriers/ISB-in-more-detail ):
An ISB flushes the pipeline, and re-fetches the instructions from the cache or memory and ensures that the effects of any completed context-changing operation before the ISB are visible to any instruction after the ISB. It also ensures that any context-changing operations after the ISB instruction only take effect after the ISB has been executed and are not seen by instructions before the ISB.
And some info from the web:

To me it sounds like isb (on aarch64) does its job a bit differently from fence.i (on rv64).

So, I think here:

  __ la_patchable(t0, RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::fixup_callers_callsite)), offset);
  __ jalr(x1, t0, offset);

  // Explicit fence.i required because fixup_callers_callsite may change the code
  // stream.
  __ safepoint_ifence();

  __ pop_CPU_state();
  // restore sp
  __ leave();
  __ bind(L);

we still have a small chance of executing invalid (old) code from l1i if our thread is moved to another hart right after safepoint_ifence(). Otherwise, if fixup_callers_callsite called icache_flush() somewhere inside, then safepoint_ifence wouldn’t be needed here.

Regards, Vladimir

On 30 July 2022, at 13:29, Vladimir Kempik <vladimir.kempik at gmail.com> wrote:

Hello
Thanks for the explanation.
That sounds like the fence.i in userspace code is not needed at all.
Regards, Vladimir

On 30 July 2022, at 05:41, wangyadong (E) <yadonn.wang at huawei.com> wrote:

Let's say you have a thread A running on hart 1.
You've changed some code in region 0x11223300 and need a fence.i before executing that code.
You execute fence.i in your thread A running on hart 1.
Right after that, your thread (for some reason) gets rescheduled (by the kernel) to hart 2.
If hart 2 had something in l1i corresponding to region 0x11223300, then you are going to have a problem: l1i on hart 2 has old code that wasn’t refreshed, because fence.i was executed on hart 1 (and never on hart 2). Your thread is going to execute old code, or a mix of old and new code.

@vladimir Thanks for your explanation. I understand your concern now. We are aware of fence.i's scope, so in the RISC-V port the writing hart does not rely solely on fence.i; it calls the icache_flush syscall in ICache::invalidate_range() every time after modifying code.

For example:
Hart 1
void MacroAssembler::emit_static_call_stub() {
  // CompiledDirectStaticCall::set_to_interpreted knows the
  // exact layout of this stub.

  ifence();
  mov_metadata(xmethod, (Metadata*)NULL);  // <- patchable code here

  // Jump to the entry point of the i2c stub.
  int32_t offset = 0;
  movptr_with_offset(t0, 0, offset);
  jalr(x0, t0, offset);
}

Hart 2 (write hart)
void NativeMovConstReg::set_data(intptr_t x) {
  // ...
  // Store x into the instruction stream.
  MacroAssembler::pd_patch_instruction_size(instruction_address(), (address)x);  // <- write code
  ICache::invalidate_range(instruction_address(), movptr_instruction_size);  // <- syscall here
  // ...
}
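
The migration scenario and the icache_flush remedy discussed above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not a description of real hardware or of HotSpot: ToyMachine and all of its members are hypothetical names, each hart's instruction cache is reduced to a single cached word, and the cross-hart flush is modeled as refreshing every hart at once.

```cpp
#include <array>
#include <cstdint>

// Toy model (hypothetical, for illustration only): one code word in memory
// and one private instruction-cache copy of that word per hart.
struct ToyMachine {
    static constexpr int kHarts = 2;

    std::uint32_t memory;                        // the code word as stored in memory
    std::array<std::uint32_t, kHarts> icache;    // what each hart would fetch

    // All harts start with the initial code word already cached.
    explicit ToyMachine(std::uint32_t initial) : memory(initial) {
        icache.fill(initial);
    }

    // A store to code memory: no instruction cache is updated.
    void write_code(std::uint32_t insn) { memory = insn; }

    // Models fence.i: synchronizes instruction fetch on the executing hart only.
    void fence_i(int hart) { icache[hart] = memory; }

    // Models the icache_flush syscall as assumed in this sketch:
    // every hart observes the new code afterwards.
    void icache_flush_all() { icache.fill(memory); }

    // The instruction a thread running on `hart` would execute.
    std::uint32_t fetch(int hart) const { return icache[hart]; }
};
```

In this model, after write_code followed by fence_i(0), fetch(0) returns the new word while fetch(1) still returns the old one, which is the stale-code case for a thread that migrates from hart 0 to hart 1. Calling icache_flush_all() after the write removes that window, because fetch returns the new word on every hart.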
