The usage of fence.i in openjdk
Владимир Кемпик
vladimir.kempik at gmail.com
Fri Jul 29 10:30:44 UTC 2022
Hello
I was looking at how generated executable code is synchronized across all harts in OpenJDK and found a few things that are not in line with the spec.
Looking at the spec: https://github.com/riscv/riscv-isa-manual/blob/master/src/zifencei.tex , there are a few important points:
>Because FENCE.I only orders stores with a hart's own instruction
>fetches, application code should only rely upon FENCE.I if the
>application thread will not be migrated to a different hart. The EEI
>can provide mechanisms for efficient multiprocessor instruction-stream
>synchronization.
I believe Java threads can migrate to a different hart at any moment, hence the use of fence.i is dangerous.
There are a few places where fence.i (via fence_i()) is used in OpenJDK at the moment:
void Assembler::ifence() {
  fence_i();
  if (UseConservativeFence) {
    fence(ir, ir);
  }
}

void MacroAssembler::safepoint_ifence() {
  ifence();
  ....
}

void MacroAssembler::emit_static_call_stub() {
  // CompiledDirectStaticCall::set_to_interpreted knows the
  // exact layout of this stub.
  ifence();
  mov_metadata(xmethod, (Metadata*)NULL);
  // Jump to the entry point of the i2c stub.
  int32_t offset = 0;
  movptr_with_offset(t0, 0, offset);
  jalr(x0, t0, offset);
}
Maybe it would be good to get rid of them.
Another interesting point is:
>FENCE.I does not ensure that other RISC-V harts’
>instruction fetches will observe the local hart's stores in a
>multiprocessor system. To make a store to instruction memory visible
>to all RISC-V harts, the writing hart also has to execute a data FENCE
>before requesting that all remote RISC-V harts execute a FENCE.I.
Here is how we do the flush_icache call:
static void icache_flush(long int start, long int end)
{
  const int SYSCALL_RISCV_FLUSH_ICACHE = 259;
  register long int __a7 asm ("a7") = SYSCALL_RISCV_FLUSH_ICACHE;
  register long int __a0 asm ("a0") = start;
  register long int __a1 asm ("a1") = end;
  // the flush can be applied to either all threads or only the current.
  // 0 means a global icache flush, and the icache flush will be applied
  // to other harts concurrently executing.
  register long int __a2 asm ("a2") = 0;
  __asm__ volatile ("ecall\n\t"
                    : "+r" (__a0)
                    : "r" (__a0), "r" (__a1), "r" (__a2), "r" (__a7)
                    : "memory");
}
Maybe we need to add __asm__ volatile ("fence":::"memory") at the beginning of this function, so that the writing hart executes a data FENCE before the remote harts are asked to execute FENCE.I.
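To make the proposal concrete, here is a minimal sketch of what such a function could look like. It is not the OpenJDK code: the name icache_flush_with_fence, the return value and the non-RISC-V fallback are assumptions added only so the example compiles anywhere. The syscall number 259 and the flags value 0 are taken from the function quoted above. Whether the kernel already provides the required ordering is exactly the open question, so treat this as an illustration of the idea rather than a verified fix.

```c
#include <assert.h>

/* Hypothetical variant of icache_flush(): data fence first, then the
 * global icache flush request. Returns the syscall result (0 on success). */
static long icache_flush_with_fence(long start, long end)
{
  assert(start <= end);
#if defined(__riscv) && defined(__linux__)
  /* Order this hart's earlier stores to the code region before asking
   * the other harts to execute FENCE.I. */
  __asm__ volatile ("fence" ::: "memory");

  register long a7 __asm__ ("a7") = 259;  /* riscv_flush_icache, as quoted above */
  register long a0 __asm__ ("a0") = start;
  register long a1 __asm__ ("a1") = end;
  register long a2 __asm__ ("a2") = 0;    /* 0 means a global flush */
  __asm__ volatile ("ecall"
                    : "+r" (a0)
                    : "r" (a1), "r" (a2), "r" (a7)
                    : "memory");
  return a0;
#else
  /* Assumed fallback for other targets, only to keep the sketch buildable. */
  __builtin___clear_cache((char *)start, (char *)end);
  return 0;
#endif
}
```

A caller would invoke it after patching code, for example icache_flush_with_fence((long)code_begin, (long)code_end), where code_begin and code_end are placeholders for the patched range.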
Let's discuss these points.
Regards, Vladimir.