The usage of fence.i in openjdk

Владимир Кемпик vladimir.kempik at gmail.com
Fri Jul 29 10:30:44 UTC 2022


Hello,
I was looking at how OpenJDK synchronizes generated executable code across all harts and found a few things that are not in line with the spec.

Looking at the spec: https://github.com/riscv/riscv-isa-manual/blob/master/src/zifencei.tex , there are a few important points:

>Because FENCE.I only orders stores with a hart's own instruction
>fetches, application code should only rely upon FENCE.I if the
>application thread will not be migrated to a different hart.  The EEI
>can provide mechanisms for efficient multiprocessor instruction-stream
>synchronization.

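I believe Java threads can migrate to a different hart at any moment, so relying on fence.i is dangerous: the fence.i may execute on one hart and the thread may then continue on another hart that never executed it.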
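The migration concern can be observed from user space. Below is a minimal sketch, assuming Linux and the getcpu system call; current_cpu() and count_cpu_changes() are hypothetical helper names, not OpenJDK code. It samples the CPU the calling thread runs on and counts how often that CPU changes between samples. Whether any change is seen depends on the scheduler and the load, so a result of 0 does not prove that migration cannot happen.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for the syscall() declaration */
#endif
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the CPU (hart) the calling thread is running on right now,
   or -1 if the getcpu system call fails. */
static int current_cpu(void)
{
  unsigned int cpu = 0;
  if (syscall(SYS_getcpu, &cpu, NULL, NULL) != 0)
    return -1;
  return (int) cpu;
}

/* Hypothetical helper: samples the current CPU `samples` times and counts
   how often a sample differs from the previous one. */
static int count_cpu_changes(int samples)
{
  int changes = 0;
  int prev = current_cpu();
  for (int i = 1; i < samples; i++) {
    int now = current_cpu();
    if (now != prev)
      changes++;
    prev = now;
  }
  return changes;
}
```

Calling count_cpu_changes(1000000) in a thread that is not pinned to a CPU, on a busy multi-hart machine, may return a non-zero count. Any such change falling between a fence.i and the execution of the freshly written code is the case the spec warns about.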
There are a few places where fence.i is used (via fence_i()) in OpenJDK at the moment:

void Assembler::ifence() {
  fence_i();
  if (UseConservativeFence) {
    fence(ir, ir);
  }
}

void MacroAssembler::safepoint_ifence() {
  ifence();
....
}

void MacroAssembler::emit_static_call_stub() {
  // CompiledDirectStaticCall::set_to_interpreted knows the
  // exact layout of this stub.

  ifence();
  mov_metadata(xmethod, (Metadata*)NULL);

  // Jump to the entry point of the i2c stub.
  int32_t offset = 0;
  movptr_with_offset(t0, 0, offset);
  jalr(x0, t0, offset);
}

Maybe it would be good to get rid of them.

Another interesting point is:
>FENCE.I does not ensure that other RISC-V harts’
>instruction fetches will observe the local hart's stores in a
>multiprocessor system. To make a store to instruction memory visible
>to all RISC-V harts, the writing hart also has to execute a data FENCE
>before requesting that all remote RISC-V harts execute a FENCE.I.

Here is how we do the flush_icache call:

static void icache_flush(long int start, long int end)
{
  const int SYSCALL_RISCV_FLUSH_ICACHE = 259;
  register long int __a7 asm ("a7") = SYSCALL_RISCV_FLUSH_ICACHE;
  register long int __a0 asm ("a0") = start;
  register long int __a1 asm ("a1") = end;
  // the flush can be applied to either all threads or only the current.
  // 0 means a global icache flush, and the icache flush will be applied
  // to other harts concurrently executing.
  register long int __a2 asm ("a2") = 0;
  __asm__ volatile ("ecall\n\t"
                    : "+r" (__a0)
                    : "r" (__a0), "r" (__a1), "r" (__a2), "r" (__a7)
                    : "memory");
}

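Maybe we need to add __asm__ volatile ("fence":::"memory") at the beginning of this function, so that the writing hart executes a data FENCE before it requests the remote FENCE.I, as the spec text above requires.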
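To make the suggestion concrete, here is a rough sketch of a flush that issues the data fence first. It is not the OpenJDK implementation: icache_flush_fenced() is a hypothetical name, it uses the libc syscall() wrapper instead of a hand-written ecall, and the non-RISC-V fallback to the GCC/Clang builtin exists only so that the sketch compiles elsewhere. Whether the kernel already provides this ordering inside the syscall is not established here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for the syscall() declaration */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch only: execute a data fence on the writing hart, then request a
   global icache flush for [start, end). Returns 0 on success. */
static long icache_flush_fenced(char *start, char *end)
{
#if defined(__riscv) && defined(__linux__)
  /* Same syscall number as in the function quoted above. */
  const long SYSCALL_RISCV_FLUSH_ICACHE = 259;
  /* Order the preceding stores to the code buffer before the flush
     request. A bare "fence" assembles to "fence iorw, iorw". */
  __asm__ volatile ("fence" ::: "memory");
  /* flags == 0: global flush, as in the quoted function. */
  return syscall(SYSCALL_RISCV_FLUSH_ICACHE, start, end, 0L);
#else
  /* Assumption for portability of the sketch: on other targets fall back
     to the compiler builtin, which returns nothing. */
  __builtin___clear_cache(start, end);
  return 0;
#endif
}
```

A caller would write the new instructions into the code buffer and then call icache_flush_fenced(buf, buf + len) before any thread jumps into that buffer.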

Let's discuss these points.

Regards, Vladimir.



More information about the riscv-port-dev mailing list