答复: The usage of fence.i in openjdk

Fri Jul 29 15:12:21 UTC 2022

Hi, Vladimir,

> I believe Java’s threads can migrate to different hart at any moment, hence the use of fence.i is dangerous.
Could you describe in detail why the use of fence.i is dangerous? I think it may be just inefficient but not dangerous.
To a certain extent, this code is hangover from the aarch64 port and we use fence.i to mimic isb.

> Maybe there is a need to add __asm__ volatile ("fence":::"memory") at the beginning of this method.
You're right. It'd better place a full data fence before the syscall, because we cannot guarantee here the syscall leave a data fence there before IPI remote fence.i to other harts

Yadong

-----邮件原件-----
发件人: riscv-port-dev <riscv-port-dev-retn at openjdk.org> 代表 Владимир Кемпик
发送时间: 2022年7月29日 18:31
收件人: riscv-port-dev at openjdk.org
主题: The usage of fence.i in openjdk

Hello
I was looking at the generated executable code sync across all harts in openjdk and found few thing to not be in line with the spec.

Looking at the spec: https://github.com/riscv/riscv-isa-manual/blob/master/src/zifencei.tex , there are few important moments:

>Because FENCE.I only orders stores with a hart's own instruction 
>fetches, application code should only rely upon FENCE.I if the 
>application thread will not be migrated to a different hart.  The EEI 
>can provide mechanisms for efficient multiprocessor instruction-stream 
>synchronization.

I believe Java’s threads can migrate to different hart at any moment, hence the use of fence.i is dangerous.
There are few places where fence.i ( via fence_i() ) are used in openjdk at the moment:

void Assembler::ifence() {
  fence_i();
  if (UseConservativeFence) {
    fence(ir, ir);
  }
}

void MacroAssembler::safepoint_ifence() {
  ifence();
....
}

void MacroAssembler::emit_static_call_stub() {
  // CompiledDirectStaticCall::set_to_interpreted knows the
  // exact layout of this stub.

  ifence();C
  mov_metadata(xmethod, (Metadata*)NULL);

  // Jump to the entry point of the i2c stub.
  int32_t offset = 0;
  movptr_with_offset(t0, 0, offset);
  jalr(x0, t0, offset);
}

Maybe it would be good to get rid of them.

Another interesting point is:
>FENCE.I does not ensure that other RISC-V harts’
>instruction fetches will observe the local hart's stores in a 
>multiprocessor system. To make a store to instruction memory visible to 
>all RISC-V harts, the writing hart also has to execute a data FENCE 
>before requesting that all remote RISC-V harts execute a FENCE.I.

Here is how we do the flush_icache call:

 static void icache_flush(long int start, long int end)
  {
    const int SYSCALL_RISCV_FLUSH_ICACHE = 259;
    register long int __a7 asm ("a7") = SYSCALL_RISCV_FLUSH_ICACHE;
    register long int __a0 asm ("a0") = start;
    register long int __a1 asm ("a1") = end;
    // the flush can be applied to either all threads or only the current.
    // 0 means a global icache flush, and the icache flush will be applied
    // to other harts concurrently executing.
    register long int __a2 asm ("a2") = 0;
    __asm__ volatile ("ecall\n\t"
                      : "+r" (__a0)
                      : "r" (__a0), "r" (__a1), "r" (__a2), "r" (__a7)
                      : "memory");
  }

Maybe there is a need to add __asm__ volatile ("fence":::"memory") at the beginning of this method.

Lets discuss these points.

Regards, Vladimir.