Re: The usage of fence.i in openjdk

Palmer Dabbelt palmer at dabbelt.com
Fri Jul 29 18:02:59 UTC 2022


On Fri, 29 Jul 2022 08:12:21 PDT (-0700), yadonn.wang at huawei.com wrote:
> Hi, Vladimir,
> 
>> I believe Java’s threads can migrate to a different hart at any moment, hence the use of fence.i is dangerous.
> Could you describe in detail why the use of fence.i is dangerous? I think it may be just inefficient but not dangerous.

The issue here is that fence.i applies to the current hart, whereas 
Linux userspace processes just know the current thread.  Executing a 
fence.i in userspace adds a bunch of orderings to the thread's state 
that can only be moved to a new hart via another fence.i.  Normally that 
sort of thing isn't such a big deal (there's similar state for things 
like fence and lr), but the SiFive chips implement fence.i by flushing 
the instruction cache, which is a slow operation to put on the 
scheduling path.

The only way for the kernel to avoid that fence.i on the scheduling path 
is for it to know if one's been executed in userspace.  There's no way 
to trap on fence.i so instead the Linux uABI just requires userspace to 
make a syscall (or a VDSO library call).  If userspace directly executes 
a fence.i then the kernel won't know and thus can't ensure the thread 
state is adequately moved to the new hart during scheduling, which may 
result in incorrect behavior.

We've known for a while that this will cause performance issues for JITs 
on some implementations, but so far it's just not been a priority.  I 
poked around the RISC-V OpenJDK port a few months ago and I think 
there are some improvements that can be made there, but we're probably 
also going to want some kernel support.  Exactly how to fix it is 
probably going to depend on the workloads and implementations, though, 
and while I think I understand the OpenJDK part pretty well it's not 
clear what the other fence.i implementations are doing.

In the long run we're also going to need some ISA support for doing this 
sanely, but that's sort of a different problem.  I've been kind of 
crossing my fingers and hoping that anyone who has a system where JIT 
performance is important is also going to have some better write/fetch 
ordering instructions, but given how long it's been maybe that's a bad 
plan.

That said, the direct fence.i is incorrect, and it's likely that the 
long-term solution involves making the VDSO call, so it's probably best 
to swap over.  I remember writing a patch to do that at some point, but 
I can't find it, so maybe I just forgot to send it?

> To a certain extent, this code is a holdover from the aarch64 port and we use fence.i to mimic isb.
>> Maybe there is a need to add __asm__ volatile ("fence":::"memory") at the beginning of this method.
> You're right. It'd be better to place a full data fence before the syscall, because we cannot guarantee that the syscall issues a data fence before the IPI that has the other harts execute a remote fence.i.

That's not necessary with the current implementation, but it's not 
specified by the interface.  I think we should probably just have 
strongly ordered memory as implicit in all user/kernel transitions, but 
I'm not sure if there's an issue there.

> 
> Yadong
> 
> -----Original Message-----
> From: riscv-port-dev <riscv-port-dev-retn at openjdk.org> On Behalf Of Владимир Кемпик
> Sent: 29 July 2022 18:31
> To: riscv-port-dev at openjdk.org
> Subject: The usage of fence.i in openjdk
> 
> Hello
> I was looking at how the generated executable code is synced across all harts in openjdk and found a few things that are not in line with the spec.
> 
> Looking at the spec: https://github.com/riscv/riscv-isa-manual/blob/master/src/zifencei.tex , there are a few important points:
> 
>>Because FENCE.I only orders stores with a hart's own instruction 
>>fetches, application code should only rely upon FENCE.I if the 
>>application thread will not be migrated to a different hart.  The EEI 
>>can provide mechanisms for efficient multiprocessor instruction-stream 
>>synchronization.
> 
> I believe Java’s threads can migrate to a different hart at any moment, hence the use of fence.i is dangerous.
> There are a few places where fence.i (via fence_i()) is used in openjdk at the moment:
> 
> void Assembler::ifence() {
>   fence_i();
>   if (UseConservativeFence) {
>     fence(ir, ir);
>   }
> }
> 
> void MacroAssembler::safepoint_ifence() {
>   ifence();
> ....
> }
> 
> void MacroAssembler::emit_static_call_stub() {
>   // CompiledDirectStaticCall::set_to_interpreted knows the
>   // exact layout of this stub.
> 
>   ifence();
>   mov_metadata(xmethod, (Metadata*)NULL);
> 
>   // Jump to the entry point of the i2c stub.
>   int32_t offset = 0;
>   movptr_with_offset(t0, 0, offset);
>   jalr(x0, t0, offset);
> }
> 
> Maybe it would be good to get rid of them.
> 
> Another interesting point is:
>>FENCE.I does not ensure that other RISC-V harts’
>>instruction fetches will observe the local hart's stores in a 
>>multiprocessor system. To make a store to instruction memory visible to 
>>all RISC-V harts, the writing hart also has to execute a data FENCE 
>>before requesting that all remote RISC-V harts execute a FENCE.I.
> 
> Here is how we do the flush_icache call:
> 
>  static void icache_flush(long int start, long int end)
>   {
>     const int SYSCALL_RISCV_FLUSH_ICACHE = 259;
>     register long int __a7 asm ("a7") = SYSCALL_RISCV_FLUSH_ICACHE;
>     register long int __a0 asm ("a0") = start;
>     register long int __a1 asm ("a1") = end;
>     // the flush can be applied to either all threads or only the current.
>     // 0 means a global icache flush, and the icache flush will be applied
>     // to other harts concurrently executing.
>     register long int __a2 asm ("a2") = 0;
>     __asm__ volatile ("ecall\n\t"
>                       : "+r" (__a0)
>                       : "r" (__a0), "r" (__a1), "r" (__a2), "r" (__a7)
>                       : "memory");
>   }
> 
> Maybe there is a need to add __asm__ volatile ("fence":::"memory") at the beginning of this method.
> 
> Let's discuss these points.
> 
> Regards, Vladimir.
> 
