Re: Re: Re: The usage of fence.i in openjdk
yangfei at iscas.ac.cn
Sat Jul 30 13:35:11 UTC 2022
Hi Palmer,
> -----Original Messages-----
> From: "Palmer Dabbelt" <palmer at dabbelt.com>
> Sent Time: 2022-07-30 02:02:59 (Saturday)
> To: yadonn.wang at huawei.com
> Cc: vladimir.kempik at gmail.com, riscv-port-dev at openjdk.org
> Subject: Re: Re: The usage of fence.i in openjdk
>
> On Fri, 29 Jul 2022 08:12:21 PDT (-0700), yadonn.wang at huawei.com wrote:
> > Hi, Vladimir,
> >
> >> I believe Java’s threads can migrate to different hart at any moment, hence the use of fence.i is dangerous.
> > Could you describe in detail why the use of fence.i is dangerous? I think it may be just inefficient but not dangerous.
>
> The issue here is that fence.i applies to the current hart, whereas
> Linux userspace processes just know the current thread. Executing a
> fence.i in userspace adds a bunch of orderings to the thread's state
> that can only be moved to a new hart via another fence.i. Normally that
> sort of thing isn't such a big deal (there's similar state for things
> like fence and lr), but the SiFive chips implement fence.i by flushing
> the instruction cache which is a slow operation to put on the scheduling
> path.
>
> The only way for the kernel to avoid that fence.i on the scheduling path
> is for it to know if one's been executed in userspace. There's no way
> to trap on fence.i so instead the Linux uABI just requires userspace to
> make a syscall (or a VDSO library call). If userspace directly executes
> a fence.i then the kernel won't know and thus can't ensure the thread
> state is adequately moved to the new hart during scheduling, which may
> result in incorrect behavior.
>
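Just to double-check my understanding of the uABI part: instead of executing fence.i directly, the port would call the kernel interface, roughly like this (only a sketch, assuming the glibc wrapper __riscv_flush_icache() declared in <sys/cachectl.h>; as far as I can tell, flags = 0 asks for the flush to cover all threads of the process):

    #include <sys/cachectl.h>   /* declares __riscv_flush_icache() on riscv glibc */

    /* After writing JIT code into [start, end), ask the kernel to make it
     * fetchable on every hart the process may run on, instead of issuing a
     * bare fence.i on the current hart only. */
    static void sync_icache(void *start, void *end)
    {
        __riscv_flush_icache(start, end, 0);  /* flags = 0: flush for all threads */
    }
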
> We've known for a while that this will cause performance issues for JITs
> on some implemenations, but so far it's just not been a priority. I
> poked around the RISC-V OpenJDK port a few months ago and I think
> there's some improvements that can be made there, but we're probably
> also going to want some kernel support. Exactly how to fix it is
> probably going to depend on the workloads and implementations, though,
> and while I think I understand the OpenJDK part pretty well it's not
> clear what the other fence.i implementations are doing.
>
> In the long run we're also going to need some ISA support for doing this
> sanely, but that's sort of a different problem. I've been kind of
> crossing my fingers and hoping that anyone who has a system where JIT
> performance is important is also going to have some better write/fetch
> ordering instructions, but given how long it's been maybe that's a bad
> plan.
>
> That said, the direct fence.i is incorrect and it's likely that the
> long-term solution involves making the VDSO call so it's probably best
> to swap over. I remember having written a patch to do that at some
> point, but I can't find it so maybe I just forgot to send it?
Thanks for all those considerations about the design. It's very helpful.
> > To a certain extent, this code is a hangover from the aarch64 port and we use fence.i to mimic isb.
> >> Maybe there is a need to add __asm__ volatile ("fence":::"memory") at the beginning of this method.
> > You're right. It'd be better to place a full data fence before the syscall, because we cannot guarantee here that the syscall issues a data fence before the IPI remote fence.i to the other harts.
>
> That's not necessary with the current implementation, but it's not
But it looks like this is not reflected in the kernel function flush_icache_mm?
I checked the code and it looks to me that the data fence is issued on only one path:
    if (mm == current->active_mm && local) {
        /*
         * It's assumed that at least one strongly ordered operation is
         * performed on this hart between setting a hart's cpumask bit
         * and scheduling this MM context on that hart. Sending an SBI
         * remote message will do this, but in the case where no
         * messages are sent we still need to order this hart's writes
         * with flush_icache_deferred().
         */
        smp_mb();
    }
I just want to make sure that the data fence is there in this syscall with the current implementation.
But I am not familiar with the riscv linux kernel code, so it would be appreciated if you could share more details.
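In case it helps to clarify what I mean, the defensive change on the OpenJDK side would look roughly like the following (just a sketch with a placeholder helper name, again assuming the glibc __riscv_flush_icache() wrapper):

    #include <sys/cachectl.h>   /* declares __riscv_flush_icache() on riscv glibc */

    /* Sketch: order the stores to the code buffer before the icache-flush
     * request, in case the syscall path itself does not issue a full data
     * fence on every path. */
    static void flush_code_range(void *start, void *end)
    {
        __asm__ volatile ("fence" : : : "memory");  /* order prior writes */
        __riscv_flush_icache(start, end, 0);        /* then request the remote fence.i */
    }
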
Thanks,
Fei