The usage of fence.i in openjdk
wangyadong (E)
yadonn.wang at huawei.com
Sat Jul 30 02:41:36 UTC 2022
> Let's say you have a thread A running on hart 1.
> You've changed some code in region 0x11223300 and need fence.i before executing that code.
> You execute fence.i in your thread A running on hart 1.
> Right after that, your thread (for some reason) gets rescheduled (by the kernel) to hart 2.
> If hart 2 had something in its L1I corresponding to region 0x11223300, then you have a problem: the L1I on hart 2 still holds the old code, because fence.i was executed on hart 1 (and never on hart 2). And your thread is going to execute old code, or a mix of old and new code.
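Concretely, the unsafe pattern described above looks something like this minimal sketch (not JDK code; unsafe_publish is an invented name, and it assumes "site" points at writable, executable RISC-V memory):

#include <cstddef>
#include <cstring>

void unsafe_publish(unsigned char* site, const unsigned char* code, size_t len) {
  std::memcpy(site, code, len);           // write the new instructions (data side)
  asm volatile("fence.i" ::: "memory");   // synchronizes fetch on *this* hart only
  // If the kernel migrates this thread to another hart right here, that
  // hart's L1I may still hold the old instructions for "site".
  reinterpret_cast<void (*)()>(site)();   // may execute old code, or a mix
}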
@vladimir Thanks for your explanation. I understand your concern now. We know that fence.i's scope is limited to the executing hart, so the write hart in the RISC-V port does not rely solely on fence.i; it also calls the icache_flush syscall in ICache::invalidate_range() every time after modifying code.
For example:
Hart 1:

void MacroAssembler::emit_static_call_stub() {
  // CompiledDirectStaticCall::set_to_interpreted knows the
  // exact layout of this stub.
  ifence();
  mov_metadata(xmethod, (Metadata*)NULL);   // <- patchable code here
  // Jump to the entry point of the i2c stub.
  int32_t offset = 0;
  movptr_with_offset(t0, 0, offset);
  jalr(x0, t0, offset);
}
Hart 2 (the write hart):

void NativeMovConstReg::set_data(intptr_t x) {
  // ...
  // Store x into the instruction stream.
  MacroAssembler::pd_patch_instruction_size(instruction_address(), (address)x);  // <- write code
  ICache::invalidate_range(instruction_address(), movptr_instruction_size);      // <- syscall here
  // ...
}
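For reference, the user-space side of this boils down to the riscv_flush_icache syscall. A minimal sketch using glibc's __riscv_flush_icache wrapper (the port's actual plumbing may differ):

#include <stddef.h>
#include <sys/cachectl.h>   // glibc's RISC-V wrapper for riscv_flush_icache

// Flush the instruction range [start, start + nbytes) for all threads of
// this process. flags == 0 requests a process-wide (not hart-local) flush,
// which is what lets the kernel mark the other harts' I$ as stale below.
static void flush_code_range(char* start, size_t nbytes) {
  __riscv_flush_icache(start, start + nbytes, 0);
}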
The kernel side of that syscall looks like this:
void flush_icache_mm(struct mm_struct *mm, bool local)
{
	unsigned int cpu;
	cpumask_t others, *mask;

	preempt_disable();

	/* Mark every hart's icache as needing a flush for this MM. */
	mask = &mm->context.icache_stale_mask;
	cpumask_setall(mask);

	/* Flush this hart's I$ now, and mark it as flushed. */
	cpu = smp_processor_id();
	cpumask_clear_cpu(cpu, mask);
	local_flush_icache_all();

	/*
	 * Flush the I$ of other harts concurrently executing, and mark them as
	 * flushed.
	 */
	cpumask_andnot(&others, mm_cpumask(mm), cpumask_of(cpu));
	local |= cpumask_empty(&others);
	if (mm == current->active_mm && local) {
		/*
		 * It's assumed that at least one strongly ordered operation is
		 * performed on this hart between setting a hart's cpumask bit
		 * and scheduling this MM context on that hart. Sending an SBI
		 * remote message will do this, but in the case where no
		 * messages are sent we still need to order this hart's writes
		 * with flush_icache_deferred().
		 */
		smp_mb();
	} else if (IS_ENABLED(CONFIG_RISCV_SBI)) {
		sbi_remote_fence_i(&others);
	} else {
		on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
	}

	preempt_enable();
}
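The counterpart that protects a migrated thread is flush_icache_deferred(), which the barrier comment above refers to. Paraphrased from arch/riscv/mm/context.c (details vary across kernel versions), it runs when an mm is switched in on a hart whose stale bit is still set:

static inline void flush_icache_deferred(struct mm_struct *mm)
{
#ifdef CONFIG_SMP
	unsigned int cpu = smp_processor_id();
	cpumask_t *mask = &mm->context.icache_stale_mask;

	if (cpumask_test_cpu(cpu, mask)) {
		cpumask_clear_cpu(cpu, mask);
		/*
		 * Ensure the remote hart's writes are visible to this hart.
		 * This pairs with a barrier in flush_icache_mm.
		 */
		smp_mb();
		local_flush_icache_all();
	}
#endif
}

So even if a thread migrates right after the writer's flush, the destination hart flushes its own I$ before it starts fetching that mm's code.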
> Maybe it would be good to get rid of them.
So maybe we can remove the fence.i used in user space, but that would be for performance reasons, not because fence.i is unsafe.
Please correct me if I'm wrong.
Yadong
-----Original Message-----
From: Palmer Dabbelt [mailto:palmer at dabbelt.com]
Sent: Saturday, July 30, 2022 2:08 AM
To: vladimir.kempik at gmail.com
Cc: wangyadong (E) <yadonn.wang at huawei.com>; riscv-port-dev at openjdk.org
Subject: Re: The usage of fence.i in openjdk
On Fri, 29 Jul 2022 10:41:18 PDT (-0700), vladimir.kempik at gmail.com wrote:
>
>
>> On 29 July 2022, at 18:12, wangyadong (E) <yadonn.wang at huawei.com> wrote:
>>
>> Hi, Vladimir,
>>
>>> I believe Java's threads can migrate to a different hart at any moment, hence the use of fence.i is dangerous.
>> Could you describe in detail why the use of fence.i is dangerous? I think it may be merely inefficient, but not dangerous.
>> To a certain extent, this code is a hangover from the aarch64 port, and we use fence.i to mimic isb.
>>
> Basically, fence.i executed on a hart only affects that hart. And your
> thread can be rescheduled to another hart soon after the fence.i.
>
> Let's say you have a thread A running on hart 1.
> You've changed some code in region 0x11223300 and need fence.i before executing that code.
> You execute fence.i in your thread A running on hart 1.
> Right after that, your thread (for some reason) gets rescheduled (by the kernel) to hart 2.
> If hart 2 had something in its L1I corresponding to region 0x11223300, then you have a problem: the L1I on hart 2 still holds the old code, because fence.i was executed on hart 1 (and never on hart 2). And your thread is going to execute old code, or a mix of old and new code.
Sorry for forking the thread; I saw this come in right after I sent my message. This is correct; the performance-related reasons we're not doing a fence.i when scheduling are described in that message:
<https://mail.openjdk.org/pipermail/riscv-port-dev/2022-July/000570.html>
>
> Regards, Vladimir.
>
>>> Maybe there is a need to add __asm__ volatile ("fence":::"memory") at the beginning of this method.
>> You're right. It'd be better to place a full data fence before the syscall, because we cannot guarantee that the syscall issues a data fence before the remote fence.i IPI to the other harts.
>>
>> Yadong
>>
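To make that last point concrete, the suggested fix is a full data fence ahead of the flush syscall, e.g. (a hedged sketch, reusing the hypothetical flush_code_range wrapper from above):

static void flush_code_range(char* start, size_t nbytes) {
  // Make the just-written instruction bytes globally visible before any
  // remote hart executes fence.i on our behalf.
  __asm__ volatile ("fence" ::: "memory");
  __riscv_flush_icache(start, start + nbytes, 0);
}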