RFR: 8310656: RISC-V: __builtin___clear_cache can fail silently.
Ludovic Henry
luhenry at openjdk.org
Wed Jun 28 08:09:04 UTC 2023
On Wed, 28 Jun 2023 07:13:11 GMT, Fei Yang <fyang at openjdk.org> wrote:
>> Hi, please consider.
>>
>> We recently had a bug where a user was missing the permissions to use this syscall.
>> According to hs_err, this caused crashes with SIGILL on instructions like "addi x11, x24, 0".
>> If it fails, it is even possible to execute a valid but 'old' instruction, which may not lead to a crash; instead the program misbehaves.
>>
>> To avoid this mess I suggest that we first test the syscall during vm init and we use it directly.
>> This way we can make sure it never fails.
>>
>> Tested failing syscall with qemu, tested t1 in qemu, t1 on jh7110 in-progress.
>
> Hi, I don't quite understand how this issue triggers. Could it happen on real hardware platforms like Unmatched or others?
> Since `__builtin___clear_cache` is a GCC built-in function which doesn't return an error code, my previous understanding is that it will never fail to do its job. And the GCC manual doesn't even mention the cases where this could fail. Also I see uses of this built-in function on other platforms like AArch64.
>
>
> void __builtin___clear_cache (void *begin, void *end) [Built-in Function]
> This function is used to flush the processor’s instruction cache for the region of memory
> between begin inclusive and end exclusive. Some targets require that the instruction
> cache be flushed, after modifying memory containing code, in order to obtain
> deterministic behavior.
> If the target does not require instruction cache flushes, __builtin___clear_cache
> has no effect. Otherwise either instructions are emitted in-line to clear the instruction
> cache or a call to the __clear_cache function in libgcc is made.
@RealFYang the issue arises when you run Java _inside_ Docker. In that case, Docker will block certain syscalls (like `riscv_flush_icache` before Docker v23) unless you disable this security feature. `__builtin___clear_cache` will then silently fail (the syscall will fail with EPERM), and no icache flush will actually happen.
That doesn't happen on QEMU because the `riscv_flush_icache` syscall never actually gets called (it's "interpreted" by qemu inside the docker container), but it reproduces very reliably on HiFive Unmatched. To reproduce, simply run `java -version` on an Unmatched board inside a Docker container on stock Ubuntu or Debian (these still ship Docker v20).
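
To illustrate the idea of calling the syscall directly and checking its result, here is a minimal sketch. It is not the code in the PR: the helper names `flush_icache_checked` and `probe_icache_flush` are hypothetical, and the non-RISC-V fallback exists only so the snippet compiles elsewhere.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper, not HotSpot code: flush the icache for [begin, end)
 * and report failure instead of ignoring it. Returns 0 on success, or the
 * errno value (e.g. EPERM when a seccomp filter blocks the syscall). */
static int flush_icache_checked(void *begin, void *end) {
#if defined(__riscv) && defined(__NR_riscv_flush_icache)
    /* flags == 0 asks for the flush to apply to all threads of the process. */
    long rc = syscall(__NR_riscv_flush_icache, begin, end, 0UL);
    return rc == 0 ? 0 : errno;
#else
    /* Other targets: the builtin has no error path that could be checked. */
    __builtin___clear_cache((char *)begin, (char *)end);
    return 0;
#endif
}

/* Hypothetical startup probe: run once during initialization so a blocked
 * syscall is reported up front rather than surfacing later as SIGILL. */
static int probe_icache_flush(void) {
    static char probe[64];
    int err = flush_icache_checked(probe, probe + sizeof probe);
    if (err != 0) {
        fprintf(stderr, "icache flush unavailable: %s\n", strerror(err));
    }
    return err;
}
```

With something like this, a VM could refuse to start (or print a clear diagnostic) when `probe_icache_flush()` returns non-zero, which is the behavior the Docker v20 case above would trigger.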
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14670#issuecomment-1610958430