RFR: 8337753: Target class of upcall stub may be unloaded

Jorn Vernee jvernee at openjdk.org
Wed Sep 4 12:01:24 UTC 2024


On Wed, 4 Sep 2024 06:14:57 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> As discussed in the JBS issue:
>> 
>> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class<?> (vmholder)`
>> 
>> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash.
>> 
>> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance.
>> 
>> <details>
>> <summary>Performance numbers</summary>
>> x64:
>> 
>> before:
>> 
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>> 
>> 
>> after:
>> 
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>> 
>> 
>> aarch64:
>> 
>> before:
>> 
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>> 
>> 
>> after:
>> 
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>> 
>> </details>
>> 
>> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4.
>> 
>> Testing: tier 1-4
>
> src/hotspot/cpu/riscv/upcallLinker_riscv.cpp line 264:
> 
>> 262: 
>> 263:   __ block_comment("{ load target ");
>> 264:   __ movptr(j_rarg0, (intptr_t) receiver);
> 
> Hi @JornVernee , Could you please apply following small add-on change for linux-riscv64? As I witnessed build warning with GCC-13. Otherwise, builds fine and the newly-added test/jdk/java/foreign/TestUpcallStress.java is passing. (PS: jdk_foreign tests are passing too)
> 
> 
> diff --git a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp
> index 5c45a679316..55160be99d0 100644
> --- a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp
> +++ b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp
> @@ -261,7 +261,7 @@ address UpcallLinker::make_upcall_stub(jobject receiver, Symbol* signature,
>    __ block_comment("} argument shuffle");
> 
>    __ block_comment("{ load target ");
> -  __ movptr(j_rarg0, (intptr_t) receiver);
> +  __ movptr(j_rarg0, (address) receiver);
>    __ far_call(RuntimeAddress(StubRoutines::upcall_stub_load_target())); // loads Method* into xmethod
>    __ block_comment("} load target ");
> 
> diff --git a/test/jdk/java/foreign/TestUpcallStress.java b/test/jdk/java/foreign/TestUpcallStress.java
> index 3b9b1d4b207..40607746856 100644
> --- a/test/jdk/java/foreign/TestUpcallStress.java
> +++ b/test/jdk/java/foreign/TestUpcallStress.java
> @@ -24,7 +24,7 @@
>  /*
>   * @test
>   * @requires jdk.foreign.linker != "FALLBACK"
> - * @requires os.arch == "aarch64" & os.name == "Linux"
> + * @requires (os.arch == "aarch64" | os.arch=="riscv64") & os.name == "Linux"
>   * @requires os.maxMemory > 4G
>   * @modules java.base/jdk.internal.foreign
>   * @build NativeTestHelper CallGeneratorHelper TestUpcallBase

Were you able to reproduce the original issue on RISC-V?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743643186


More information about the core-libs-dev mailing list