RFR: 8337753: Target class of upcall stub may be unloaded

Tue Sep 3 13:49:57 UTC 2024

On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> As discussed in the JBS issue:
> 
> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class<?> (vmholder)`
> 
> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash.
> 
> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance.
> 
> <details>
> <summary>Performance numbers</summary>
> x64:
> 
> before:
> 
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
> 
> 
> after:
> 
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
> 
> 
> aarch64:
> 
> before:
> 
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
> 
> 
> after:
> 
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
> 
> </details>
> 
> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4.
> 
> Testing: tier 1-4

Can't we do these nasty loads in C++ code and use `set_vm_result_2` in `UpcallLinker::on_entry`?
The GC barriers can generate excessive amounts of code with some GCs. I guess that upcalls are less performance critical, so I'd prefer the other solution. Maybe the C++ code can get optimized better, too.

Some of the `DecoratorSet` should be applicable and improve performance. If that doesn't help enough, maybe we should implement a dedicated static stub? There's no need to have the code replicated in each upcall stub.

Also note that each `load_heap_oop` may save and restore registers which is actually only needed once.

Regarding PPC64, I think that we could avoid PRESERVATION_FRAME_LR_GP_FP_REGS if we rearrange it such that the `load_heap_oop` is done at a place where the volatile regs are not live. But seems like this optimization is not available for other platforms.

Some performance related remarks:

- You could use `resolve_global_jobject` which is shorter and faster and exists on most platforms.
- Using `vm_result_2` is no longer needed. The Method* can be directly passed in the method reg (or loaded from `callee_target`).
- If you call the stub from assembler instead of from C++ you should be able to save some extra stuff like the frame.

I'll check the PPC64 code later.

@offamitkumar: Can you take a look at the s390 code, please? The cross build has failed. For the future: You may want to implement `resolve_global_jobject` which is shorter and faster and available on the other platforms.

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 4778:

> 4776:     StubCodeMark mark(this, "StubRoutines", "upcall_stub_load_target");
> 4777:     address start = __ pc();
> 4778:     __ save_LR_CR(R0);

I think `save_LR_CR` and `restore_LR_CR` should get removed, too. CR is not live and LR is preserved everywhere below. But, I'll check this later.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2275524582
PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2275707529
PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2278240985
PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2325103457
PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1713844568