RFR: 8337753: Target class of upcall stub may be unloaded [v6]
Amit Kumar
amitkumar at openjdk.org
Fri Sep 20 04:09:40 UTC 2024
On Thu, 19 Sep 2024 12:20:13 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:
>> As discussed in the JBS issue:
>>
>> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class<?> (vmholder)`
>>
>> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash.
>>
>> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance.
>>
>> <details>
>> <summary>Performance numbers</summary>
>> x64:
>>
>> before:
>>
>> Benchmark Mode Cnt Score Error Units
>> Upcalls.panama_blank avgt 30 69.216 ± 1.791 ns/op
>>
>>
>> after:
>>
>> Benchmark Mode Cnt Score Error Units
>> Upcalls.panama_blank avgt 30 67.787 ± 0.684 ns/op
>>
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark Mode Cnt Score Error Units
>> Upcalls.panama_blank avgt 30 61.574 ± 0.801 ns/op
>>
>>
>> after:
>>
>> Benchmark Mode Cnt Score Error Units
>> Upcalls.panama_blank avgt 30 61.218 ± 0.554 ns/op
>>
>> </details>
>>
>> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4.
>>
>> Testing: tier 1-4
>
> Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits:
>
> - use resolve_global_jobject on s390
> - Merge branch 'master' into LoadVMTraget
> - remove PC save/restore on s390
> - use fatal()
> - add RISC-V as target platform
> - Adjust ppc & RISC-V code
> - Add s390 changes
> - Merge branch 'master' into LoadVMTraget
> - Don't save/restore LR/CR + resolve_jobject on s390
> - eyeball other platforms
> - ... and 14 more: https://git.openjdk.org/jdk/compare/2faf8b8d...b703b162
I ran tier1-tests on release & fastdebug VMs for s390x. Didn't see any new failure appearing. s390x part seems good to me.
Marked as reviewed by amitkumar (Committer).
-------------
PR Review: https://git.openjdk.org/jdk/pull/20479#pullrequestreview-2317183506
PR Review: https://git.openjdk.org/jdk/pull/20479#pullrequestreview-2317183803
More information about the hotspot-compiler-dev
mailing list