RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v8]
Thomas Stuefe
stuefe at openjdk.java.net
Thu Oct 14 04:53:56 UTC 2021
On Thu, 14 Oct 2021 04:22:33 GMT, David Holmes <dholmes at openjdk.org> wrote:
>> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Change to VM unconditionally as long as current thread is JavaThread.
>>
>> Hoist VMErrorForceNative out of While.
>
> Hi Xin,
>
> I must apologise for misleading you with the use of _thread_in_native, I had forgotten that safepoint-safety also relied on the stack state.
>
> David
> hi, @dholmes-ora , @tstuefe ,
>
> Thank you for helping me in this PR. Let me drop this one.
>
> This is 10x harder than I initially thought. `VMError::report_and_die()` could be called from anywhere, at any time. I thought _thread must be in VM state if it is a Java thread. I was wrong.
>
> Like David pointed out, this patch can only solve "the second possibilities". Without it, we can claim that "Deadlock when jcmd of OnError attaches to itself" is a feature instead of a bug. But we shouldn't leave undefined behavior to HotSpot for "the 1st set of possibilities". Further, transitioning current Java Thread to Native can't guarantee it's safepoint safe either. It's still up to the last frame.
>
> ```
> static bool safepoint_safe_with(JavaThread *thread, JavaThreadState state) {
>   switch(state) {
>   case _thread_in_native:
>     // native threads are safe if they have no java stack or have walkable stack
>     return !thread->has_last_Java_frame() || thread->frame_anchor()->walkable();
>   ...
> ```
>
> I don't like nondeterministic behavior, which leaves more questions than answers. We can revisit it later if we have a better solution.
Hi Xin,
sorry about the frustration. The complexity of this was not clear to me either.
Thinking about this, self-attaching jcmd for on-error analysis may not be the best solution to your problem. If the VM is crashing, all bets are off as to what happens, and we may hang and/or spoil the core. For hangs, we have the crash timeout watcher, which will eventually kick the VM, but that is not perfect either (it takes too long, and you lose the core). And if the VM is in a Java OOM, it is in a deterministic state but may have a high memory footprint, which may prevent it from forking successfully: on some platforms, `os::fork_and_exec()` uses fork(), not vfork() or posix_spawn(). See https://github.com/openjdk/jdk/pull/5698 for a discussion (stalled, waiting on input).
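To illustrate the posix_spawn() route mentioned above, here is a minimal standalone sketch. It is not HotSpot code and not what `os::fork_and_exec()` does today; `run_command` is a hypothetical helper name, and running the command through `/bin/sh -c` is an assumption made for the example.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cerrno>

extern char** environ;

// Hypothetical helper (not part of HotSpot): runs `cmd` through /bin/sh
// using posix_spawn() and returns the child's exit code.
// Returns -1 if the spawn fails, waiting fails, or the child did not exit normally.
int run_command(const char* cmd) {
  char* const argv[] = {
    const_cast<char*>("sh"),
    const_cast<char*>("-c"),
    const_cast<char*>(cmd),
    nullptr
  };

  pid_t pid;
  // posix_spawn() returns an error number directly instead of setting errno.
  int rc = posix_spawn(&pid, "/bin/sh", nullptr, nullptr, argv, environ);
  if (rc != 0) {
    return -1;
  }

  int status = 0;
  // Retry waitpid() if it is interrupted by a signal.
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) {
      return -1;
    }
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Whether this actually avoids the memory footprint problem depends on how the platform's libc implements posix_spawn(), so treat it as a sketch of the API shape only.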
An alternative I would prefer is a way to execute jcmds from within the VM itself on OOM, without spawning a child process and without involving the attach framework. We would still have to dive into native, but since we would restrict this to deterministic OOM scenarios, this should work. For example, an option like `-XX:CommandOnOOM=...` could complement the existing `OnOOM` switches. Those have the problem that, for backward compatibility reasons, they don't fire for all cases of OOM, but a new switch is free to handle this differently.
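As a rough illustration of the idea, here is a toy model of an in-process command dispatcher. Nothing in it exists in HotSpot: `InProcessCommands`, `add`, `run_all` and the ';'-separated command string are all assumptions made up for this sketch. The point is only that the commands run in the calling thread and write to a stream, with no child process and no attach step.

```cpp
#include <functional>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Toy model only: none of these names exist in HotSpot.
class InProcessCommands {
 public:
  using Handler = std::function<void(std::ostream&)>;

  // Registers a command under `name`, replacing any earlier handler.
  void add(const std::string& name, Handler h) {
    _handlers[name] = std::move(h);
  }

  // Runs each ';'-separated command of `spec` in the calling thread and
  // returns the collected output. Unknown names are reported, not fatal.
  std::string run_all(const std::string& spec) const {
    std::ostringstream out;
    std::istringstream in(spec);
    std::string name;
    while (std::getline(in, name, ';')) {
      if (name.empty()) {
        continue;
      }
      auto it = _handlers.find(name);
      if (it == _handlers.end()) {
        out << "unknown command: " << name << "\n";
      } else {
        it->second(out);
      }
    }
    return out.str();
  }

 private:
  std::map<std::string, Handler> _handlers;
};
```

In a real VM the handlers would be the existing diagnostic commands, and the hard part (which this sketch ignores) is making sure the calling thread is in a state where running them is safe.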
Cheers, Thomas
-------------
PR: https://git.openjdk.java.net/jdk/pull/5590