RFR: 8273608: Deadlock when jcmd of OnError attaches to itself

Xin Liu xliu at openjdk.java.net
Wed Sep 22 06:59:00 UTC 2021


On Tue, 21 Sep 2021 05:32:37 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> This patch allows the custom commands of OnError to attach to HotSpot itself.
>> It sets the thread running report_and_die() to Native before os::fork_and_exec(cmd).
>> This prevents commands that require safepoint synchronization from deadlocking,
>> e.g. OnError='jcmd %p Thread.print'.
>> 
>> Without this patch, we will encounter a deadlock at safepoint synchronization. 
>> `"main" #1`  is the very thread which executes `os::fork_and_exec(cmd)`.  
>> 
>> 
>> Aborting due to java.lang.OutOfMemoryError: Java heap space
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (debug.cpp:364), pid=94632, tid=94633
>> #  fatal error: OutOfMemory encountered: Java heap space
>> #
>> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
>> #
>> # -XX:OnError="jcmd %p Thread.print"
>> #   Executing /bin/sh -c "jcmd 94632 Thread.print" ...
>> 94632:
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
>> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable  [0x00007f01b7a08000]
>> [10.616s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>
> src/hotspot/share/utilities/vmError.cpp line 1334:
> 
>> 1332: 
>> 1333:     if (_thread != nullptr) {
>> 1334:       assert(!_thread->owns_locks(), "must release all locks when leaving VM");
> 
> This can't be an assertion as the thread is not knowingly leaving the VM without first releasing locks. If it does hold locks then that could lead to additional problems and strange errors when running the external command. We have to decide whether it is safest/best to simply not transition to native if holding locks, or whether it is okay to proceed knowing that there are risks of secondary crashes, or hangs, if we do.

In a debug build, a JavaThread can't transition to Native while it owns any lock. Even if I remove the assert here, it will hit another assert later in `ThreadStateTransition::transition_from_vm`.


// Checks safepoint allowed and clears unhandled oops at potential safepoints.
void JavaThread::check_possible_safepoint() {
  if (_no_safepoint_count > 0) {
    print_owned_locks();
    assert(false, "Possible safepoint reached by thread that does not allow it");
  }
  // ... (rest of the function omitted)
}


I'd like to make `VMErrorThreadToNativeFromVM` change state only if `_thread` doesn't own any mutex, but `Thread::owns_locks()` is only available in debug builds.
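
To make the idea concrete, here is a minimal standalone sketch of such a conditional transition. It is a toy model, not the patch and not HotSpot code: `ToyThread`, its `owned_locks` counter, `ToyThreadToNativeFromVM`, and `run_on_error_command` are hypothetical names, and the sketch assumes a lock count that is tracked in every build type, which is exactly what product builds lack today.

```cpp
#include <cassert>

// Toy model only: the names echo HotSpot's, but nothing here is HotSpot code.
enum class ThreadState { InVM, InNative };

struct ToyThread {
  ThreadState state = ThreadState::InVM;
  int owned_locks = 0;  // hypothetical counter, assumed available in all builds
  bool owns_locks() const { return owned_locks > 0; }
};

// RAII guard: moves the thread to InNative for the guard's lifetime,
// but only when the thread exists and holds no locks.
class ToyThreadToNativeFromVM {
  ToyThread* _thread;
  bool _transitioned;

 public:
  explicit ToyThreadToNativeFromVM(ToyThread* t)
      : _thread(t), _transitioned(false) {
    if (_thread != nullptr && !_thread->owns_locks()) {
      _thread->state = ThreadState::InNative;
      _transitioned = true;
    }
  }

  ~ToyThreadToNativeFromVM() {
    // Restore the previous state only if the constructor changed it.
    if (_transitioned) {
      _thread->state = ThreadState::InVM;
    }
  }

  bool transitioned() const { return _transitioned; }
};

// Returns the state the thread was in while the stand-in command ran.
ThreadState run_on_error_command(ToyThread& t) {
  ToyThreadToNativeFromVM guard(&t);
  // In the real VM, os::fork_and_exec(cmd) would run at this point.
  return t.state;
}
```

A thread without locks runs the command in the Native state and returns to the VM state afterwards. A thread that holds a lock stays in the VM state, so the assert is avoided, but a command that needs a safepoint can still hang in that case.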

Only one test, `runtime/ErrorHandling/TestOnError.java`, calls `VMError::report_and_die` while holding `Threads_lock`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5590


More information about the hotspot-dev mailing list