RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2]

David Holmes dholmes at openjdk.java.net
Wed Sep 29 01:25:51 UTC 2021


On Mon, 27 Sep 2021 23:27:39 GMT, Xin Liu <xliu at openjdk.org> wrote:

>> This patch allows the custom commands of OnError to attach to HotSpot itself. 
>> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). 
>> This prevents cmds which require safepoint synchronization from deadlock.
>> eg. OnError='jcmd %p Thread.print'.
>> 
>> Without this patch, we will encounter a deadlock at safepoint synchronization. 
>> `"main" #1`  is the very thread which executes `os::fork_and_exec(cmd)`.  
>> 
>> 
>> Aborting due to java.lang.OutOfMemoryError: Java heap space
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (debug.cpp:364), pid=94632, tid=94633
>> #  fatal error: OutOfMemory encountered: Java heap space
>> #
>> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
>> #
>> # -XX:OnError="jcmd %p Thread.print"
>> #   Executing /bin/sh -c "jcmd 94632 Thread.print" ...
>> 94632:
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
>> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable  [0x00007f01b7a08000]
>> [10.616s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>
> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Add a new testcase for OutOfMemoryError thrown from NIO.
>  - Make state changer one way in VMError.
>    
>    Add a test to show that jcmd %p won't get stuck.
>  - Merge branch 'master' into JDK-8273608
>  - 8273608: Deadlock when jcmd of OnError attaches to itself
>    
>    Allow custom command of OnError to attach to HotSpot itself. This patch sets
>    the thread of report_and_die() to Native before os::fork_and_exec(cmd). This
>    prevents cmds which require safepoint synchronization from deadlock.
>    eg. OnError='jcmd %p Thread.print'.

Hi Xin,

I still have a few concerns about the details here. See below.

Thanks,
David

src/hotspot/share/runtime/mutexLocker.cpp line 379:

> 377:   assert(thread != NULL, "can't be owned by NULL");
> 378:   if (thread->is_Watcher_thread()) {
> 379:     // need WatcherThread as a safeguard against potential deadlocks

You only call this for JavaThreads so you can't see the WatcherThread.

src/hotspot/share/runtime/mutexLocker.cpp line 390:

> 388:     owned_lock = next;
> 389:   }
> 390: #endif // ASSERT

Should we also clear `_owned_locks` after this so that it is still correct?

src/hotspot/share/runtime/mutexLocker.cpp line 396:

> 394:        _mutex_array[i]->unlock();
> 395:      }
> 396:   }

Surely we don't need this in a debug build as we already unlocked all owned locks?

src/hotspot/share/runtime/mutexLocker.hpp line 175:

> 173: // by fatal error handler.
> 174: void print_owned_locks_on_error(outputStream* st);
> 175: void unlock_locks_owned_by(Thread* t);

// Unlock all Mutex/Monitors currently owned by a JavaThread when executing
// `OnError` actions.
void unlock_locks_on_error(JavaThread t);

test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 32:

> 30:  * @library /test/lib
> 31:  * @run main/othervm TestOutOfMemoryErrorFromNIO
> 32:  * @bug 8155004 8273608

Please move this to after the @test line

test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 65:

> 63:                     + after.toString();
> 64: 
> 65:         // else this is the main test

else ???

test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 89:

> 87:         output_single.stdoutShouldMatch("^" + msg); // match start of line only
> 88: 
> 89:         System.out.println("PASSED");

Should we also check for output related to execution of the OnError command?

-------------

Changes requested by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5590


More information about the hotspot-dev mailing list