RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v4]

Thomas Stuefe stuefe at openjdk.java.net
Tue Oct 5 18:45:13 UTC 2021


On Mon, 4 Oct 2021 06:39:31 GMT, Xin Liu <xliu at openjdk.org> wrote:

>> This patch allows the custom commands of OnError to attach to HotSpot itself. 
>> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). 
>> This prevents cmds which require safepoint synchronization from deadlock.
>> eg. OnError='jcmd %p Thread.print'.
>> 
>> Without this patch, we will encounter a deadlock at safepoint synchronization. 
>> `"main" #1`  is the very thread which executes `os::fork_and_exec(cmd)`.  
>> 
>> 
>> Aborting due to java.lang.OutOfMemoryError: Java heap space
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (debug.cpp:364), pid=94632, tid=94633
>> #  fatal error: OutOfMemory encountered: Java heap space
>> #
>> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
>> #
>> # -XX:OnError="jcmd %p Thread.print"
>> #   Executing /bin/sh -c "jcmd 94632 Thread.print" ...
>> 94632:
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
>> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable  [0x00007f01b7a08000]
>> [10.616s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Cleanup: remove a statment of debugging.

Hi Xin,

nice work. Good test.

Some nitpicking inline.

About the test, I still would prefer very much to limit this workaround to OOM errors only in VMError, but David had a different opinion, so I won't press the point. 

However, if we do this for normal crashes+OnError too, I'd like to see a test for that (you can simply reuse your test, add a second `@run` section and just start the VM with `-XX:ErrorHandlerTest=14 -XX:OnError=... -version`). Of course, that one would require `vm.debug`.

Cheers, Thoams

src/hotspot/share/runtime/mutexLocker.cpp line 376:

> 374: }
> 375: 
> 376: void unlock_locks_on_error(JavaThread* thread) {

Does this have to live in mutexLocker.cpp? Seems to be highly specific to error handling and would make more sense as a function local static in vmError.cpp.

src/hotspot/share/utilities/vmError.cpp line 1320:

> 1318: // Otherwise, it may end up with a deadlock when dcmds try to synchronize all Java threads
> 1319: // at safepoints.
> 1320: bool transition_into_native() {

Since this is a highly specific workaround for one specific problem (forking off jcmd which will attach back to us with a sub command which needs a safe point) I would suggest naming this function somewhat more specific, e.g. "prepare_jcmd_workaround" or "prepare_fork_and_exec" (though the latter is misleading since this is needed only for the jcmd-attaches-to-me case).

For these cases, I also prefer the JBS issue number mentioned in a comment. Its helpful when one looks at the code, and one does not have to dig through the change history.

test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 25:

> 23: 
> 24: /*
> 25:  * @test TestOutOfMemoryErrorFromNIO

I would rename the test. The point is not to test OOM from NIO, but to test XX:OnError with jcmd which attaches to the parent. 

You could do the same with any OOM, right? E.g. just start the VM with a very small MaxMetaspaceSize. You would not even need a main function for that, since the VM would not come up but throw a OOM right away,

Proposal (but feel free to come up with something better): "TestOnErrorWithSelfAttachingJCmd"

test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 59:

> 57:         StringBuilder after = new StringBuilder(jcmd);
> 58:         after.append(" %p");
> 59:         after.append(" GC.heap_dump a.hprof");

Could be probably simplified to:

String jcmd = JDKToolFinder.getJDKTool("jcmd");
String before = jcmd + " %p Thread.print";
String after = jcmd + " %p GC.heap_dump a.hprof;

test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 69:

> 67:            "-XX:+UnlockDiagnosticVMOptions",
> 68:            "-XX:AbortVMOnException=java.lang.OutOfMemoryError",
> 69:            "-XX:OnError=" + cmds,

Please specify a (small) heap size.

test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 84:

> 82:            # -XX:OnError="echo Test1 Succeeded"
> 83:            #   Executing /bin/sh -c "echo Test1 Succeeded" ...
> 84:         */

Can you make this a line comment (`//`)?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5590


More information about the hotspot-dev mailing list