RFR (S) 8181143: Introduce diagnostic flag to abort VM on too long VM operations

Robbin Ehn robbin.ehn at oracle.com
Mon Nov 19 08:35:10 UTC 2018


Hi Aleksey,

Your patch does not seem to be against jdk/jdk (jdk12).

Without the actual core file, I don't think the hs_err file is very useful
if it only contains the WatcherThread stack trace. It would be better if the
'killer' signaled the VM thread and we did the error reporting from the signal
handler, in VM thread context.

If you are only looking at VM ops, it seems useless to wake up when there is
no VM op.
You should start the timer when we start the safepoint and stop the timer when
it ends. I had a patch which did:
- 'killer thread' waits on a semaphore.
- VM thread posts to 'killer thread' in ::begin.
- 'killer thread' does a timed wait on the semaphore.
- If VM thread manages to post on that semaphore in ::end before the timeout:
	- 'killer thread' does a normal wait again.
- Else if timeout occurs:
	- 'killer thread' starts error reporting.
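
The handshake above could look roughly like the following standalone C++
sketch. It is not the HotSpot patch: Semaphore, OperationWatchdog, begin(),
end() and timed_out() are names made up for illustration, the semaphore is
built from C++11 primitives rather than any VM class, and a timeout only sets
a flag where the real VM would start error reporting.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical minimal counting semaphore (not a HotSpot class).
class Semaphore {
 public:
  void post() {
    { std::lock_guard<std::mutex> g(mu_); ++count_; }
    cv_.notify_one();
  }
  // Blocks until a post is available, then consumes it.
  void wait() {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return count_ > 0; });
    --count_;
  }
  // Returns false if the timeout elapsed without a post.
  bool timed_wait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(mu_);
    if (!cv_.wait_for(l, timeout, [this] { return count_ > 0; })) return false;
    --count_;
    return true;
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int count_ = 0;
};

// Hypothetical 'killer thread': idle until begin(), armed until end().
class OperationWatchdog {
 public:
  explicit OperationWatchdog(std::chrono::milliseconds timeout)
      : timeout_(timeout), thread_([this] { run(); }) {}
  ~OperationWatchdog() {
    stop_ = true;
    sem_.post();  // wake the killer thread whichever wait it is in
    thread_.join();
  }
  void begin() { sem_.post(); }  // operation starts: arm the timer
  void end() { sem_.post(); }    // operation ends: disarm the timer
  bool timed_out() const { return timed_out_.load(); }

 private:
  void run() {
    for (;;) {
      sem_.wait();  // normal wait: nothing to time, no periodic wake-ups
      if (stop_) return;
      if (!sem_.timed_wait(timeout_)) {
        timed_out_ = true;  // a real VM would start error reporting here
        return;
      }
      if (stop_) return;  // the post came from the destructor, not end()
    }
  }

  std::chrono::milliseconds timeout_;
  Semaphore sem_;
  std::atomic<bool> stop_{false};
  std::atomic<bool> timed_out_{false};
  std::thread thread_;  // declared last so the other members exist first
};
```

The point of the design is that the killer thread sleeps indefinitely while
no operation is running, so there is no polling interval to tune. In this
sketch the watchdog simply stops after the first timeout.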

You get the gist of my thinking.

/Robbin

On 11/16/18 5:30 PM, Aleksey Shipilev wrote:
> RFE:
>    https://bugs.openjdk.java.net/browse/JDK-8181143
> 
> Webrev:
>    http://cr.openjdk.java.net/~shade/8181143/webrev.03/
> 
> SafepointTimeout is nice to discover long/stuck safepoint syncs. But it is as important to discover
> long/stuck VM operations. This patch complements the timeout machinery with tracking of VM operations
> themselves. Among other things, this allows terminating the VM when a very long VM operation is
> blocking progress. High-availability users would enjoy a fail-fast JVM -- in fact, the original
> prototype was done at the request of Apache Ignite developers.
> 
> Example with -XX:+VMOperationTimeout -XX:VMOperationTimeoutDelay=100 -XX:+AbortVMOnVMOperationTimeout:
> 
> [3.117s][info][gc,start] GC(2) Pause Young (Normal) (G1 Evacuation Pause)
> [3.224s][warning][vmthread] VM Operation G1CollectForAllocation took longer than 100 ms
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (/home/sh/jdk-jdk/src/hotspot/share/runtime/vmThread.cpp:218), pid=2536, tid=2554
> #  fatal error: VM Operation G1CollectForAllocation took longer than 100 ms
> #
> 
> Testing: hotspot/tier1, ad-hoc tests, jdk-submit (pending)
> 
> Thanks,
> -Aleksey
> 

