[PATCH] Deadlock in CGLGraphicsConfig.getCGLConfigInfo
Karl von Randow
karl at xk72.com
Tue Jan 10 19:44:48 UTC 2017
I have encountered a deadlock in Java 1.8.0_112 when changing between discrete and integrated GPU on a retina MacBook Pro. The deadlock is between:
CGLGraphicsConfig.getCGLConfigInfo, running on the AWT-EventQueue thread, trying to call [GraphicsConfigUtil _getCGLConfigInfo:] on
the main thread (AppKit thread) while it holds the AWT lock and is synchronized on CGraphicsEnvironment.
and
A) the AppKit main thread trying to call CGraphicsEnvironment._displayReconfiguration (via displaycb_handle in CGraphicsEnv.m)
and blocking while it synchronizes on CGraphicsEnvironment, which is a deadlock.
or
B) the AppKit main thread trying to render, and trying to acquire the OGLRenderQueue lock (which is the AWT lock)
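The pattern common to both scenarios can be modelled in plain Java. The sketch below is only an illustration under stated assumptions, not JDK code: a single-thread executor stands in for the AppKit main thread, a ReentrantLock stands in for the lock held by the event queue thread (the CGraphicsEnvironment monitor in scenario A, the AWT lock in scenario B), and timeouts are used so that the demonstration terminates instead of hanging. All class and method names are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

final class SyncDispatchDeadlockSketch {

    /**
     * Returns true when a synchronous hand-off to the "main thread" stalls
     * because that thread is busy waiting for a lock the caller holds.
     */
    static boolean handOffStalls() {
        // Hypothetical stand-in for the AppKit main thread.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        // Hypothetical stand-in for the lock held by the event queue thread.
        ReentrantLock sharedLock = new ReentrantLock();

        sharedLock.lock(); // the caller plays the event queue thread
        try {
            // Main-thread work that needs the same lock (the reconfiguration
            // callback or the layer draw). A timed tryLock is used only so
            // that this sketch cannot hang forever.
            mainThread.submit(() -> {
                boolean acquired = sharedLock.tryLock(500, TimeUnit.MILLISECONDS);
                if (acquired) {
                    sharedLock.unlock();
                }
                return acquired;
            });

            // Synchronous hand-off queued behind that work, analogous to
            // waiting on the main thread for the config info.
            Future<String> configInfo = mainThread.submit(() -> "config info");
            try {
                configInfo.get(100, TimeUnit.MILLISECONDS);
                return false; // the hand-off completed
            } catch (TimeoutException stalled) {
                return true; // without timeouts this wait would never end
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            sharedLock.unlock();
            mainThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("hand-off stalls: " + handOffStalls());
    }
}
```

In the real case neither wait has a timeout, so both threads stay parked indefinitely, which is what the stack dumps show.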
SUPPORTING STACK DUMPS
- SCENARIO A
CGraphicsEnvironment._displayReconfiguration is called on the main thread since
8041900: [macosx] Java forces the use of discrete GPU (https://bugs.openjdk.java.net/browse/JDK-8041900) which appears as changeset 11227.
In the native thread dump below you can see the frame for displaycb_handle which is the block dispatched to the main thread to call
CGraphicsEnvironment._displayReconfiguration.
Java stacks
"AWT-EventQueue-0" #16 prio=6 os_prio=31 tid=0x00007fbc72a0a800 nid=0x1251f runnable [0x000070000e443000]
java.lang.Thread.State: RUNNABLE
at sun.java2d.opengl.CGLGraphicsConfig.getCGLConfigInfo(Native Method)
at sun.java2d.opengl.CGLGraphicsConfig.getConfig(CGLGraphicsConfig.java:147)
at sun.awt.CGraphicsDevice.<init>(CGraphicsDevice.java:64)
at sun.awt.CGraphicsEnvironment.initDevices(CGraphicsEnvironment.java:163)
- locked <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment)
at sun.awt.CGraphicsEnvironment.getDefaultScreenDevice(CGraphicsEnvironment.java:181)
- locked <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment)
at sun.lwawt.macosx.LWCToolkit.getScreenResolution(LWCToolkit.java:415)
at net.miginfocom.swing.SwingComponentWrapper.getHorizontalScreenDPI(SwingComponentWrapper.java:260)
at net.miginfocom.swing.SwingComponentWrapper.getPixelUnitFactor(SwingComponentWrapper.java:119)
[SNIP]
at java.awt.Container.layout(Container.java:1510)
at java.awt.Container.doLayout(Container.java:1499)
at java.awt.Container.validateTree(Container.java:1695)
[SNIP]
at javax.swing.RepaintManager$ProcessingRunnable.run(RepaintManager.java:1750)
[SNIP]
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
"AppKit Thread" #11 daemon prio=5 os_prio=31 tid=0x00007fbc75046800 nid=0x307 waiting for monitor entry [0x00007fff5b579000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.awt.CGraphicsEnvironment._displayReconfiguration(CGraphicsEnvironment.java:129)
- waiting to lock <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment)
Native stacks
Thread 0x6828c3 DispatchQueue 1 Thread name "AppKit Thread" 1000 samples (1-1000) priority 46 (base 46) cpu time <0.001
1000 start + 52 (Charles + 5156) [0x104682424]
1000 main + 153 (Charles + 5321) [0x1046824c9]
1000 launch + 10872 (Charles + 16520) [0x104685088]
1000 JLI_Launch + 1952 (libjli.dylib + 5668) [0x104702624]
1000 CreateExecutionEnvironment + 871 (libjli.dylib + 22781) [0x1047068fd]
1000 CFRunLoopRunSpecific + 420 (CoreFoundation + 555380) [0x7fff97665974]
1000 __CFRunLoopRun + 934 (CoreFoundation + 556918) [0x7fff97665f76]
1000 __CFRunLoopDoSources0 + 557 (CoreFoundation + 559741) [0x7fff97666a7d]
1000 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17 (CoreFoundation + 686465) [0x7fff97685981]
1000 __NSThreadPerformPerform + 326 (Foundation + 465034) [0x7fff990c988a]
1000 -[AWTStarter starter:] + 905 (libawt_lwawt.dylib + 286207) [0x113081dff]
1000 +[NSApplicationAWT runAWTLoopWithApp:] + 156 (libosxapp.dylib + 8525) [0x1130fa14d]
[SNIP]
1000 +[JNFRunLoop _performCopiedBlock:] + 17 (JavaNativeFoundation + 28474) [0x112d0df3a]
1000 __displaycb_handle_block_invoke_1 + 172 (libawt_lwawt.dylib + 119659) [0x11305936b]
1000 JNFPerformEnvBlock + 87 (JavaNativeFoundation + 27229) [0x112d0da5d]
1000 __displaycb_handle_block_invoke_2 + 80 (libawt_lwawt.dylib + 119988) [0x1130594b4]
1000 JNFCallVoidMethod + 187 (JavaNativeFoundation + 13743) [0x112d0a5af]
1000 jni_CallVoidMethodV + 248 (libjvm.dylib + 3069241) [0x106301539]
1000 jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) + 748 (libjvm.dylib + 3124227) [0x10630ec03]
1000 JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*) + 1710 (libjvm.dylib + 3015574) [0x1062f4396]
1000 ??? [0x113db94e7]
1000 ??? [0x113dde021]
1000 InterpreterRuntime::monitorenter(JavaThread*, BasicObjectLock*) + 165 (libjvm.dylib + 2995347) [0x1062ef493]
1000 ObjectMonitor::enter(Thread*) + 472 (libjvm.dylib + 4524724) [0x106464ab4]
1000 ObjectMonitor::EnterI(Thread*) + 532 (libjvm.dylib + 4521584) [0x106463e70]
1000 os::PlatformEvent::park(long) + 404 (libjvm.dylib + 4561328) [0x10646d9b0]
1000 __psynch_cvwait + 10 (libsystem_kernel.dylib + 105606) [0x7fffaccecc86]
*1000 psynch_cvcontinue + 0 (pthread + 39138) [0xffffff7f80f978e2]
Thread 0x68292c Thread name "Java: AWT-EventQueue-0" 1000 samples (1-1000) priority 31 (base 31)
1000 thread_start + 13 (libsystem_pthread.dylib + 12797) [0x7fffacdd51fd]
1000 _pthread_start + 286 (libsystem_pthread.dylib + 14839) [0x7fffacdd59f7]
1000 _pthread_body + 180 (libsystem_pthread.dylib + 15019) [0x7fffacdd5aab]
1000 java_start(Thread*) + 246 (libjvm.dylib + 4574506) [0x106470d2a]
1000 JavaThread::run() + 448 (libjvm.dylib + 5486408) [0x10654f748]
1000 JavaThread::thread_main_inner() + 155 (libjvm.dylib + 5480593) [0x10654e091]
1000 thread_entry(JavaThread*, Thread*) + 124 (libjvm.dylib + 3270354) [0x1063326d2]
1000 JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*) + 74 (libjvm.dylib + 3017936) [0x1062f4cd0]
1000 JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*) + 356 (libjvm.dylib + 3017508) [0x1062f4b24]
1000 JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*) + 1710 (libjvm.dylib + 3015574) [0x1062f4396]
[SNIP]
1000 Java_sun_java2d_opengl_CGLGraphicsConfig_getCGLConfigInfo + 279 (libawt_lwawt.dylib + 107562) [0x11305642a]
1000 -[NSObject(NSThreadPerformAdditions) performSelectorOnMainThread:withObject:waitUntilDone:] + 131 (Foundation + 203394) [0x7fff99089a82]
1000 -[NSObject(NSThreadPerformAdditions) performSelector:onThread:withObject:waitUntilDone:modes:] + 904 (Foundation + 204424) [0x7fff99089e88]
1000 -[NSCondition wait] + 240 (Foundation + 208331) [0x7fff9908adcb]
1000 __psynch_cvwait + 10 (libsystem_kernel.dylib + 105606) [0x7fffaccecc86]
*1000 psynch_cvcontinue + 0 (pthread + 39138) [0xffffff7f80f978e2]
- SCENARIO B
Java stacks
"AWT-EventQueue-0" #15 prio=6 os_prio=31 tid=0x00007fba611d2000 nid=0x1260f runnable [0x0000700005365000]
java.lang.Thread.State: RUNNABLE
at sun.java2d.opengl.CGLGraphicsConfig.getCGLConfigInfo(Native Method)
at sun.java2d.opengl.CGLGraphicsConfig.getConfig(CGLGraphicsConfig.java:147)
at sun.awt.CGraphicsDevice.<init>(CGraphicsDevice.java:64)
at sun.awt.CGraphicsEnvironment.initDevices(CGraphicsEnvironment.java:163)
- locked <0x00000006c0df8c18> (a sun.awt.CGraphicsEnvironment)
at sun.awt.CGraphicsEnvironment.getDefaultScreenDevice(CGraphicsEnvironment.java:181)
- locked <0x00000006c0df8c18> (a sun.awt.CGraphicsEnvironment)
at sun.lwawt.macosx.LWCToolkit.getScreenResolution(LWCToolkit.java:415)
at net.miginfocom.swing.SwingComponentWrapper.getHorizontalScreenDPI(SwingComponentWrapper.java:260)
at net.miginfocom.swing.SwingComponentWrapper.getPixelUnitFactor(SwingComponentWrapper.java:119)
at net.miginfocom.layout.UnitValue.getPixelsExact(UnitValue.java:305)
at net.miginfocom.layout.UnitValue.getPixels(UnitValue.java:281)
[SNIP]
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
"AppKit Thread" #11 daemon prio=5 os_prio=31 tid=0x00007fba59869800 nid=0x307 waiting on condition [0x00007fff52ac2000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006c053b688> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at sun.awt.SunToolkit.awtLock(SunToolkit.java:253)
at sun.java2d.pipe.RenderQueue.lock(RenderQueue.java:112)
at sun.java2d.opengl.CGLLayer.drawInCGLContext(CGLLayer.java:139)
Native stacks
Thread 0x6764ca DispatchQueue 1 Thread name "AppKit Thread" 1000 samples (1-1000) priority 46 (base 46)
1000 start + 52 (Charles + 5156) [0x10d139424]
1000 main + 153 (Charles + 5321) [0x10d1394c9]
1000 launch + 10872 (Charles + 16520) [0x10d13c088]
1000 JLI_Launch + 1952 (libjli.dylib + 5668) [0x10d1b9624]
1000 CreateExecutionEnvironment + 871 (libjli.dylib + 22781) [0x10d1bd8fd]
1000 CFRunLoopRunSpecific + 420 (CoreFoundation + 555380) [0x7fff97665974]
1000 __CFRunLoopRun + 934 (CoreFoundation + 556918) [0x7fff97665f76]
1000 __CFRunLoopDoSources0 + 557 (CoreFoundation + 559741) [0x7fff97666a7d]
1000 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17 (CoreFoundation + 686465) [0x7fff97685981]
1000 __NSThreadPerformPerform + 326 (Foundation + 465034) [0x7fff990c988a]
1000 -[AWTStarter starter:] + 905 (libawt_lwawt.dylib + 286207) [0x12abc3dff]
1000 +[NSApplicationAWT runAWTLoopWithApp:] + 156 (libosxapp.dylib + 8525) [0x12ac3c14d]
[SNIP]
1000 CA::Transaction::observer_callback(__CFRunLoopObserver*, unsigned long, void*) + 108 (QuartzCore + 69522) [0x7fff9d393f92]
1000 CA::Transaction::commit() + 475 (QuartzCore + 67121) [0x7fff9d393631]
1000 CA::Context::commit_transaction(CA::Transaction*) + 280 (QuartzCore + 1153144) [0x7fff9d49c878]
1000 CA::Layer::layout_and_display_if_needed(CA::Transaction*) + 35 (QuartzCore + 1196185) [0x7fff9d4a7099]
1000 CA::Layer::display_if_needed(CA::Transaction*) + 572 (QuartzCore + 1195886) [0x7fff9d4a6f6e]
1000 -[CAOpenGLLayer _display] + 351 (QuartzCore + 1117583) [0x7fff9d493d8f]
1000 CAOpenGLLayerDraw(CAOpenGLLayer*, double, CVTimeStamp const*, unsigned int) + 873 (QuartzCore + 1118737) [0x7fff9d494211]
1000 -[CGLLayer drawInCGLContext:pixelFormat:forLayerTime:displayTime:] + 287 (libawt_lwawt.dylib + 109022) [0x12ab989de]
1000 JNFCallVoidMethod + 187 (JavaNativeFoundation + 13743) [0x12a84f5af]
1000 jni_CallVoidMethodV + 248 (libjvm.dylib + 3069241) [0x10edb8539]
1000 jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) + 748 (libjvm.dylib + 3124227) [0x10edc5c03]
1000 JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*) + 1710 (libjvm.dylib + 3015574) [0x10edab396]
1000 ??? [0x10ffa0854]
1000 ??? [0x11027642a]
1000 Unsafe_Park + 126 (libjvm.dylib + 5571927) [0x10f01b557]
1000 Parker::park(bool, long) + 495 (libjvm.dylib + 4560765) [0x10ef2477d]
1000 __psynch_cvwait + 10 (libsystem_kernel.dylib + 105606) [0x7fffaccecc86]
*1000 psynch_cvcontinue + 0 (pthread + 39138) [0xffffff7f80f978e2]
Thread 0x67652b Thread name "Java: AWT-EventQueue-0" 1000 samples (1-1000) priority 31 (base 31)
1000 thread_start + 13 (libsystem_pthread.dylib + 12797) [0x7fffacdd51fd]
1000 _pthread_start + 286 (libsystem_pthread.dylib + 14839) [0x7fffacdd59f7]
1000 _pthread_body + 180 (libsystem_pthread.dylib + 15019) [0x7fffacdd5aab]
1000 java_start(Thread*) + 246 (libjvm.dylib + 4574506) [0x10ef27d2a]
1000 JavaThread::run() + 448 (libjvm.dylib + 5486408) [0x10f006748]
1000 JavaThread::thread_main_inner() + 155 (libjvm.dylib + 5480593) [0x10f005091]
1000 thread_entry(JavaThread*, Thread*) + 124 (libjvm.dylib + 3270354) [0x10ede96d2]
[SNIP]
1000 ??? [0x10f7a0734]
1000 Java_sun_java2d_opengl_CGLGraphicsConfig_getCGLConfigInfo + 279 (libawt_lwawt.dylib + 107562) [0x12ab9842a]
1000 -[NSObject(NSThreadPerformAdditions) performSelectorOnMainThread:withObject:waitUntilDone:] + 131 (Foundation + 203394) [0x7fff99089a82]
1000 -[NSObject(NSThreadPerformAdditions) performSelector:onThread:withObject:waitUntilDone:modes:] + 904 (Foundation + 204424) [0x7fff99089e88]
1000 -[NSCondition wait] + 240 (Foundation + 208331) [0x7fff9908adcb]
1000 __psynch_cvwait + 10 (libsystem_kernel.dylib + 105606) [0x7fffaccecc86]
*1000 psynch_cvcontinue + 0 (pthread + 39138) [0xffffff7f80f978e2]
INTERPRETATION
The deadlock results from a race condition that occurs when macOS switches between the discrete and integrated GPU.
When the GPU changes, the result of CGraphicsEnvironment.getMainDisplayID() changes immediately to return the new display ID, while the devices map is only
rebuilt once initDevices() is called. (There is a comment in CGraphicsEnvironment.m that notes that the display ID changes in this case, and I have verified this.)
CGLGraphicsConfig.getCGLConfigInfo (which is called as a consequence of initDevices, as per stack traces) calls out and waits on the AppKit main thread. I think this is
always dangerous due to the locks that the code calling it holds. I think we should avoid getCGLConfigInfo being called on anything but the AppKit main thread. I believe
this was the intention of 8041900: [macosx] Java forces the use of discrete GPU (https://bugs.openjdk.java.net/browse/JDK-8041900).
CGraphicsEnvironment.getDefaultScreenDevice() is called from AWT layout code (as per the stacks) and it calls CGraphicsEnvironment.getMainDisplayID() each time.
If CGraphicsEnvironment.getDefaultScreenDevice() is called _after_ the GPU change, but _before_ CGraphicsEnvironment._displayReconfiguration() has been called,
the CGraphicsDevice for the new display ID cannot be found in the devices Map, so initDevices() is called from CGraphicsEnvironment.getDefaultScreenDevice()
on the AWT-EventQueue thread.
There is a note in getDefaultScreenDevice() for this case:
we do not expect that this may happen, the only response is to re-initialize the list of devices
Calling initDevices() here results in a call to CGLGraphicsConfig.getCGLConfigInfo, which then calls
[GraphicsConfigUtil _getCGLConfigInfo:] on the AppKit main thread and waits for the result.
As the current thread (AWT-EventQueue) is holding the AWT lock and is synchronized on CGraphicsEnvironment, either of the two deadlock
conditions described above can occur.
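The race can be illustrated with a small model. This is a simplified sketch under assumptions, not the actual CGraphicsEnvironment source: devices are represented as strings, the live display ID is a plain field that a hypothetical simulateGpuSwitch() method changes, and a counter records how often the device list is rebuilt on the calling thread.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleDeviceMapSketch {
    /** Stand-in for the devices map; values are just labels here. */
    private final Map<Integer, String> devices = new HashMap<>();
    /** Stand-in for what getMainDisplayID() would report right now. */
    private int liveMainDisplayId;
    /** Counts rebuilds triggered from getDefaultScreenDevice(). */
    int rebuildsOnCallerThread;

    StaleDeviceMapSketch(int initialDisplayId) {
        liveMainDisplayId = initialDisplayId;
        initDevices();
    }

    /** Hypothetical: the OS switches GPU; no reconfiguration callback has run yet. */
    synchronized void simulateGpuSwitch(int newDisplayId) {
        liveMainDisplayId = newDisplayId;
    }

    private void initDevices() {
        devices.clear();
        devices.put(liveMainDisplayId, "device-" + liveMainDisplayId);
    }

    /** Mirrors the unpatched lookup: always keyed by the live display ID. */
    synchronized String getDefaultScreenDevice() {
        String device = devices.get(liveMainDisplayId);
        if (device == null) {
            // In the real code this is where initDevices() runs on the
            // event queue thread while locks are held.
            rebuildsOnCallerThread++;
            initDevices();
            device = devices.get(liveMainDisplayId);
        }
        return device;
    }

    public static void main(String[] args) {
        StaleDeviceMapSketch env = new StaleDeviceMapSketch(1);
        env.getDefaultScreenDevice();
        env.simulateGpuSwitch(2);
        env.getDefaultScreenDevice();
        System.out.println("rebuilds on caller thread: " + env.rebuildsOnCallerThread);
    }
}
```

In this model, any lookup that lands between the switch and the reconfiguration callback misses the map and rebuilds it on the caller's thread, which corresponds to the window described above.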
REPRODUCIBILITY
This happens quite regularly on my machine, and for my users. To reproduce it, I launch my app while the integrated GPU is active, then launch and quit an app that requires the discrete GPU. It takes one to five repetitions to trigger the hang.
I believe the issue is triggered by my use of MigLayout, which results in the call to CGraphicsEnvironment as per this excerpt from the stack traces above:
at sun.awt.CGraphicsEnvironment.getDefaultScreenDevice(CGraphicsEnvironment.java:181)
- locked <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment)
at sun.lwawt.macosx.LWCToolkit.getScreenResolution(LWCToolkit.java:415)
at net.miginfocom.swing.SwingComponentWrapper.getHorizontalScreenDPI(SwingComponentWrapper.java:260)
at net.miginfocom.swing.SwingComponentWrapper.getPixelUnitFactor(SwingComponentWrapper.java:119)
PATCH
I believe the solution is to remember the main display ID along with the devices Map, and to update the remembered main display ID only when initDevices is called.
This appears to work in my setup. There is however _sometimes_ a flash of half-size rendering, presumably while the rendering is working on the old device
before the reconfiguration / initDevices occurs.
Below is a simple patch to demonstrate that approach. Generally I don’t think initDevices() should ever be called on the AWT-EventQueue, but in my tests (as per the comment)
that no longer happens with this patch.
diff -r 5dd7e4bae5c2 src/macosx/classes/sun/awt/CGraphicsEnvironment.java
--- a/src/macosx/classes/sun/awt/CGraphicsEnvironment.java Thu Sep 22 13:17:42 2016 -0700
+++ b/src/macosx/classes/sun/awt/CGraphicsEnvironment.java Sat Jan 07 20:49:39 2017 +1300
@@ -95,6 +95,7 @@
/** Available CoreGraphics displays. */
private final Map<Integer, CGraphicsDevice> devices = new HashMap<>(5);
+ private int inittedMainDisplayID;
/** Reference to the display reconfiguration callback context. */
private final long displayReconfigContext;
@@ -153,6 +154,7 @@
devices.clear();
int mainID = getMainDisplayID();
+ inittedMainDisplayID = mainID;
// initialization of the graphics device may change
// list of displays on hybrid systems via an activation
@@ -173,14 +175,13 @@
@Override
public synchronized GraphicsDevice getDefaultScreenDevice() throws HeadlessException {
- final int mainDisplayID = getMainDisplayID();
- CGraphicsDevice d = devices.get(mainDisplayID);
+ CGraphicsDevice d = devices.get(inittedMainDisplayID);
if (d == null) {
// we do not expect that this may happen, the only response
// is to re-initialize the list of devices
initDevices();
- d = devices.get(mainDisplayID);
+ d = devices.get(inittedMainDisplayID);
if (d == null) {
throw new AWTError("no screen devices");
}
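The effect of the patch can be modelled in the same simplified way. Again, this is a sketch under assumptions rather than the patched JDK class: devices are strings, simulateGpuSwitch() and displayReconfiguration() are hypothetical stand-ins for the OS-side change and for _displayReconfiguration respectively, and only the inittedMainDisplayID idea is taken from the diff above.

```java
import java.util.HashMap;
import java.util.Map;

final class CachedMainDisplaySketch {
    private final Map<Integer, String> devices = new HashMap<>();
    /** Stand-in for what getMainDisplayID() would report right now. */
    private int liveMainDisplayId;
    /** The ID that was current when the devices map was last built. */
    private int inittedMainDisplayId;
    /** Counts rebuilds triggered from getDefaultScreenDevice(). */
    int rebuildsOnCallerThread;

    CachedMainDisplaySketch(int initialDisplayId) {
        liveMainDisplayId = initialDisplayId;
        initDevices();
    }

    /** Hypothetical: the OS switches GPU; the callback has not run yet. */
    synchronized void simulateGpuSwitch(int newDisplayId) {
        liveMainDisplayId = newDisplayId;
    }

    /** Hypothetical stand-in for the reconfiguration callback. */
    synchronized void displayReconfiguration() {
        initDevices();
    }

    private void initDevices() {
        devices.clear();
        inittedMainDisplayId = liveMainDisplayId; // remembered with the map
        devices.put(inittedMainDisplayId, "device-" + inittedMainDisplayId);
    }

    /** Mirrors the patched lookup: keyed by the remembered ID, not the live one. */
    synchronized String getDefaultScreenDevice() {
        String device = devices.get(inittedMainDisplayId);
        if (device == null) {
            rebuildsOnCallerThread++;
            initDevices();
            device = devices.get(inittedMainDisplayId);
        }
        return device;
    }

    public static void main(String[] args) {
        CachedMainDisplaySketch env = new CachedMainDisplaySketch(1);
        env.simulateGpuSwitch(2);
        System.out.println("before callback: " + env.getDefaultScreenDevice());
        env.displayReconfiguration();
        System.out.println("after callback: " + env.getDefaultScreenDevice());
    }
}
```

In this model the lookup keeps returning the old device until the callback rebuilds the map, so no rebuild happens on the caller's thread. That is also consistent with the brief rendering glitch mentioned above, since the old device is used for a short time after the switch.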