<AWT Dev> [8] Review request for 8015730: PIT: On Linux, OGL=true and fbobject=false leads to deadlock or infinite loop

Anton Litvinov anton.litvinov at oracle.com
Wed Jul 3 08:45:16 PDT 2013


Hello Artem,

Thank you for the review of this fix. No, I am still convinced that the 
fix reverts XError handling part of the function 
"Java_sun_java2d_opengl_GLXSurfaceData_initPbuffer" in the file 
"jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c" to not 
taking AWT lock for a call to XSync() function as it was before 8005607. 
It is so, because the fix adds an explicit call to XSync() function, 
before a call of the macro RESTORE_XERROR_HANDLER. Please look at the 
next line in the file 
"src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c" from the webrev.

398     XSync(awt_display, False);

I suppose that XSync() function calls were in the native macros 
WITH_XERROR_HANDLER, RESTORE_XERROR_HANDLER before 8005607 to initiate 
dispatching of the XErrors, which were collected during the previous 
calls to Xlib, to the XError handler registered by the application.

Unfortunately, I do not see how XSync() calls were introduced by the fix 
8005607 to XAWT and Java2D code, since that fix just moved XError 
handling code from "sun.awt.X11.XToolkit" class to a new separate class 
"sun.awt.X11.XErrorHandlerUtil". So I am sure that XSync() function was 
called for all calls of both the macro RESTORE_XERROR_HANDLER and the 
method "sun.awt.X11.XToolkit.RESTORE_XERROR_HANDLER()" before the fix 
8005607.

Thank you,
Anton

On 7/3/2013 6:47 PM, Artem Ananiev wrote:
>
> On 7/1/2013 8:50 PM, Anthony Petrov wrote:
>> Thanks for the additional information, Anton. Since this fix simply
>> reverts the behavior in GLXSurfaceData.c back to the pre-8005607 era, it
>> could probably be considered a good interim solution for the problem.
>>
>> I'd like to hear Artem's opinion on this, though. Should we file a P4
>> bug to investigate the issue further so that in the future we could
>> avoid calling XSync() w/o the AWTLock?
>
> I raised exactly the same concerns about calling XSync and waiting on 
> the AWT lock, when we discussed this issue offline with Anton. Before 
> the fix for 8005607, XSync() was called without AWT lock, which is not 
> what the current fix is: now XSync() is not called at all. So behavior 
> will be different than it was before 8005607.
>
> Anton, did you investigate, why XSync() was in the native macros 
> before 8005607? Note that all the XErrorHandler code in XAWT worked 
> fine without XSync() before that fix, so why does the native Java2D 
> require this?
>
> Thanks,
>
> Artem
>
>> -- 
>> best regards,
>> Anthony
>>
>> On 07/01/2013 07:24 PM, Anton Litvinov wrote:
>>> Hello Anthony,
>>>
>>> Thank you for the review of this fix. I would like to remark that this
>>> deadlock is a regression of the fix for the bug 8005607, and in the 
>>> code
>>> of the file "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c"
>>> before 8005607 fix, where the previous XError handling mechanism not
>>> involving "sun.awt.X11.XErrorHandlerUtil" class was used, native
>>> "XSync()" function was called without acquiring of AWT lock. So a fix
>>> for the current bug with a deadlock just reverted a part of the fix
>>> 8005607 which enforced taking AWT lock from the function
>>> "Java_sun_java2d_opengl_GLXSurfaceData_initPbuffer" in the file
>>> "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c". Answers to
>>> your questions are provided below.
>>>
>>> 1. "AWT EventQueue" holds AWT lock and waits till "Java2D Queue 
>>> Flusher"
>>> thread finishes its job, because in the method
>>> "sun.java2d.opengl.OGLSurfaceData.initSurface(final int width, final 
>>> int
>>> height)" execution of "initSurfaceNow(int width, int height)" is
>>> dispatched to "Java2D Queue Flusher". Before this dispatching in the
>>> method "initSurface" AWT lock is taken by the lines
>>>
>>>      308        OGLRenderQueue rq = OGLRenderQueue.getInstance();
>>>      309        rq.lock();
>>>
>>> and then with held AWT lock "AWT EventQueue" thread starts waiting on
>>> the second lock "sun.java2d.opengl.OGLRenderQueue.flusher" in the 
>>> method
>>> "sun.java2d.opengl.OGLRenderQueue.QueueFlusher.flushNow()"
>>>
>>>      181                    wait();
>>>
>>> 2. Yes, I investigated the option of waiting on AWT lock instead of
>>> "sun.java2d.opengl.OGLRenderQueue.flusher" lock in the class
>>> "sun.java2d.opengl.OGLRenderQueue", but this is impossible, because
>>> access to the always running thread "Java2D Queue Flusher" should be
>>> synchronized on some lock other than AWT lock, otherwise there will 
>>> be a
>>> performance degradation, because it will be trying to get AWT lock each
>>> 100 milliseconds. As I understood a possible solution for this problem
>>> can be not locking on AWT lock before dispatching execution of any code
>>> to "Java2D Queue Flusher" or complete refactoring of locking mechanism
>>> in the class "sun.java2d.opengl.OGLRenderQueue". Since the current bug
>>> blocks SQE from running any tests involving OpenGL and does not 
>>> allow to
>>> run any Java GUI application with enabled OpenGL rendering on Linux OS,
>>> I suppose the variant of refactoring is not acceptable. That is why as
>>> the most secure solution I decided just to call XSync() from
>>> "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c" as it was
>>> before the fix for 8005607.
>>>
>>> Thank you,
>>> Anton
>>>
>>> On 7/1/2013 3:11 PM, Anthony Petrov wrote:
>>>> Hi Anton,
>>>>
>>>> I'm not sure if this a good fix since it enabled the GL thread to call
>>>> Xlib APIs w/o acquiring the AWTLock. This may not present a problem
>>>> currently since we know exactly when this method is called and that
>>>> another thread is holding the lock and isn't calling other X11
>>>> functions at the moment. But I doubt this knowledge will be widely
>>>> known and remembered in the future, and if another thread starts
>>>> calling X11 routines, we'll get into trouble...
>>>>
>>>> Why would another thread (the AWT EventQueue if I got the problem
>>>> right) hold the AWTLock and wait till the GL thread finishes its job?
>>>> I'd assume it should release the lock for the period of waiting. This
>>>> would allow the GL thread to acquire the lock and perform the XSync()
>>>> call w/o any potential issues. Have you investigated this option?
>>>>
>>>> -- 
>>>> best regards,
>>>> Anthony
>>>>
>>>> On 06/28/2013 09:16 PM, Anton Litvinov wrote:
>>>>> Hello,
>>>>>
>>>>> Could you please review the following fix for a bug, which consists
>>>>> in a
>>>>> deadlock provoked by concurrency between AWT-EventQueue and Java2D
>>>>> Queue
>>>>> Flusher for taking AWT lock, when OpenGL rendering is enabled.
>>>>>
>>>>> Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8015730
>>>>> Webrev: http://cr.openjdk.java.net/~alitvinov/8015730/webrev.00
>>>>>
>>>>> The fix allows the code from the native function
>>>>> "Java_sun_java2d_opengl_GLXSurfaceData_initPbuffer" of the file
>>>>> "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c" 
>>>>> execute all
>>>>> XError handling procedures using "sun.awt.X11.XErrorHandlerUtil" 
>>>>> class
>>>>> without acquiring AWT lock. It is the only available solution for 
>>>>> this
>>>>> problem, because the current design of
>>>>> "sun.java2d.opengl.OGLRenderQueue" class does not allow to take AWT
>>>>> lock
>>>>> in Java2D Queue Flusher thread without reaching a deadlock, since all
>>>>> calls to the method
>>>>> "sun.java2d.opengl.OGLRenderQueue.flushAndInvokeNow(Runnable r)" are
>>>>> guarded by AWT lock.
>>>>>
>>>>> Thank you,
>>>>> Anton
>>>



More information about the awt-dev mailing list