<AWT Dev> [8] Review request for 8015730: PIT: On Linux, OGL=true and fbobject=false leads to deadlock or infinite loop

Anthony Petrov anthony.petrov at oracle.com
Thu Jul 4 04:43:11 PDT 2013


Thank you, Anton. I'm fine with your fix, too.

--
best regards,
Anthony

On 07/04/13 14:53, Artem Ananiev wrote:
>
> On 7/4/2013 1:47 PM, Anton Litvinov wrote:
>> Hello Artem and Anthony,
>>
>> Thank you for review of this fix. I would like to inform you that by
>> request of Anthony a separate new issue about the problem with
>> synchronization between "sun.java2d.opengl.OGLRenderQueue" instance and
>> Java2D Queue Flusher thread, whose priority is P4, has been created. The
>> ID of the new CR is 8019923, and it is linked to the current bug 8015730.
>
> OK, I'm fine with the current fix then.
>
> Thanks,
>
> Artem
>
>> Thank you,
>> Anton
>>
>> On 7/4/2013 1:31 PM, Artem Ananiev wrote:
>>>
>>> On 7/3/2013 9:49 PM, Anthony Petrov wrote:
>>>> On 07/03/2013 08:35 PM, Artem Ananiev wrote:
>>>>>> src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c
>>>>>>> 398 XSync(awt_display, False);
>>>>>
>>>>> OK, got it.
>>>>>
>>>>>> All other usages of the EXEC_WITH_XERROR_HANDLER() pass TRUE to the
>>>>>> RESTORE_XERROR_HANDLER(), and hence call XSync(), too (in
>>>>>> XErrorHandlerUtil.java) with or without the current fix...
>>>>>>
>>>>> I looked to the webrev for 8005607 once again and found XSync() was
>>>>> there before that fix. And you're right, it's required to flush Xlib
>>>>> quene.
>>>>>
>>>>> OK, so I currently don't see a way to improve the current fix. It
>>>>> may be
>>>>> possible to rewrite OGL render queue synchronization, but it's out of
>>>>> scope of this fix.
>>>>
>>>> I agree that this is out of the scope of the current fix. However, the
>>>> very fact of XSync() being invoked w/o holding the AWTLock needs to be
>>>> addressed at some point. So I suggest to file a separate P4 bug to
>>>> investigate this later (probably in a future release).
>>>
>>> The lock is actually hold, but by another thread (which then is
>>> waiting for the flusher thread). As AWT lock is used to prevent access
>>> to X display simultaneously from multiple threads, not to conform to
>>> Java memory model, such a pattern seems to be valid. It's not trivial
>>> to understand, though, I agree.
>>>
>>> Thanks,
>>>
>>> Artem
>>>
>>>> --
>>>> best regards,
>>>> Anthony
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Artem
>>>>>
>>>>>> --
>>>>>> best regards,
>>>>>> Anthony
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Artem
>>>>>>>
>>>>>>>> --
>>>>>>>> best regards,
>>>>>>>> Anthony
>>>>>>>>
>>>>>>>> On 07/01/2013 07:24 PM, Anton Litvinov wrote:
>>>>>>>>> Hello Anthony,
>>>>>>>>>
>>>>>>>>> Thank you for the review of this fix. I would like to remark that
>>>>>>>>> this
>>>>>>>>> deadlock is a regression of the fix for the bug 8005607, and in
>>>>>>>>> the
>>>>>>>>> code
>>>>>>>>> of the file
>>>>>>>>> "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c"
>>>>>>>>> before 8005607 fix, where the previous XError handling mechanism
>>>>>>>>> not
>>>>>>>>> involving "sun.awt.X11.XErrorHandlerUtil" class was used, native
>>>>>>>>> "XSync()" function was called without acquiring of AWT lock. So
>>>>>>>>> a fix
>>>>>>>>> for the current bug with a deadlock just reverted a part of the
>>>>>>>>> fix
>>>>>>>>> 8005607 which enforced taking AWT lock from the function
>>>>>>>>> "Java_sun_java2d_opengl_GLXSurfaceData_initPbuffer" in the file
>>>>>>>>> "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c".
>>>>>>>>> Answers to
>>>>>>>>> your questions are provided below.
>>>>>>>>>
>>>>>>>>> 1. "AWT EventQueue" holds AWT lock and waits till "Java2D Queue
>>>>>>>>> Flusher"
>>>>>>>>> thread finishes its job, because in the method
>>>>>>>>> "sun.java2d.opengl.OGLSurfaceData.initSurface(final int width,
>>>>>>>>> final
>>>>>>>>> int
>>>>>>>>> height)" execution of "initSurfaceNow(int width, int height)" is
>>>>>>>>> dispatched to "Java2D Queue Flusher". Before this dispatching in
>>>>>>>>> the
>>>>>>>>> method "initSurface" AWT lock is taken by the lines
>>>>>>>>>
>>>>>>>>> 308 OGLRenderQueue rq = OGLRenderQueue.getInstance();
>>>>>>>>> 309 rq.lock();
>>>>>>>>>
>>>>>>>>> and then with held AWT lock "AWT EventQueue" thread starts
>>>>>>>>> waiting on
>>>>>>>>> the second lock "sun.java2d.opengl.OGLRenderQueue.flusher" in the
>>>>>>>>> method
>>>>>>>>> "sun.java2d.opengl.OGLRenderQueue.QueueFlusher.flushNow()"
>>>>>>>>>
>>>>>>>>> 181 wait();
>>>>>>>>>
>>>>>>>>> 2. Yes, I investigated the option of waiting on AWT lock
>>>>>>>>> instead of
>>>>>>>>> "sun.java2d.opengl.OGLRenderQueue.flusher" lock in the class
>>>>>>>>> "sun.java2d.opengl.OGLRenderQueue", but this is impossible,
>>>>>>>>> because
>>>>>>>>> access to the always running thread "Java2D Queue Flusher"
>>>>>>>>> should be
>>>>>>>>> synchronized on some lock other than AWT lock, otherwise there
>>>>>>>>> will
>>>>>>>>> be a
>>>>>>>>> performance degradation, because it will be trying to get AWT lock
>>>>>>>>> each
>>>>>>>>> 100 milliseconds. As I understood a possible solution for this
>>>>>>>>> problem
>>>>>>>>> can be not locking on AWT lock before dispatching execution of any
>>>>>>>>> code
>>>>>>>>> to "Java2D Queue Flusher" or complete refactoring of locking
>>>>>>>>> mechanism
>>>>>>>>> in the class "sun.java2d.opengl.OGLRenderQueue". Since the current
>>>>>>>>> bug
>>>>>>>>> blocks SQE from running any tests involving OpenGL and does not
>>>>>>>>> allow to
>>>>>>>>> run any Java GUI application with enabled OpenGL rendering on
>>>>>>>>> Linux
>>>>>>>>> OS,
>>>>>>>>> I suppose the variant of refactoring is not acceptable. That is
>>>>>>>>> why as
>>>>>>>>> the most secure solution I decided just to call XSync() from
>>>>>>>>> "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c" as
>>>>>>>>> it was
>>>>>>>>> before the fix for 8005607.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Anton
>>>>>>>>>
>>>>>>>>> On 7/1/2013 3:11 PM, Anthony Petrov wrote:
>>>>>>>>>> Hi Anton,
>>>>>>>>>>
>>>>>>>>>> I'm not sure if this a good fix since it enabled the GL thread to
>>>>>>>>>> call
>>>>>>>>>> Xlib APIs w/o acquiring the AWTLock. This may not present a
>>>>>>>>>> problem
>>>>>>>>>> currently since we know exactly when this method is called and
>>>>>>>>>> that
>>>>>>>>>> another thread is holding the lock and isn't calling other X11
>>>>>>>>>> functions at the moment. But I doubt this knowledge will be
>>>>>>>>>> widely
>>>>>>>>>> known and remembered in the future, and if another thread starts
>>>>>>>>>> calling X11 routines, we'll get into trouble...
>>>>>>>>>>
>>>>>>>>>> Why would another thread (the AWT EventQueue if I got the problem
>>>>>>>>>> right) hold the AWTLock and wait till the GL thread finishes its
>>>>>>>>>> job?
>>>>>>>>>> I'd assume it should release the lock for the period of waiting.
>>>>>>>>>> This
>>>>>>>>>> would allow the GL thread to acquire the lock and perform the
>>>>>>>>>> XSync()
>>>>>>>>>> call w/o any potential issues. Have you investigated this option?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> best regards,
>>>>>>>>>> Anthony
>>>>>>>>>>
>>>>>>>>>> On 06/28/2013 09:16 PM, Anton Litvinov wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Could you please review the following fix for a bug, which
>>>>>>>>>>> consists
>>>>>>>>>>> in a
>>>>>>>>>>> deadlock provoked by concurrency between AWT-EventQueue and
>>>>>>>>>>> Java2D
>>>>>>>>>>> Queue
>>>>>>>>>>> Flusher for taking AWT lock, when OpenGL rendering is enabled.
>>>>>>>>>>>
>>>>>>>>>>> Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8015730
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~alitvinov/8015730/webrev.00
>>>>>>>>>>>
>>>>>>>>>>> The fix allows the code from the native function
>>>>>>>>>>> "Java_sun_java2d_opengl_GLXSurfaceData_initPbuffer" of the file
>>>>>>>>>>> "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c"
>>>>>>>>>>> execute
>>>>>>>>>>> all
>>>>>>>>>>> XError handling procedures using "sun.awt.X11.XErrorHandlerUtil"
>>>>>>>>>>> class
>>>>>>>>>>> without acquiring AWT lock. It is the only available solution
>>>>>>>>>>> for
>>>>>>>>>>> this
>>>>>>>>>>> problem, because the current design of
>>>>>>>>>>> "sun.java2d.opengl.OGLRenderQueue" class does not allow to
>>>>>>>>>>> take AWT
>>>>>>>>>>> lock
>>>>>>>>>>> in Java2D Queue Flusher thread without reaching a deadlock,
>>>>>>>>>>> since
>>>>>>>>>>> all
>>>>>>>>>>> calls to the method
>>>>>>>>>>> "sun.java2d.opengl.OGLRenderQueue.flushAndInvokeNow(Runnable r)"
>>>>>>>>>>> are
>>>>>>>>>>> guarded by AWT lock.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Anton
>>>>>>>>>
>>


More information about the awt-dev mailing list