<AWT Dev> <Awt Dev> [9] Review Request for 8132664: closed/javax/swing/DataTransfer/DefaultNoDrop/DefaultNoDrop.java locks on Windows

Semyon Sadetsky semyon.sadetsky at oracle.com
Tue Sep 8 16:17:39 UTC 2015



On 9/8/2015 6:13 PM, Sergey Bylokhov wrote:
> On 08.09.15 18:02, Semyon Sadetsky wrote:
>>
>>
>> On 9/8/2015 2:06 PM, Sergey Bylokhov wrote:
>>> On 08.09.15 13:42, Semyon Sadetsky wrote:
>>>>
>>>>
>>>> On 9/8/2015 1:31 PM, Sergey Bylokhov wrote:
>>>>> On 08.09.15 13:15, Semyon Sadetsky wrote:
>>>>>>
>>>>>>
>>>>>> On 9/8/2015 12:33 PM, Sergey Bylokhov wrote:
>>>>>>> On 08.09.15 12:04, Semyon Sadetsky wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/7/2015 6:56 PM, Sergey Bylokhov wrote:
>>>>>>>>> On 07.09.15 16:41, Semyon Sadetsky wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 9/7/2015 3:28 PM, Sergey Bylokhov wrote:
>>>>>>>>>>> On 03.09.15 17:43, Semyon Sadetsky wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/5/2015 2:33 PM, Sergey Bylokhov wrote:
>>>>>>>>>>>>> On 05.08.15 14:20, Semyon Sadetsky wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 8/5/2015 1:39 PM, Sergey Bylokhov wrote:
>>>>>>>>>>>>>>> On 05.08.15 13:18, Semyon Sadetsky wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 8/5/2015 12:27 PM, Sergey Bylokhov wrote:
>>>>>>>>>>>>>>>>> On 04.08.15 14:54, Semyon Sadetsky wrote:
>>>>>>>>>>>>>>>>>> On 8/3/2015 6:05 PM, Sergey Bylokhov wrote:
>>>>>>>>>>>>>>>>>>> Hi, Semyon
>>>>>>>>>>>>>>>>>>> Did you try to change dwMilliseconds from INFINITE 
>>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>>> timeout(10 seconds by default?) which is passed to the
>>>>>>>>>>>>>>>>>>> method?
>>>>>>>>>>>>>>>>>>> It does not help? Because even when dnd is not used we
>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>> not wait event for infinite time.
>>>>>>>>>>>>>>>>>> It would not help to fix the issue because 10 seconds is
>>>>>>>>>>>>>>>>>> too big
>>>>>>>>>>>>>>>>>> interval. But for consistency it is not bad to have a 
>>>>>>>>>>>>>>>>>> time
>>>>>>>>>>>>>>>>>> limit.
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ssadetsky/8132664/webrev.01/
>>>>>>>>>>>>>>>>> Note that syncNativeQueue is intended to wait until the
>>>>>>>>>>>>>>>>> native
>>>>>>>>>>>>>>>>> event queue is flushed or until timeout will expire. So
>>>>>>>>>>>>>>>>> even if
>>>>>>>>>>>>>>>>> timeout is expired we collect the native events during 
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>> period
>>>>>>>>>>>>>>>>> of time.Can you double check that the event counter is
>>>>>>>>>>>>>>>>> incremented
>>>>>>>>>>>>>>>>> during dnd? I do not know how we block the toolkit 
>>>>>>>>>>>>>>>>> thread,
>>>>>>>>>>>>>>>>> probably we create some nested loops which ignore our 
>>>>>>>>>>>>>>>>> event
>>>>>>>>>>>>>>>>> posted
>>>>>>>>>>>>>>>>> from syncNativeQueue, can we change that?
>>>>>>>>>>>>>>>> Yes, this is an internal secondary loop which waits for
>>>>>>>>>>>>>>>> mouse
>>>>>>>>>>>>>>>> release event.
>>>>>>>>>>>>>>> Can we change the condition and process the sync event in
>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>> loop?
>>>>>>>>>>>>>> Why? Will receive all events on the toolkit thread when
>>>>>>>>>>>>>> doDragDrop
>>>>>>>>>>>>>> returns.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When how we get dragenter/exit events?
>>>>>>>>>>>>
>>>>>>>>>>>> On the platform side they are not events but callbacks 
>>>>>>>>>>>> which are
>>>>>>>>>>>> converted into events on the java side and added to the
>>>>>>>>>>>> queue. So
>>>>>>>>>>>> they
>>>>>>>>>>>> will be detected by isEQEmpty() and waitForIdle() will be
>>>>>>>>>>>> repeated.
>>>>>>>>>>>
>>>>>>>>>>> This is not directly related to this fix. Most of our
>>>>>>>>>>> callbacks/events
>>>>>>>>>>> posts events to the EQ, but there is a possibility for a lag
>>>>>>>>>>> between
>>>>>>>>>>> callback and a post events, and this is why the
>>>>>>>>>>> syncNativeQueue was
>>>>>>>>>>> added.
>>>>>>>>>>> Let's return to the beginning: the syncNativeQueue method
>>>>>>>>>>> according to
>>>>>>>>>>> its specifications should try to flush the native system,
>>>>>>>>>>> track the
>>>>>>>>>>> native activity and should not wait more than timeout 
>>>>>>>>>>> parameter.
>>>>>>>>>>
>>>>>>>>>> It is not possible to send sync events to the DD loop. Only
>>>>>>>>>> mouse/keyboard event can return control to the app. I would not
>>>>>>>>>> emulate
>>>>>>>>>> them in the syncnative.
>>>>>>>>>> Always waiting the max time is not good. Maybe wait a bit and
>>>>>>>>>> check if
>>>>>>>>>> no callbacks happened?
>>>>>>>>>
>>>>>>>>> It will be good to check that callbacks were occurred in the
>>>>>>>>> same way
>>>>>>>>> as native events. This can be done in this fix. As for question
>>>>>>>>> about
>>>>>>>>> wait or not when timeout is over, is a shared problem, which 
>>>>>>>>> affect
>>>>>>>>> the osx also. It can be fixed separately. I guess the logic in
>>>>>>>>> realSync can be tweaked and the default timeout(10*4 sec) is 
>>>>>>>>> not a
>>>>>>>>> optimal as a default value, 2 or 3 seconds without any events
>>>>>>>>> should
>>>>>>>>> be enough.
>>>>>>>> Could you explain your logic?
>>>>>>>
>>>>>>> My logic is to follow the specification.
>>>>>> I cannot accept this as an answer because it is an abstract
>>>>>> declaration.
>>>>>> Please, try to be more specific.
>>>>>
>>>>> Just implement as I described it above.
>>>>>
>>>> Sorry?
>>>
>>> Just change implementation of syncNativeQueue to don't block forever
>>> if timeout parameter is positive. And wait forever in case of negative
>>> timeout. This is enough to fix for now, now need to add some
>>> workarounds for dnd, since the realsync itself should be reworked.
>>>
>> Sergey, I'm a bit tired of this level of explanations.
>
> Explanations is simple just implement the method according 
> specification, since the problem is general and is not specific it 
> should be fixed on the shared level.
>
An internal method javadoc can be changed any time.
>> What do you mean under "for now" and why the realsync itself should be
>> reworked?
>
> For now means for now. There are a few fixes which were pushed to the 
> jdk9 in to realsync area, as well as a few fixes to the tests 
> (regression/sqe), and we already got a bunch of bugs. And "for now" we 
> should not implement any workarounds since we know that the shared 
> code contains errors.
Can you give me the bug number you are talking about?
>
>> The solution fixes the problem. Test is passed.
>
>
> Yes, the test still fail after the fix, the reason will be not a lock 
> but a slow execution.
>
It is not true. I just rerun the test with updated jdk9 client codebase 
under windows and it passed in 17 seconds:

----------messages:(3/117)----------
command: main DefaultNoDrop
reason: User specified action: run main DefaultNoDrop
elapsed time (seconds): 17.339
----------System.out:(8/177)----------
invoking: onEDT1
invoking: onEDT10
invoking: onBackgroundThread20
invoking: onEDT30
invoking: onEDT40
invoking: onBackgroundThread50
invoking: onEDT60
invoking: onEDT70
----------System.err:(1/16)----------
STATUS:Passed.

>
> If you cannot explain in
>> words why it does not work for you, maybe you can provide a test case
>> which fails with the proposed solution?
>>>
>>>>>
>>>>>> I can only make guesses what specification section you mean. For
>>>>>> example, the next is waitForIdle() method's spec:
>>>>>>      /**
>>>>>>       * Waits until all events currently on the event queue have 
>>>>>> been
>>>>>> processed.
>>>>>>       * @throws  IllegalThreadStateException if called on the AWT
>>>>>> event
>>>>>> dispatching thread
>>>>>>       */
>>>>>> What does specifically contradicts to this specification?
>>>>>
>>>>> Check the documentation of the realSync method.
>>>>>
>>>> realSync is an internal method. It is not a part of the specification.
>>>> It has this description of the timeout parameter:
>>>>       * @param timeout the maximum time to wait in milliseconds,
>>>> negative means "forever".
>>>>
>>>> That is as it used in the fix. No?
>>>>>
>>>>>>>
>>>>>>>> Why to wait short wait period doesn't work
>>>>>>>> if we are in DD and there are no callbacks?
>>>>>>>
>>>>>>> "No callbacks" during what period? My point: this period should be
>>>>>>> passed to this method.
>>>>>> Period is passed to the method.
>>>>>> Java_sun_awt_windows_WToolkit_syncNativeQueue(JNIEnv *env, jobject
>>>>>> self,
>>>>>> jlong timeout)
>>>>>> It's meaning is a _maximum_ time the method should wait. Before the
>>>>>> fix
>>>>>> it was ignored and the method never returned.
>>>>>> With the fix the timeout is taken into account but this aspect of 
>>>>>> the
>>>>>> fix is not related to the original issue and it cannot fix the issue
>>>>>> alone.
>>>>>> Or you meant another method? Again, please be more specific 
>>>>>> otherwise
>>>>>> this discussion can be infinite.
>>>>>>>
>>>>>>>> No callbacks means no
>>>>>>>> activity and in analogous situations the method returns true.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I see a clear disadvantage: even with 2-3 second wait period the
>>>>>>>> test
>>>>>>>> will executes much longer than on Linux platform because D&D is
>>>>>>>> triggered every 3 mouse drags . Also other D&D tests suites will
>>>>>>>> demand
>>>>>>>> much more time to pass the windows platform. Users can report
>>>>>>>> bugs if
>>>>>>>> AWT Robot starts to produce multi-seconds delay after drag events.
>>>>>>>> Maybe you can provide an example scenario which demonstrates that
>>>>>>>> it is
>>>>>>>> really necessary to pay this price?
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Event counter is not changed during toolkit thread
>>>>>>>>>>>>>>>> blocking of
>>>>>>>>>>>>>>>> cause. Not sure that we can change that. But since toolkit
>>>>>>>>>>>>>>>> queue is
>>>>>>>>>>>>>>>> blocked we can assume that we are synced.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The timeout value is maintained on the shared level and
>>>>>>>>>>>>>>>>> actually
>>>>>>>>>>>>>>>>> this test will fail with timeout on osx as well
>>>>>>>>>>>>>>>>> JDK-7185258.
>>>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>> test will fail even if the time out will be changed to
>>>>>>>>>>>>>>>>> ±100ms,
>>>>>>>>>>>>>>>>> because it call realsync on each pixel move, ±200 times.
>>>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>> problem can be fixed in different ways like tweak of
>>>>>>>>>>>>>>>>> timeouts and
>>>>>>>>>>>>>>>>> numbers of iterations, or changing the test to call 
>>>>>>>>>>>>>>>>> w4idle
>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>> after the latest move(actually I think this method can be
>>>>>>>>>>>>>>>>> moved to
>>>>>>>>>>>>>>>>> the robot class).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So I still think that the right fix for a deadlock,
>>>>>>>>>>>>>>>>> which is
>>>>>>>>>>>>>>>>> subject of this bug, is simply change the
>>>>>>>>>>>>>>>>> syncNativeQueue and
>>>>>>>>>>>>>>>>> waits using a timeout if it is positive, and waits
>>>>>>>>>>>>>>>>> forever if
>>>>>>>>>>>>>>>>> timeout is negative (the same bug on osx JDK-8080504).
>>>>>>>>>>>>>>>> I'm not sure that waiting brings any value. What do you
>>>>>>>>>>>>>>>> propose to
>>>>>>>>>>>>>>>> return if it timed out? The event counter will not be
>>>>>>>>>>>>>>>> changed
>>>>>>>>>>>>>>>> regardless of waiting.
>>>>>>>>>>>>>>> But it should be changed, because we get native events
>>>>>>>>>>>>>>> from the
>>>>>>>>>>>>>>> system during dnd and in each such callback we should 
>>>>>>>>>>>>>>> update
>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>> counter. If callbacks were not called=>counter was not
>>>>>>>>>>>>>>> updated
>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>> sync assume that currently we do not process events. If
>>>>>>>>>>>>>>> callbacks
>>>>>>>>>>>>>>> were called then sync assume that we have events in the
>>>>>>>>>>>>>>> native
>>>>>>>>>>>>>>> queue
>>>>>>>>>>>>>>> and should try to sync again on the next iteration.
>>>>>>>>>>>>>> No. Events are not processed while toolkit is blocked in
>>>>>>>>>>>>>> doDragDrop(). The application state is frozen on that 
>>>>>>>>>>>>>> period.
>>>>>>>>>>>>>> That is
>>>>>>>>>>>>>> Windows approach.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> With such waiting the test will fail because of either 
>>>>>>>>>>>>>>>> jtreg
>>>>>>>>>>>>>>>> timeout
>>>>>>>>>>>>>>> Default timeout is 120 seconds for everything, the test
>>>>>>>>>>>>>>> try to
>>>>>>>>>>>>>>> sync
>>>>>>>>>>>>>>> the queue 200 of times after each move, so yes it can fail
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>> timeout even if spend in nativesync 200 ms, the possible
>>>>>>>>>>>>>>> solutions
>>>>>>>>>>>>>>> were in my previous email.
>>>>>>>>>>>>>>>> either InfiniteLoop exception.
>>>>>>>>>>>>>>> This exception will be disabled by default lately in jdk9
>>>>>>>>>>>>>>> timeframe,
>>>>>>>>>>>>>>> right now it helps to find some related issues.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 03.08.15 17:26, Semyon Sadetsky wrote:
>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Please review fix for JDK9:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8132664
>>>>>>>>>>>>>>>>>>>> webrev:
>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ssadetsky/8132664/webrev.00/ 
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> DoDragDrop() is blocking, so upon drag operation is
>>>>>>>>>>>>>>>>>>>> triggered
>>>>>>>>>>>>>>>>>>>> the toolkit thread is blocked and the WM_AWT_WAIT
>>>>>>>>>>>>>>>>>>>> cannot be
>>>>>>>>>>>>>>>>>>>> processed which in its turn blocks the AWT robot.
>>>>>>>>>>>>>>>>>>>> The solution is to escape AWT robot waiting in
>>>>>>>>>>>>>>>>>>>> syncNativeQueue() if drag operation is in progress.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --Semyon
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>



More information about the awt-dev mailing list