<AWT Dev> <Awt Dev> [9] Review Request for 8132664: closed/javax/swing/DataTransfer/DefaultNoDrop/DefaultNoDrop.java locks on Windows

Semyon Sadetsky semyon.sadetsky at oracle.com
Fri Sep 18 08:01:23 UTC 2015



On 9/18/2015 3:19 AM, Sergey Bylokhov wrote:
> 08.09.2015 19:17, Semyon Sadetsky wrote:
>>
>>
>> On 9/8/2015 6:13 PM, Sergey Bylokhov wrote:
>>> On 08.09.15 18:02, Semyon Sadetsky wrote:
>>>>
>>>>
>>>> On 9/8/2015 2:06 PM, Sergey Bylokhov wrote:
>>>>> On 08.09.15 13:42, Semyon Sadetsky wrote:
>>>>>>
>>>>>>
>>>>>> On 9/8/2015 1:31 PM, Sergey Bylokhov wrote:
>>>>>>> On 08.09.15 13:15, Semyon Sadetsky wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/8/2015 12:33 PM, Sergey Bylokhov wrote:
>>>>>>>>> On 08.09.15 12:04, Semyon Sadetsky wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 9/7/2015 6:56 PM, Sergey Bylokhov wrote:
>>>>>>>>>>> On 07.09.15 16:41, Semyon Sadetsky wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 9/7/2015 3:28 PM, Sergey Bylokhov wrote:
>>>>>>>>>>>>> On 03.09.15 17:43, Semyon Sadetsky wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 8/5/2015 2:33 PM, Sergey Bylokhov wrote:
>>>>>>>>>>>>>>> On 05.08.15 14:20, Semyon Sadetsky wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 8/5/2015 1:39 PM, Sergey Bylokhov wrote:
>>>>>>>>>>>>>>>>> On 05.08.15 13:18, Semyon Sadetsky wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 8/5/2015 12:27 PM, Sergey Bylokhov wrote:
>>>>>>>>>>>>>>>>>>> On 04.08.15 14:54, Semyon Sadetsky wrote:
>>>>>>>>>>>>>>>>>>>> On 8/3/2015 6:05 PM, Sergey Bylokhov wrote:
>>>>>>>>>>>>>>>>>>>>> Hi, Semyon
>>>>>>>>>>>>>>>>>>>>> Did you try to change dwMilliseconds from INFINITE 
>>>>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>>>>> timeout(10 seconds by default?) which is passed to 
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> method?
>>>>>>>>>>>>>>>>>>>>> It does not help? Because even when dnd is not 
>>>>>>>>>>>>>>>>>>>>> used we
>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>> not wait event for infinite time.
>>>>>>>>>>>>>>>>>>>> It would not help to fix the issue because 10 
>>>>>>>>>>>>>>>>>>>> seconds is
>>>>>>>>>>>>>>>>>>>> too big
>>>>>>>>>>>>>>>>>>>> interval. But for consistency it is not bad to have 
>>>>>>>>>>>>>>>>>>>> a time
>>>>>>>>>>>>>>>>>>>> limit.
>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ssadetsky/8132664/webrev.01/ 
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Note that syncNativeQueue is intended to wait until the
>>>>>>>>>>>>>>>>>>> native
>>>>>>>>>>>>>>>>>>> event queue is flushed or until timeout will expire. So
>>>>>>>>>>>>>>>>>>> even if
>>>>>>>>>>>>>>>>>>> timeout is expired we collect the native events 
>>>>>>>>>>>>>>>>>>> during this
>>>>>>>>>>>>>>>>>>> period
>>>>>>>>>>>>>>>>>>> of time.Can you double check that the event counter is
>>>>>>>>>>>>>>>>>>> incremented
>>>>>>>>>>>>>>>>>>> during dnd? I do not know how we block the toolkit 
>>>>>>>>>>>>>>>>>>> thread,
>>>>>>>>>>>>>>>>>>> probably we create some nested loops which ignore 
>>>>>>>>>>>>>>>>>>> our event
>>>>>>>>>>>>>>>>>>> posted
>>>>>>>>>>>>>>>>>>> from syncNativeQueue, can we change that?
>>>>>>>>>>>>>>>>>> Yes, this is an internal secondary loop which waits for
>>>>>>>>>>>>>>>>>> mouse
>>>>>>>>>>>>>>>>>> release event.
>>>>>>>>>>>>>>>>> Can we change the condition and process the sync event in
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>> loop?
>>>>>>>>>>>>>>>> Why? Will receive all events on the toolkit thread when
>>>>>>>>>>>>>>>> doDragDrop
>>>>>>>>>>>>>>>> returns.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When how we get dragenter/exit events?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On the platform side they are not events but callbacks 
>>>>>>>>>>>>>> which are
>>>>>>>>>>>>>> converted into events on the java side and added to the
>>>>>>>>>>>>>> queue. So
>>>>>>>>>>>>>> they
>>>>>>>>>>>>>> will be detected by isEQEmpty() and waitForIdle() will be
>>>>>>>>>>>>>> repeated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is not directly related to this fix. Most of our
>>>>>>>>>>>>> callbacks/events
>>>>>>>>>>>>> posts events to the EQ, but there is a possibility for a lag
>>>>>>>>>>>>> between
>>>>>>>>>>>>> callback and a post events, and this is why the
>>>>>>>>>>>>> syncNativeQueue was
>>>>>>>>>>>>> added.
>>>>>>>>>>>>> Let's return to the beginning: the syncNativeQueue method
>>>>>>>>>>>>> according to
>>>>>>>>>>>>> its specifications should try to flush the native system,
>>>>>>>>>>>>> track the
>>>>>>>>>>>>> native activity and should not wait more than timeout 
>>>>>>>>>>>>> parameter.
>>>>>>>>>>>>
>>>>>>>>>>>> It is not possible to send sync events to the DD loop. Only
>>>>>>>>>>>> mouse/keyboard event can return control to the app. I would 
>>>>>>>>>>>> not
>>>>>>>>>>>> emulate
>>>>>>>>>>>> them in the syncnative.
>>>>>>>>>>>> Always waiting the max time is not good. Maybe wait a bit and
>>>>>>>>>>>> check if
>>>>>>>>>>>> no callbacks happened?
>>>>>>>>>>>
>>>>>>>>>>> It will be good to check that callbacks were occurred in the
>>>>>>>>>>> same way
>>>>>>>>>>> as native events. This can be done in this fix. As for question
>>>>>>>>>>> about
>>>>>>>>>>> wait or not when timeout is over, is a shared problem, which 
>>>>>>>>>>> affect
>>>>>>>>>>> the osx also. It can be fixed separately. I guess the logic in
>>>>>>>>>>> realSync can be tweaked and the default timeout(10*4 sec) is 
>>>>>>>>>>> not a
>>>>>>>>>>> optimal as a default value, 2 or 3 seconds without any events
>>>>>>>>>>> should
>>>>>>>>>>> be enough.
>>>>>>>>>> Could you explain your logic?
>>>>>>>>>
>>>>>>>>> My logic is to follow the specification.
>>>>>>>> I cannot accept this as an answer because it is an abstract
>>>>>>>> declaration.
>>>>>>>> Please, try to be more specific.
>>>>>>>
>>>>>>> Just implement as I described it above.
>>>>>>>
>>>>>> Sorry?
>>>>>
>>>>> Just change implementation of syncNativeQueue to don't block forever
>>>>> if timeout parameter is positive. And wait forever in case of 
>>>>> negative
>>>>> timeout. This is enough to fix for now, now need to add some
>>>>> workarounds for dnd, since the realsync itself should be reworked.
>>>>>
>>>> Sergey, I'm a bit tired of this level of explanations.
>>>
>>> Explanations is simple just implement the method according 
>>> specification, since the problem is general and is not specific it 
>>> should be fixed on the shared level.
>>>
>> An internal method javadoc can be changed any time.
>
>
>
>>>> What do you mean under "for now" and why the realsync itself should be
>>>> reworked?
>>>
>>> For now means for now. There are a few fixes which were pushed to 
>>> the jdk9 in to realsync area, as well as a few fixes to the tests 
>>> (regression/sqe), and we already got a bunch of bugs. And "for now" 
>>> we should not implement any workarounds since we know that the 
>>> shared code contains errors.
>> Can you give me the bug number you are talking about?
> 8076178, 7124271, 8135004, 8003533,7185258
> And sqe related INTJDK-7620856
> Some of themalready fixed. In most of this fixes we try to remove all 
> current workarounds and see how the new implementation of 
> Robot.w8foridle works. Do not add one more in this fix, implement it 
> according to the spec.
Still did not catch. Those are particular issues on different platform 
related to the waitIdle().
The specs states wait _MAXIMUM_ time and you are insisting to wait the 
whole period.
If you are going to introduce a new generic approach for waitIdle() 
which invalidates my fix, maybe it's better
to present it in a separate jira and discuss there? It's up to you of cause.
Don't you mind if I reassign this issue to you will be able to continue 
with it on your own?
I see that you have some objective but I really failed to get the idea.
>
>>>
>>>> The solution fixes the problem. Test is passed.
>>>
>>>
>>> Yes, the test still fail after the fix, the reason will be not a 
>>> lock but a slow execution.
>>>
>> It is not true. I just rerun the test with updated jdk9 client 
>> codebase under windows and it passed in 17 seconds:
>
> If will fail with my proposal.
>
>>
>> ----------messages:(3/117)----------
>> command: main DefaultNoDrop
>> reason: User specified action: run main DefaultNoDrop
>> elapsed time (seconds): 17.339
>> ----------System.out:(8/177)----------
>> invoking: onEDT1
>> invoking: onEDT10
>> invoking: onBackgroundThread20
>> invoking: onEDT30
>> invoking: onEDT40
>> invoking: onBackgroundThread50
>> invoking: onEDT60
>> invoking: onEDT70
>> ----------System.err:(1/16)----------
>> STATUS:Passed.
>>
>>>
>>> If you cannot explain in
>>>> words why it does not work for you, maybe you can provide a test case
>>>> which fails with the proposed solution?
>>>>>
>>>>>>>
>>>>>>>> I can only make guesses what specification section you mean. For
>>>>>>>> example, the next is waitForIdle() method's spec:
>>>>>>>>      /**
>>>>>>>>       * Waits until all events currently on the event queue 
>>>>>>>> have been
>>>>>>>> processed.
>>>>>>>>       * @throws  IllegalThreadStateException if called on the AWT
>>>>>>>> event
>>>>>>>> dispatching thread
>>>>>>>>       */
>>>>>>>> What does specifically contradicts to this specification?
>>>>>>>
>>>>>>> Check the documentation of the realSync method.
>>>>>>>
>>>>>> realSync is an internal method. It is not a part of the 
>>>>>> specification.
>>>>>> It has this description of the timeout parameter:
>>>>>>       * @param timeout the maximum time to wait in milliseconds,
>>>>>> negative means "forever".
>>>>>>
>>>>>> That is as it used in the fix. No?
>>>>>>>
>>>>>>>>>
>>>>>>>>>> Why to wait short wait period doesn't work
>>>>>>>>>> if we are in DD and there are no callbacks?
>>>>>>>>>
>>>>>>>>> "No callbacks" during what period? My point: this period 
>>>>>>>>> should be
>>>>>>>>> passed to this method.
>>>>>>>> Period is passed to the method.
>>>>>>>> Java_sun_awt_windows_WToolkit_syncNativeQueue(JNIEnv *env, jobject
>>>>>>>> self,
>>>>>>>> jlong timeout)
>>>>>>>> It's meaning is a _maximum_ time the method should wait. Before 
>>>>>>>> the
>>>>>>>> fix
>>>>>>>> it was ignored and the method never returned.
>>>>>>>> With the fix the timeout is taken into account but this aspect 
>>>>>>>> of the
>>>>>>>> fix is not related to the original issue and it cannot fix the 
>>>>>>>> issue
>>>>>>>> alone.
>>>>>>>> Or you meant another method? Again, please be more specific 
>>>>>>>> otherwise
>>>>>>>> this discussion can be infinite.
>>>>>>>>>
>>>>>>>>>> No callbacks means no
>>>>>>>>>> activity and in analogous situations the method returns true.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I see a clear disadvantage: even with 2-3 second wait period the
>>>>>>>>>> test
>>>>>>>>>> will executes much longer than on Linux platform because D&D is
>>>>>>>>>> triggered every 3 mouse drags . Also other D&D tests suites will
>>>>>>>>>> demand
>>>>>>>>>> much more time to pass the windows platform. Users can report
>>>>>>>>>> bugs if
>>>>>>>>>> AWT Robot starts to produce multi-seconds delay after drag 
>>>>>>>>>> events.
>>>>>>>>>> Maybe you can provide an example scenario which demonstrates 
>>>>>>>>>> that
>>>>>>>>>> it is
>>>>>>>>>> really necessary to pay this price?
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Event counter is not changed during toolkit thread
>>>>>>>>>>>>>>>>>> blocking of
>>>>>>>>>>>>>>>>>> cause. Not sure that we can change that. But since 
>>>>>>>>>>>>>>>>>> toolkit
>>>>>>>>>>>>>>>>>> queue is
>>>>>>>>>>>>>>>>>> blocked we can assume that we are synced.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The timeout value is maintained on the shared level and
>>>>>>>>>>>>>>>>>>> actually
>>>>>>>>>>>>>>>>>>> this test will fail with timeout on osx as well
>>>>>>>>>>>>>>>>>>> JDK-7185258.
>>>>>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>>> test will fail even if the time out will be changed to
>>>>>>>>>>>>>>>>>>> ±100ms,
>>>>>>>>>>>>>>>>>>> because it call realsync on each pixel move, ±200 
>>>>>>>>>>>>>>>>>>> times.
>>>>>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>>>> problem can be fixed in different ways like tweak of
>>>>>>>>>>>>>>>>>>> timeouts and
>>>>>>>>>>>>>>>>>>> numbers of iterations, or changing the test to call 
>>>>>>>>>>>>>>>>>>> w4idle
>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>> after the latest move(actually I think this method 
>>>>>>>>>>>>>>>>>>> can be
>>>>>>>>>>>>>>>>>>> moved to
>>>>>>>>>>>>>>>>>>> the robot class).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So I still think that the right fix for a deadlock,
>>>>>>>>>>>>>>>>>>> which is
>>>>>>>>>>>>>>>>>>> subject of this bug, is simply change the
>>>>>>>>>>>>>>>>>>> syncNativeQueue and
>>>>>>>>>>>>>>>>>>> waits using a timeout if it is positive, and waits
>>>>>>>>>>>>>>>>>>> forever if
>>>>>>>>>>>>>>>>>>> timeout is negative (the same bug on osx JDK-8080504).
>>>>>>>>>>>>>>>>>> I'm not sure that waiting brings any value. What do you
>>>>>>>>>>>>>>>>>> propose to
>>>>>>>>>>>>>>>>>> return if it timed out? The event counter will not be
>>>>>>>>>>>>>>>>>> changed
>>>>>>>>>>>>>>>>>> regardless of waiting.
>>>>>>>>>>>>>>>>> But it should be changed, because we get native events
>>>>>>>>>>>>>>>>> from the
>>>>>>>>>>>>>>>>> system during dnd and in each such callback we should 
>>>>>>>>>>>>>>>>> update
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>> counter. If callbacks were not called=>counter was not
>>>>>>>>>>>>>>>>> updated
>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>> sync assume that currently we do not process events. If
>>>>>>>>>>>>>>>>> callbacks
>>>>>>>>>>>>>>>>> were called then sync assume that we have events in the
>>>>>>>>>>>>>>>>> native
>>>>>>>>>>>>>>>>> queue
>>>>>>>>>>>>>>>>> and should try to sync again on the next iteration.
>>>>>>>>>>>>>>>> No. Events are not processed while toolkit is blocked in
>>>>>>>>>>>>>>>> doDragDrop(). The application state is frozen on that 
>>>>>>>>>>>>>>>> period.
>>>>>>>>>>>>>>>> That is
>>>>>>>>>>>>>>>> Windows approach.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> With such waiting the test will fail because of 
>>>>>>>>>>>>>>>>>> either jtreg
>>>>>>>>>>>>>>>>>> timeout
>>>>>>>>>>>>>>>>> Default timeout is 120 seconds for everything, the test
>>>>>>>>>>>>>>>>> try to
>>>>>>>>>>>>>>>>> sync
>>>>>>>>>>>>>>>>> the queue 200 of times after each move, so yes it can 
>>>>>>>>>>>>>>>>> fail
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> timeout even if spend in nativesync 200 ms, the possible
>>>>>>>>>>>>>>>>> solutions
>>>>>>>>>>>>>>>>> were in my previous email.
>>>>>>>>>>>>>>>>>> either InfiniteLoop exception.
>>>>>>>>>>>>>>>>> This exception will be disabled by default lately in jdk9
>>>>>>>>>>>>>>>>> timeframe,
>>>>>>>>>>>>>>>>> right now it helps to find some related issues.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 03.08.15 17:26, Semyon Sadetsky wrote:
>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Please review fix for JDK9:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> bug: 
>>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8132664
>>>>>>>>>>>>>>>>>>>>>> webrev:
>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ssadetsky/8132664/webrev.00/ 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> DoDragDrop() is blocking, so upon drag operation is
>>>>>>>>>>>>>>>>>>>>>> triggered
>>>>>>>>>>>>>>>>>>>>>> the toolkit thread is blocked and the WM_AWT_WAIT
>>>>>>>>>>>>>>>>>>>>>> cannot be
>>>>>>>>>>>>>>>>>>>>>> processed which in its turn blocks the AWT robot.
>>>>>>>>>>>>>>>>>>>>>> The solution is to escape AWT robot waiting in
>>>>>>>>>>>>>>>>>>>>>> syncNativeQueue() if drag operation is in progress.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --Semyon
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>



More information about the awt-dev mailing list