<AWT Dev> deadlock between AppKit main thread and AWT event thread in JDK 17

Alan Snyder fishgarage at cbfiddle.com
Fri Jul 9 01:17:16 UTC 2021



> On Jul 8, 2021, at 5:18 PM, Philip Race <philip.race at oracle.com> wrote:
> 
> There has never been any contract that JDK would use JNF or specify a value for a run loop.
> I'd be interested to know where this was documented. I can't even find any Apple docs.
> There's no way we would ever consider something like this name a supported interface without lots
> of thought and consideration of the implications.
> 

Well, it sounds like you are splitting hairs. When Oracle took over the Apple JDK, they acquired the API contracts of that JDK, morally, if not legally.

The original implementation of the JDK on macOS was created by Apple.
They created JavaNativeFoundation, which was used by their JDK *and* provided to application developers writing JNI libraries.
The run loop mode name was not exposed to application developers; it was hidden in the implementation of [JNFRunLoop performOnMainThreadWaiting].
It worked because it used the same run loop mode as ThreadUtilities, which was used in the JDK native code.
As Mike Swingler said, JNFRunLoop was created for this purpose.
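To make the mode-name issue concrete, here is a toy model in Java. It is not AppKit code; the class `RunLoopModeModel` and its methods are hypothetical. It assumes, as described in this thread, that JNFRunLoop tags its main-thread requests with `AWTRunLoopMode`, while the JDK 17 nested run loop on the main thread runs only in `javaRunLoopMode`. A pass of a run loop in a given mode services only the requests registered for that mode, so a request tagged with the old name simply stays queued.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Toy model of run loop modes (hypothetical names; not AppKit). */
public class RunLoopModeModel {

    /** Work queued for the "main thread", tagged with the modes it may run in. */
    static final class Request {
        final String name;
        final Set<String> modes;

        Request(String name, Set<String> modes) {
            this.name = name;
            this.modes = modes;
        }
    }

    private final Deque<Request> pending = new ArrayDeque<>();

    /** Queue a request, loosely analogous to performing a block on the main thread. */
    void perform(Request request) {
        pending.add(request);
    }

    /** One pass of the loop in the given mode: services matching requests, leaves the rest queued. */
    List<String> runOnce(String mode) {
        List<String> serviced = new ArrayList<>();
        for (Iterator<Request> it = pending.iterator(); it.hasNext(); ) {
            Request r = it.next();
            if (r.modes.contains(mode)) {
                serviced.add(r.name);
                it.remove();
            }
        }
        return serviced;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        RunLoopModeModel mainThread = new RunLoopModeModel();
        // Assumption: JDK 17 native code tags its requests with the renamed mode.
        mainThread.perform(new Request("jdk-call", Set.of("javaRunLoopMode")));
        // Assumption: third-party JNI code going through JNFRunLoop still uses the old name
        // (other common modes omitted; this nested loop would not service them either).
        mainThread.perform(new Request("jnf-call", Set.of("AWTRunLoopMode")));

        // The JDK's nested loop on the main thread runs only in javaRunLoopMode.
        System.out.println(mainThread.runOnce("javaRunLoopMode")); // [jdk-call]
        System.out.println(mainThread.pendingCount());            // 1 (jnf-call is stuck)
    }
}
```

In the real system, the AWT thread is blocked waiting for the stuck request while the main thread is blocked in the nested loop waiting on the AWT thread, which is the deadlock cycle this thread is about. Had the nested loop run in `AWTRunLoopMode`, the same pass would have serviced `jnf-call`.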

Link: https://wiki.openjdk.java.net/display/MacOSXPort/Java+vs.+AppKit+Threading+Manifesto

I found this document when investigating my deadlock. It wasn’t hard to find.

It is unfortunate that you decided to mess with code you did not understand.

JNF is not going away. It has been converted to an open source unbundled framework.
I am told it is being bundled with some arm64 implementations of the JDK because the JavaNativeFoundation in macOS 11 contains only x86_64 code.

All of my JNI libraries that interact with AppKit use JNF. It is hard to imagine a macOS-specific JNI library that doesn’t use JNF.

I have no intention of rewriting my code at this time.
If OpenJDK provides its own JNI support for macOS, I would use it (if necessary to work with the latest JDK).

If I were to publish my code for using native file dialogs, I bet it would be used, and every developer using it would eventually get deadlock reports on JDK 17 as it now stands, since all it takes is for the user to type a keyboard shortcut. In any case, you are taking a big chance by hoping that no one else will be affected.

Is there a process for submitting a CSR when an incompatibility is discovered after a change has been integrated?

Yes, I use the standard API for creating an AWT secondary run loop.
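For reference, a minimal sketch of the standard pattern Phil asked about: entering a secondary loop via `EventQueue.createSecondaryLoop()` from the event dispatch thread. The worker body is a placeholder for the native file dialog call; `SecondaryLoopSketch` and `blockOnResult` are hypothetical names, not taken from my actual code. Posting `exit()` back to the event queue is one design choice that guarantees it cannot run before `enter()` has started pumping events.

```java
import java.awt.EventQueue;
import java.awt.SecondaryLoop;
import java.awt.Toolkit;
import java.util.concurrent.atomic.AtomicReference;

public class SecondaryLoopSketch {

    /** Must be called on the EDT; blocks in a secondary loop until a worker produces a result. */
    static String blockOnResult() {
        if (!EventQueue.isDispatchThread()) {
            throw new IllegalStateException("call on the event dispatch thread");
        }
        SecondaryLoop loop = Toolkit.getDefaultToolkit().getSystemEventQueue().createSecondaryLoop();
        AtomicReference<String> result = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            // Placeholder for the real work (hypothetical), e.g. a native file dialog.
            result.set("chosen-file.txt");
            // exit() runs on the EDT, so it cannot run before enter() starts pumping.
            EventQueue.invokeLater(loop::exit);
        });
        worker.start();
        // Blocks this handler, but the EDT keeps dispatching other queued events
        // (for example, an invocation event posted earlier by a timer) until exit().
        loop.enter();
        return result.get();
    }

    public static void main(String[] args) throws Exception {
        AtomicReference<String> out = new AtomicReference<>();
        EventQueue.invokeAndWait(() -> out.set(blockOnResult()));
        System.out.println(out.get());
    }
}
```

Because other events keep being dispatched while `enter()` is blocked, an unrelated event that itself blocks waiting for the AppKit main thread can run inside the secondary loop, which is how the nested wait in my example arises.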

  Alan





> It is clear that you are interested only in changing the value but I still need to understand the use case better
> since even reverting it is implying some level of contract which does not exist.
> I suppose you *must* somewhere be using JNF directly but it is not clear where.
> And since JNF is going away, you are going to have to rewrite this code to use something other than JNF.
> And whereas I presume today you are getting the value from JNF, you'll have to hard-code it in the future.
> So there won't be any more "at run time get the string from JNF". Everyone would need to know it, which
> makes it a de facto API, which makes me very uncomfortable. But I've not heard from anyone else who is
> "broken" by this change, so (no sarcasm meant) it could mean no one else is affected.
> 
> So a test case might have helped more than the words.
> 
> "My Java code running on the AWT thread starts a secondary Java run loop"
> 
> I presume you mean using standard API? Doing something like calling
> https://docs.oracle.com/en/java/javase/11/docs/api/java.desktop/java/awt/EventQueue.html#createSecondaryLoop() ?
> 
> -phil.
> 
> On 7/8/21 4:02 PM, Alan Snyder wrote:
>> The only thing that needs to be part of the API is the run loop mode *name*. If for some reason in the future the JDK stops using a native run loop when the AppKit main thread calls Java, there is no harm if a native (JNI) library calls performOnMainThread and passes that name as one of the run loop modes. The obsolete run loop mode name will have no effect if no native run loop uses that name.
>> 
>> A problem arises only if the run loop mode name is *changed* and the run loop mode is still important, which is the case now.
>> 
>> I don’t think it matters how rare the problem is. There was no reason to change the name of the run loop mode, so even *one* new deadlock is a regression. I have two examples, but only one is repeatable. That doesn’t mean the other one isn’t important.
>> 
>> To be clear about my repeatable example, my Java code running on the AWT thread starts a secondary Java run loop because it wants to block on the result of the native file dialog. AppKit is calling Java because the native file dialog contains an accessory view that is implemented using a Swing component. This worked fine before your change.
>> 
>> Although this example deadlocks reliably, I don’t see how it would help you. I think the problem is obvious. Using JavaNativeFoundation to perform code on the main thread (which is how third party JNI libraries have been told to do it), the old run loop mode name is used, so the code is not performed while the AppKit main thread is blocked in a JDK-implemented native run loop. If the JNI library was called on the AWT thread and the AppKit main thread is waiting for its code to be performed on the AWT thread, deadlock results.
>> 
>> The second example uses a native file dialog (no accessory view) and deadlocks when a keyboard shortcut is used, causing AppKit to inquire about the application menus. It is more timing-sensitive and has happened only once.
>> 
>>>> Changing the name seems to have at least been useful to find cases such as this which I suspect are very, very rare.
>> 
>> I trust you are being sarcastic. I have yet to see a CSR that says an incompatible change is being made solely to find out what breaks.
>> 
>>   Alan
>> 
>> 
>> 
>>> On Jul 8, 2021, at 2:39 PM, Philip Race <philip.race at oracle.com> wrote:
>>> 
>>> 
>>> Changing the name seems to have at least been useful to find cases such as this which I suspect are very, very rare.
>>> 
>>>> My Java code sets up a secondary run loop.
>>> But JDK only enters that mode if *it* creates a secondary run loop.
>>> 
>>>>  The AppKit implementation of the file dialog calls Java to get accessibility information.
>>> Why would AppKit call Java to ask about A11y info for a platform native dialog?
>>> 
>>> I suppose the situation isn't as clear to me as it should be.
>>> 
>>> I might be asking a lot but is there a test case you can provide ?
>>> 
>>> As to making something like this part of a "public API" it seems to me that Java setting up this
>>> mode in a 2ndary run loop in some situations is really an implementation choice and I wouldn't
>>> know where to expose it even if it were something appropriate to do.
>>> I'd sooner find a way to dispense with it entirely.
>>> In fact there is some provision when doing FX interop to not use this mode at all.
>>> 
>>> -phil.
>>> 
>>> 
>>> On 6/28/21 5:51 AM, Alan Snyder wrote:
>>>> Hmm… it appears that in removing JavaNativeFoundation from the JDK the name of the run loop was changed from AWTRunLoopMode to javaRunLoopMode.
>>>> 
>>>> If that is correct, it is an incompatible change that breaks third party use of JavaNativeFoundation for running code on the main thread.
>>>> 
>>>> It also sounds like a gratuitous change.
>>>> 
>>>> 
>>>> 
>>>>> On Jun 27, 2021, at 10:24 AM, Alan Snyder <javalists at cbfiddle.com> wrote:
>>>>> 
>>>>> I have a program that reliably deadlocks when run on JDK 17, but not on JDK 16 (although that may be due to timing differences, so it may not imply a new bug).
>>>>> 
>>>>> It’s a fairly complicated situation.
>>>>> 
>>>>> On the AWT thread, my program calls native code that displays a native file dialog. My Java code sets up a secondary run loop. The native code blocks on JNFRunLoop performOnMainThread to create the native file dialog.
>>>>> 
>>>>> The AppKit implementation of the file dialog calls Java to get accessibility information. This sets up a run loop on the main thread and upcalls to Java. I presume this posts an AWT event.
>>>>> 
>>>>> Before the AWT secondary run loop can process the request for accessibility information, it runs an invocation event (previously posted by a timer) that calls native code. This native code blocks attempting to perform code on the main thread using JNFRunLoop. Apparently, this request is never processed and the AWT thread remains blocked forever.
>>>>> 
>>>>> If I change this latter native code to perform the main thread operation without blocking, there is no deadlock and all is fine.
>>>>> 
>>>>> But I have encountered other deadlocks (not reliably repeatable) where this option is not available. Therefore, I would like to understand why this deadlock is happening.
>>>>> 
>>>>> With run loops on both threads, what would cause the deadlock?
>>>>> 
>>>>> [Question: do JDK 17 and JNFRunLoop use the same NSString to identify the Java run loop mode? If not, might that matter?]
>>>>> 
>>>>> I would appreciate any suggestions of what might be going wrong or how to track it down.
>>>>> 
>>>>>  Alan
>>>>> 
> 



More information about the awt-dev mailing list