[External] : Re: Detecting threading problems faster

Andy Goryachev andy.goryachev at oracle.com
Wed Aug 7 22:12:59 UTC 2024


> > Looks like the issue in PR#1123 was caused by application code accessing FX from a non-FX thread.
> That does seem likely, but it is incredibly hard to be sure.
There is literally an exception logged from a non-FX thread, see here
https://github.com/openjdk/jfx/pull/1123#issuecomment-1535930576

:-)

But you do have a point in trying to save developers' time.  And, if we decide to hide the check behind a system property not enabled by default, we also avoid any risk of regression.

So the question is reduced to where we should put the check.  One such spot could be Parent.requestLayout(), since pretty much any changes to the scene graph result in a layout request.

Another possibility we can consider is to create a special convenience class for background processing, along the lines of SwingWorker.  For example, it could extend Runnable, have utility methods to runAndWait(), and launch itself in a Thread or (later) in a virtual thread.  This way the app devs can stop worrying and simply move any kind of background processing to the instances of these classes.  Just a thought.

-andy


From: John Hendrikx <john.hendrikx at gmail.com>
Date: Wednesday, August 7, 2024 at 14:53
To: Andy Goryachev <andy.goryachev at oracle.com>, openjfx-dev at openjdk.org <openjfx-dev at openjdk.org>
Subject: Re: [External] : Re: Detecting threading problems faster

On 07/08/2024 21:38, Andy Goryachev wrote:
John:

Looks like the issue in PR#1123 was caused by application code accessing FX from a non-FX thread.
That does seem likely, but it is incredibly hard to be sure.


I know the discussion is about trying to detect thread misuse early on, as a way to help the application developers.  I vaguely recall the discussion we had at some point about adding the check to property invalidation code or some such, and the general consensus was that it might pose a compatibility risk or something along these lines.

Perhaps there is a place where such a check might be safely made (I don't know of any such place though)?   We did, after all, step on the path of simplification (?) for the sake of students (see https://openjdk.org/jeps/445 ).  Perhaps such a check could be hidden behind a system property so as not to break existing, if poorly written, applications.

I agree that we should be careful not to break anything, it is a check, but I also think that most developers will want this check on (but probably not by default initially, but definitely **after** they fixed any mistakes it detected to avoid recurrence). There is nothing worse than code that only rarely and inexplicably fails on other machines.

For me, I wouldn't hesitate for a moment to turn checks on that alert me of incorrect use of FX -- I value my time, and bugs resulting from these threading problems can be very costly to track down. I already wrote my own check that scans for dead "nodes" (ie. nodes that were once part of a scene graph, but are no longer, but are still not being GC'd).

Personally, I feel that it should be the first thing taught - always call FX from the FX application thread.
Even Java collections will help you avoid threading problems by throwing ConcurrentModificationException.  Imagine a moment a world where this check did not exist, as it does cost some performance (a mod counter is increased on every collection modification):

- Concurrent access to things like HashMap and ArrayList would seem to work fine
- Sometimes randomly, an item goes missing, is overwritten or duplicated, or the code gets stuck in an infinite loop (these are all possible outcomes of concurrent collection access, as the check isn't perfect, so when you do see it, you must investigate thoroughly)
- The bug is reported (collection framework is buggy!)
- After analysis, culprit is that the user somewhere in their 100.000+ line code base is accessing a collection on different threads... sorry, we can't help you further.
- Except, they could help us further: ConcurrentModificationException was included at a slight cost of performance. It can't even be disabled (I'm not implying this was a later addition, the designers were just aware that this would lead to hard to track bugs and added this immediately).

The parallels with the FX threading model, and also with weak references are huge:

- Both "seem" to work fine, until they don't
- Failures are very random, may not occur at all on some setups (like the developer's), and failures can be subtle, deceiving or complete
- The bug is reported (FX doesn't even check for nulls in Scene code!)
- After analysis, culprit is (likely) that the user somewhere in their 100.000+ line code base is accessing the scene graph on different threads / forgot to maintain a hard reference... sorry, we can't help you further.

These kinds of mistakes are not just happening to beginners.  This can happen when code gets rearranged, its original operating parameters long forgotten by you, or some future maintainer.  All it takes is a moment of inattention and you have a bug that you may never personally encounter, and when it does occur is impossible to reproduce with any certainty and does NOT give any indication of the cause (you see only symptoms).  Such bugs are the nastiest kind, and it is (IMHO) a failure of a framework to not help you detect these as early as possible.

--John



I am curious what other members of community think of this issue.

-andy




From: John Hendrikx <john.hendrikx at gmail.com><mailto:john.hendrikx at gmail.com>
Date: Wednesday, August 7, 2024 at 12:01
To: Andy Goryachev <andy.goryachev at oracle.com><mailto:andy.goryachev at oracle.com>, openjfx-dev at openjdk.org<mailto:openjfx-dev at openjdk.org> <openjfx-dev at openjdk.org><mailto:openjfx-dev at openjdk.org>
Subject: [External] : Re: Detecting threading problems faster

Hi Andy,

Problems that occur due to modifying the scene graph (like adding/removing children, or changing certain properties) can be incredibly subtle, and almost never point out the code that is the actual culprit.  I suspect that the PR below is one of those for example, but there are likely many more.

https://github.com/openjdk/jfx/pull/1123<https://urldefense.com/v3/__https:/github.com/openjdk/jfx/pull/1123__;!!ACWV5N9M2RV99hQ!ILqOOB9ua4tDxfAV9pGPJJiQ3oXewiSQjH2BCsaQkhpgkHEQzT8hRQt2fb9mKjYbF24ZRpQwfEiELFvn4B28lBLwxrTE$>

Issues that complain about some kind of odd bug in the FX code (like a NPE, ConcurrentModificationException, ArrayIndexOutOfBoundsException, etc) could all be caused by user code doing things on the wrong thread, but its really hard to diagnose as the stack traces provided will never point out the user code that may be the culprit.  I've investigated PR 1123 extensively, and I can't see any bugs in the FX code at all **assuming** it is used from a single thread...

--John
On 05/08/2024 17:01, Andy Goryachev wrote:
John:

Can you cite a bug or give an example of such a problem?

During all my time of actively using javafx (since about 2015) I have never encountered an issue with threading you describe.  It is possible that I use the system trivially and not to the full extent.  Both swing and javafx are single threaded by design.  Yes, there might be occasions when one can use multiple threads and it is sort of allowed, but doing so may lead to unpleasant issues down the road, so the question is why would you want to do that?

-andy

From: openjfx-dev <openjfx-dev-retn at openjdk.org><mailto:openjfx-dev-retn at openjdk.org> on behalf of John Hendrikx <john.hendrikx at gmail.com><mailto:john.hendrikx at gmail.com>
Date: Sunday, August 4, 2024 at 16:30
To: openjfx-dev at openjdk.org<mailto:openjfx-dev at openjdk.org> <openjfx-dev at openjdk.org><mailto:openjfx-dev at openjdk.org>
Subject: Detecting threading problems faster
Hi list,

I know of quite some bugs and users that have been bitten by the
threading model used by JavaFX.  Basically, anything directly or
indirectly linked to an active Scene must be accessed on the FX thread.
However, as FX also allows manipulating nodes and properties before
they're displayed, there can be no "hard" check everywhere to ensure we
are on the FX thread (specifically, in properties).

Now, I think this situation is annoying, as a simple mistake where a
Platform.runLater wrapper was forgotten usually results in programs
operating mostly flawlessly, but then fail in mysterious and random and
hard to reproduce ways.  The blame is often put on FX as the resulting
exceptions will almost never show the user code which was the actual
culprit.  It can result in FX being perceived as unstable or buggy.

So I've been thinking if there isn't something we can do to detect these
bugs originating from user code much earlier, similar to the
`ConcurrentModificationException` the collection classes do when
accessed in nested or concurrent contexts.

I think it may be possible to have properties check whether they're part
of an active scene without too much of an performance impact, possibly
even behind a switch. It would work like this:

Properties involved with Nodes will have an associated bean instance
(`getBean`).  This is an object, but we could check here if this
instance implements an interface:

      if (getBean() instanceof MayBePartOfSceneGraph x) {
            if (x.isPartOfActiveScene() && !isOnFxThread()) {
                 throw new IllegalStateException("Property must only be
used from the FX Application Thread");
            }
      }

This check could be done on every set of the property, and potentially
on every get as well.  It should be relatively cheap, but will expose
problematic code patterns at a much earlier stage.  There's a chance
that this will "break" some programs that seemed to be behaving
correctly as well, so we may want to put it behind a switch until such
programs (or libraries) can be fixed.

What do you all think?

--John

(*) Names of methods/interfaces are only used for illustration purposes,
we can think of good names if this moves forward.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/openjfx-dev/attachments/20240807/d3cb3fb8/attachment-0001.htm>


More information about the openjfx-dev mailing list