[External] : Re: Detecting threading problems faster

John Hendrikx john.hendrikx at gmail.com
Wed Aug 7 21:53:04 UTC 2024


On 07/08/2024 21:38, Andy Goryachev wrote:
>
> John:
>
> Looks like the issue in PR#1123 was caused by application code 
> accessing FX from a non-FX thread.
>
That does seem likely, but it is incredibly hard to be sure.
>
> I know the discussion is about trying to detect thread misuse early 
> on, as a way to help the application developers.  I vaguely recall the 
> discussion we had at some point about adding the check to property 
> invalidation code or some such, and the general consensus was that it 
> might pose a compatibility risk or something along these lines.
>
> Perhaps there is a place where such a check might be safely made (I 
> don't know of any such place though)?   We did, after all, step on the 
> path of simplification (?) for the sake of students (see 
> https://openjdk.org/jeps/445 ).  Perhaps such a check could be hidden 
> behind a system property so as not to break existing, if poorly 
> written, applications.
>
I agree that we should be careful not to break anything, it is a check, 
but I also think that most developers will want this check on (but 
probably not by default initially, but definitely **after** they fixed 
any mistakes it detected to avoid recurrence). There is nothing worse 
than code that only rarely and inexplicably fails on other machines.

For me, I wouldn't hesitate for a moment to turn checks on that alert me 
of incorrect use of FX -- I value my time, and bugs resulting from these 
threading problems can be very costly to track down. I already wrote my 
own check that scans for dead "nodes" (ie. nodes that were once part of 
a scene graph, but are no longer, but are still not being GC'd).

> Personally, I feel that it should be the first thing taught - always 
> call FX from the FX application thread.
>
Even Java collections will help you avoid threading problems by throwing 
ConcurrentModificationException.  Imagine a moment a world where this 
check did not exist, as it does cost some performance (a mod counter is 
increased on every collection modification):

- Concurrent access to things like HashMap and ArrayList would seem to 
work fine
- Sometimes randomly, an item goes missing, is overwritten or 
duplicated, or the code gets stuck in an infinite loop (these are all 
possible outcomes of concurrent collection access, as the check isn't 
perfect, so when you do see it, you must investigate thoroughly)
- The bug is reported (collection framework is buggy!)
- After analysis, culprit is that the user somewhere in their 100.000+ 
line code base is accessing a collection on different threads... sorry, 
we can't help you further.
- Except, they could help us further: ConcurrentModificationException 
was included at a slight cost of performance. It can't even be disabled 
(I'm not implying this was a later addition, the designers were just 
aware that this would lead to hard to track bugs and added this 
immediately).

The parallels with the FX threading model, and also with weak references 
are huge:

- Both "seem" to work fine, until they don't
- Failures are very random, may not occur at all on some setups (like 
the developer's), and failures can be subtle, deceiving or complete
- The bug is reported (FX doesn't even check for nulls in Scene code!)
- After analysis, culprit is (likely) that the user somewhere in their 
100.000+ line code base is accessing the scene graph on different 
threads / forgot to maintain a hard reference... sorry, we can't help 
you further.

These kinds of mistakes are not just happening to beginners. This can 
happen when code gets rearranged, its original operating parameters long 
forgotten by you, or some future maintainer.  All it takes is a moment 
of inattention and you have a bug that you may never personally 
encounter, and when it does occur is impossible to reproduce with any 
certainty and does NOT give any indication of the cause (you see only 
symptoms).  Such bugs are the nastiest kind, and it is (IMHO) a failure 
of a framework to not help you detect these as early as possible.

--John


> I am curious what other members of community think of this issue.
>
> -andy
>
> *From: *John Hendrikx <john.hendrikx at gmail.com>
> *Date: *Wednesday, August 7, 2024 at 12:01
> *To: *Andy Goryachev <andy.goryachev at oracle.com>, 
> openjfx-dev at openjdk.org <openjfx-dev at openjdk.org>
> *Subject: *[External] : Re: Detecting threading problems faster
>
> Hi Andy,
>
> Problems that occur due to modifying the scene graph (like 
> adding/removing children, or changing certain properties) can be 
> incredibly subtle, and almost never point out the code that is the 
> actual culprit.  I suspect that the PR below is one of those for 
> example, but there are likely many more.
>
> https://github.com/openjdk/jfx/pull/1123 
> <https://urldefense.com/v3/__https:/github.com/openjdk/jfx/pull/1123__;!!ACWV5N9M2RV99hQ!ILqOOB9ua4tDxfAV9pGPJJiQ3oXewiSQjH2BCsaQkhpgkHEQzT8hRQt2fb9mKjYbF24ZRpQwfEiELFvn4B28lBLwxrTE$>
>
> Issues that complain about some kind of odd bug in the FX code (like a 
> NPE, ConcurrentModificationException, ArrayIndexOutOfBoundsException, 
> etc) could all be caused by user code doing things on the wrong 
> thread, but its really hard to diagnose as the stack traces provided 
> will never point out the user code that may be the culprit. I've 
> investigated PR 1123 extensively, and I can't see any bugs in the FX 
> code at all **assuming** it is used from a single thread...
>
> --John
>
> On 05/08/2024 17:01, Andy Goryachev wrote:
>
>     John:
>
>     Can you cite a bug or give an example of such a problem?
>
>     During all my time of actively using javafx (since about 2015) I
>     have never encountered an issue with threading you describe.  It
>     is possible that I use the system trivially and not to the full
>     extent. Both swing and javafx are single threaded by design.  Yes,
>     there might be occasions when one can use multiple threads and it
>     is sort of allowed, but doing so may lead to unpleasant issues
>     down the road, so the question is why would you want to do that?
>
>     -andy
>
>     *From: *openjfx-dev <openjfx-dev-retn at openjdk.org>
>     <mailto:openjfx-dev-retn at openjdk.org> on behalf of John Hendrikx
>     <john.hendrikx at gmail.com> <mailto:john.hendrikx at gmail.com>
>     *Date: *Sunday, August 4, 2024 at 16:30
>     *To: *openjfx-dev at openjdk.org <openjfx-dev at openjdk.org>
>     <mailto:openjfx-dev at openjdk.org>
>     *Subject: *Detecting threading problems faster
>
>     Hi list,
>
>     I know of quite some bugs and users that have been bitten by the
>     threading model used by JavaFX.  Basically, anything directly or
>     indirectly linked to an active Scene must be accessed on the FX
>     thread.
>     However, as FX also allows manipulating nodes and properties before
>     they're displayed, there can be no "hard" check everywhere to
>     ensure we
>     are on the FX thread (specifically, in properties).
>
>     Now, I think this situation is annoying, as a simple mistake where a
>     Platform.runLater wrapper was forgotten usually results in programs
>     operating mostly flawlessly, but then fail in mysterious and
>     random and
>     hard to reproduce ways.  The blame is often put on FX as the
>     resulting
>     exceptions will almost never show the user code which was the actual
>     culprit.  It can result in FX being perceived as unstable or buggy.
>
>     So I've been thinking if there isn't something we can do to detect
>     these
>     bugs originating from user code much earlier, similar to the
>     `ConcurrentModificationException` the collection classes do when
>     accessed in nested or concurrent contexts.
>
>     I think it may be possible to have properties check whether
>     they're part
>     of an active scene without too much of an performance impact,
>     possibly
>     even behind a switch. It would work like this:
>
>     Properties involved with Nodes will have an associated bean instance
>     (`getBean`).  This is an object, but we could check here if this
>     instance implements an interface:
>
>           if (getBean() instanceof MayBePartOfSceneGraph x) {
>                 if (x.isPartOfActiveScene() && !isOnFxThread()) {
>                      throw new IllegalStateException("Property must
>     only be
>     used from the FX Application Thread");
>                 }
>           }
>
>     This check could be done on every set of the property, and
>     potentially
>     on every get as well.  It should be relatively cheap, but will expose
>     problematic code patterns at a much earlier stage.  There's a chance
>     that this will "break" some programs that seemed to be behaving
>     correctly as well, so we may want to put it behind a switch until
>     such
>     programs (or libraries) can be fixed.
>
>     What do you all think?
>
>     --John
>
>     (*) Names of methods/interfaces are only used for illustration
>     purposes,
>     we can think of good names if this moves forward.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/openjfx-dev/attachments/20240807/f3ef4dc7/attachment-0001.htm>


More information about the openjfx-dev mailing list