Accidentally reproduced NPE in synchronizeNodes in combination with enterNestedEventLoop

Fri Aug 23 02:31:37 UTC 2024

That seems to be a tough one.

Delaying the invocation of listeners sounds interesting, as it might
allow using a pattern like the following:

    childrenTriggerPermutation = true;

    try (var scope = new DelayedEventScope(children)) {
        children.remove(node);
        children.add(node);
    } finally {
        childrenTriggerPermutation = false;
    }

The semantics would be that the property implementation still
receives notifications through its invalidated() method as the property
is being modified, but events are only fired to listeners at the end of
the scope.
List properties would need a new listChanged() method to allow for the
same pattern of overriding the method instead of adding a change
listener.
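
To make the idea a bit more concrete, here is a rough sketch of what such
a scope could look like. Nothing here is existing JavaFX API:
DeferringList, beginDelay(), endDelay() and this DelayedEventScope are
hypothetical names, and the list is a plain stand-in for an
ObservableList that reports changes as strings. The listChanged() hook
runs immediately, while listeners only hear about the changes when the
outermost scope closes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an ObservableList that can defer its events.
class DeferringList {
    private final List<String> items = new ArrayList<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private int openScopes; // greater than zero while a delay scope is open

    void addListener(Consumer<String> listener) { listeners.add(listener); }

    void add(String item) { items.add(item); changed("add " + item); }

    void remove(String item) { if (items.remove(item)) changed("remove " + item); }

    // Hook for subclasses; always called immediately, even inside a scope.
    protected void listChanged(String change) { }

    private void changed(String change) {
        listChanged(change);
        if (openScopes > 0) pending.add(change); // defer the listener call
        else fire(change);
    }

    private void fire(String change) {
        for (Consumer<String> l : new ArrayList<>(listeners)) l.accept(change);
    }

    void beginDelay() { openScopes++; }

    void endDelay() {
        if (--openScopes > 0) return;            // an outer scope is still open
        List<String> toFire = new ArrayList<>(pending);
        pending.clear();
        toFire.forEach(this::fire);
    }
}

// Hypothetical scope object matching the try-with-resources pattern above.
class DelayedEventScope implements AutoCloseable {
    private final DeferringList list;
    DelayedEventScope(DeferringList list) { this.list = list; list.beginDelay(); }
    @Override public void close() { list.endDelay(); }
}

class DelayedScopeDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        DeferringList children = new DeferringList() {
            @Override protected void listChanged(String change) { log.add("hook: " + change); }
        };
        children.addListener(change -> log.add("event: " + change));
        children.add("a");
        children.add("b");
        log.clear(); // only look at what happens inside the scope
        try (var scope = new DelayedEventScope(children)) {
            children.remove("a");
            children.add("a");
            log.add("body done");
        }
        return log;
    }
}
```

In this sketch the pending changes are replayed one by one when the
scope closes; it does not try to merge them into a single event.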

Of course, the implementation will be challenging. We'd need to keep
track of all modifications, and then aggregate those modifications
into a single event. In this particular example, the two "add" and
"remove" events would probably be consolidated into a "permutation"
event.
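
One simple way to do the consolidation, assuming the list never contains
duplicates (which holds for scene graph children), would be to take a
snapshot when the scope opens and compare it with the list when the
scope closes, rather than merging the individual changes. The sketch
below does that; ChangeConsolidator and permutationOf are hypothetical
names, not part of JavaFX.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical helper: decides whether a batch of changes is a pure reordering.
class ChangeConsolidator {
    // Returns perm where perm[oldIndex] == newIndex if "after" contains exactly
    // the elements of "before" in a different or identical order, else null.
    // Assumes both lists are free of duplicates.
    static int[] permutationOf(List<String> before, List<String> after) {
        if (before.size() != after.size()) return null;
        if (!new HashSet<>(before).equals(new HashSet<>(after))) return null;
        int[] perm = new int[before.size()];
        for (int i = 0; i < perm.length; i++) {
            perm[i] = after.indexOf(before.get(i));
        }
        return perm;
    }
}
```

For the toFront case, a snapshot of [a, b, c] and a final list of
[b, c, a] gives the mapping [2, 0, 1], which is the information a
permutation event needs. Anything that returns null would have to be
reported as a regular add/remove change.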

In general, delayed notification scopes for properties could also be
very useful for application developers.

On Thu, Aug 22, 2024 at 9:59 AM John Hendrikx <john.hendrikx at gmail.com> wrote:
>
> I think I figured out the reason why this fails.  The root cause lies in a misconception I've seen in a lot of FX code.
>
> JavaFX uses a single event thread model, which ensures all structures are only ever accessed by a single thread.  This frees FX from having to do synchronization on almost every modification you make to properties or the scene graph.
>
> However, in many areas it makes the assumption that such code will always run sequentially to completion without interruption, and uses instance fields judiciously to communicate things to deeper nested code or to code further down the line.  But code using instance fields in this way is not safe to re-enter (it is not reentrant-safe) without precautions -- sharing instance fields in this way safely can easily get as complicated as writing multi-threaded code.
>
> A simple example that I saw in Parent's toFront code:
>
>     childrenTriggerPermutation = true;
>
>     try {
>         children.remove(node);
>         children.add(node);
>     } finally {
>         childrenTriggerPermutation = false;
>     }
>
> The above code uses an instance field "childrenTriggerPermutation" to activate an optimization. The optimization assumes that the children are only rearranged, and that none are added or removed.  However, "children" is an ObservableList, which means the user can register listeners on it, which do who knows what.  If such a listener modifies the children list in another way, then the code is entered again, but the "childrenTriggerPermutation" optimization will still be enabled, causing it to not notice the change the user made.
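
A stripped-down model of this, for anyone who wants to see it happen.
This is not the real Parent code: FlagParent, its string children and
its log are made up for illustration, and only the flag handling follows
the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a parent that uses an instance flag as an optimization hint.
class FlagParent {
    final List<String> children = new ArrayList<>();
    final List<Runnable> listeners = new ArrayList<>();
    final List<String> log = new ArrayList<>();
    private boolean childrenTriggerPermutation;

    private void onChildrenChanged(String what) {
        // The flag is read here, possibly in a nested call it was never meant for.
        log.add((childrenTriggerPermutation ? "permutation fast path: " : "full update: ") + what);
        for (Runnable l : new ArrayList<>(listeners)) l.run();
    }

    void add(String child) { children.add(child); onChildrenChanged("add " + child); }

    void remove(String child) { children.remove(child); onChildrenChanged("remove " + child); }

    void toFront(String child) {
        childrenTriggerPermutation = true;
        try {
            remove(child);
            add(child);
        } finally {
            childrenTriggerPermutation = false;
        }
    }
}

class FlagDemo {
    static List<String> run() {
        FlagParent p = new FlagParent();
        p.add("a");
        p.add("b");
        p.log.clear();
        boolean[] done = {false};
        // A user listener that adds a brand new child the first time it is called.
        p.listeners.add(() -> {
            if (!done[0]) {
                done[0] = true;
                p.add("x");
            }
        });
        p.toFront("a");
        return p.log;
    }
}
```

The second log entry is "permutation fast path: add x": a genuinely new
child is handled as if it were part of the reordering, because the flag
was still set when the listener re-entered add().
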
>
> This problem is similar to the ChangeListener old value bug.  When you make another change from within a change listener (so the same code is called *deeper* in the same stack), downstream change listeners will not receive the correct old values because the code is insufficiently reentrant-safe.  ExpressionHelper *tries* to mitigate some of these issues (for cases where listeners are added/removed reentrantly) by making copies of the listener list, but it does not handle this case.
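
The effect is easy to show with a naive property. The sketch below is
not ExpressionHelper, just a minimal hypothetical NaiveIntProperty that
notifies its listeners in order, with a first listener that changes the
value again from within its callback.

```java
import java.util.ArrayList;
import java.util.List;

interface IntChangeListener {
    void changed(int oldValue, int newValue);
}

// Naive property for illustration only; not how ExpressionHelper is implemented.
class NaiveIntProperty {
    private final List<IntChangeListener> listeners = new ArrayList<>();
    private int value;

    void addListener(IntChangeListener listener) { listeners.add(listener); }

    int get() { return value; }

    void set(int newValue) {
        int oldValue = value;
        if (oldValue == newValue) return;
        value = newValue;
        // A listener may call set() again before the remaining listeners have run.
        for (IntChangeListener l : new ArrayList<>(listeners)) {
            l.changed(oldValue, newValue);
        }
    }
}

class OldValueDemo {
    static List<String> run() {
        NaiveIntProperty p = new NaiveIntProperty();
        List<String> seenBySecond = new ArrayList<>();
        // First listener: reacts to the value 1 by changing it to 2.
        p.addListener((oldValue, newValue) -> {
            if (newValue == 1) p.set(2);
        });
        // Second listener: just records what it is told.
        p.addListener((oldValue, newValue) -> seenBySecond.add(oldValue + " -> " + newValue));
        p.set(1);
        return seenBySecond;
    }
}
```

The second listener is told "1 -> 2" first and "0 -> 1" afterwards, so
the last thing it hears is that the value is 1 while the property
actually holds 2.
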
>
> Similarly, the bug I encountered in my original post is also such an issue.  While processing the children list changes, several *properties* are being manipulated.  Being properties, these can have listeners of their own that could trigger further modifications and, in complex enough programs, they may even re-enter the same class's code that is sharing instance fields in an unsafe way.  And that's exactly what is happening:
>
> 1. The children list change processing is registering the offset of the first changed child in the children list (called "startIdx") as an instance field -- this field is used as an optimization for updatePeer (so it doesn't have to check/copy all children).  It assumes the processing always finishes completely and it will get to the point where it sets "startIdx" but...
>
> 2. Before it sets "startIdx" but after the children list is already modified, it modifies several properties.  Being properties, these can have listeners, and as such this can trigger a cascade of further calls in complicated applications.
>
> 3. In this case, the cascade of calls included an "enterNestedEventLoop".  Pulses (and things like Platform#runLater) can be handled on such a nested loop, and FX decides that now is as good a time as any to handle a new pulse.
>
> 4. The pulse triggers updatePeer calls, among which is the one for the Parent that is still (higher in the stack) midway through its children list processing code.
>
> 5. The updatePeer code looks at "startIdx", the shared instance field that Parent uses for its optimizations.  This field is NOT modified yet.  The field indicates the first child that was modified, and is normally set to "children.size()" when there are no changes.  It still holds that value here, and so updatePeer updates nothing at all.  An assertion later in this code then checks if children.size() == peer.children.size(), which fails... an exception is thrown, and synchronizeSceneNodes() blows up with infinite NPEs.
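
The five steps can be replayed in a small model. This is a heavily
simplified sketch, not the real Parent or peer code: ModelParent, its
string children and the propertyListener field are hypothetical, the
nested event loop is replaced by a direct call to updatePeer() from the
listener, and the failed assertion is replaced by a boolean result.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the stale "startIdx" sequence described above.
class ModelParent {
    final List<String> children = new ArrayList<>(List.of("a", "b", "c"));
    final List<String> peerChildren = new ArrayList<>(children);
    int startIdx = children.size();        // "nothing changed since the last sync"
    Runnable propertyListener = () -> { }; // stands in for listeners on properties touched mid-change

    void removeChild(String child) {
        int idx = children.indexOf(child);
        children.remove(idx);                // the list is already modified...
        propertyListener.run();              // ...properties change and listeners run...
        startIdx = Math.min(startIdx, idx);  // ...and only now is the field updated
    }

    // Copies children to the peer from startIdx on; returns whether the peer matches afterwards.
    boolean updatePeer() {
        int from = Math.min(startIdx, peerChildren.size());
        peerChildren.subList(from, peerChildren.size()).clear();
        for (int i = from; i < children.size(); i++) {
            peerChildren.add(children.get(i));
        }
        startIdx = children.size();
        return peerChildren.equals(children);
    }
}

class StaleFieldDemo {
    static boolean[] run() {
        ModelParent p = new ModelParent();
        boolean[] inSync = new boolean[2];
        // The "pulse" arrives while removeChild is still in progress.
        p.propertyListener = () -> inSync[0] = p.updatePeer();
        p.removeChild("a");
        // A regular pulse after the change has been fully processed.
        inSync[1] = p.updatePeer();
        return inSync;
    }
}
```

In this model the nested updatePeer() sees the stale startIdx of 3,
copies nothing, and leaves the peer with three children while the list
has two. The later call, made after removeChild has finished, brings the
peer back in sync.
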
>
> I'm not entirely sure yet how to resolve this, or whether it should be.
>
> Perhaps the safest way would be to undo some of the optimizations/assumptions, and perhaps reoptimize them if there's a pressing need.
>
> Another option would be to somehow delay listener callbacks until the code in Parent is in a safe state.
>
> The option I like the least is to introduce yet another instance flag ("processingListChange") and throw an early exception if other code is entered that doesn't expect it...
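
For completeness, this last option would look roughly like the sketch
below. GuardedParent and its listener field are hypothetical, and only
the flag name is taken from the text above. It does not fix anything, it
only turns silent corruption into an early failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical guard: fail fast when the list change code is re-entered.
class GuardedParent {
    final List<String> children = new ArrayList<>();
    Runnable listener = () -> { };
    private boolean processingListChange;

    void add(String child) {
        if (processingListChange) {
            throw new IllegalStateException("children modified while a list change is being processed");
        }
        processingListChange = true;
        try {
            children.add(child);
            listener.run(); // listeners run while the flag is still set
        } finally {
            processingListChange = false;
        }
    }
}

class GuardDemo {
    static String run() {
        GuardedParent p = new GuardedParent();
        p.listener = () -> p.add("nested"); // re-enters add() from a listener
        try {
            p.add("a");
            return "no exception";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```

Note that the outer modification has already happened by the time the
exception is thrown, so the caller is still left to clean up.
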
>
> --John