[External] : Re: JVM crashes on macOS when entering too many nested event loops
Christopher Schnick
crschnick at xpipe.io
Thu Apr 3 03:57:40 UTC 2025
Looking at the related issues in the bug tracker, it seems like one
common cause is modifications made on a non-platform thread. I hooked up
our application to a detection mechanism, but haven't gotten any
meaningful hits so far. Maybe I will find something with it in the future.
For reference, is there an existing best practice for detecting
these non-platform thread modifications? I just wrote my own; feel free
to check whether I'm missing anything:
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

import javafx.application.Platform;
import javafx.collections.ListChangeListener;
import javafx.scene.Node;
import javafx.scene.Parent;
import javafx.stage.Window;

// AppProperties is not a JavaFX class; it is assumed to come from the application.
public class NodeCallback {

    // Windows and nodes that already have listeners attached
    private static final Set<Window> windows = new HashSet<>();
    private static final Set<Node> nodes = new HashSet<>();

    public static void init() {
        if (!AppProperties.get().isDebugPlatformThreadAccess()) {
            return;
        }

        Window.getWindows().addListener((ListChangeListener<? super Window>) change -> {
            for (Window window : change.getList()) {
                if (!windows.add(window)) {
                    continue;
                }

                window.sceneProperty().subscribe(scene -> {
                    if (scene == null) {
                        return;
                    }

                    scene.rootProperty().subscribe(root -> {
                        if (root != null) {
                            watchPlatformThreadChanges(root);
                        }
                    });
                });
            }
        });
    }

    private static void watchPlatformThreadChanges(Node node) {
        watchGraph(node, c -> {
            if (!nodes.add(c)) {
                return;
            }

            if (c instanceof Parent p) {
                p.getChildrenUnmodifiable().addListener((ListChangeListener<? super Node>) change -> {
                    checkPlatformThread();
                });
            }
            c.visibleProperty().addListener((observable, oldValue, newValue) -> {
                checkPlatformThread();
            });
            c.boundsInParentProperty().addListener((observable, oldValue, newValue) -> {
                checkPlatformThread();
            });
            c.managedProperty().addListener((observable, oldValue, newValue) -> {
                checkPlatformThread();
            });
            c.opacityProperty().addListener((observable, oldValue, newValue) -> {
                checkPlatformThread();
            });
        });
    }

    // Visits the node and its descendants, including children added later
    private static void watchGraph(Node node, Consumer<Node> callback) {
        if (node instanceof Parent p) {
            for (Node c : p.getChildrenUnmodifiable()) {
                watchGraph(c, callback);
            }
            p.getChildrenUnmodifiable().addListener((ListChangeListener<? super Node>) change -> {
                for (Node c : change.getList()) {
                    watchGraph(c, callback);
                }
            });
        }
        callback.accept(node);
    }

    private static void checkPlatformThread() {
        if (!Platform.isFxApplicationThread()) {
            throw new IllegalStateException("Not in Fx application thread");
        }
    }
}
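
One variation on the check above that may be worth trying: record the violation instead of throwing. An exception thrown from a listener on a background thread can end up somewhere it is never noticed, for example inside a Future that nobody inspects. The following is a minimal, JavaFX-free sketch of that idea. The class name ThreadAccessRecorder and its methods are hypothetical, and the owner thread stands in for the platform thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical helper, not part of JavaFX: records accesses that happen
// on any thread other than the designated owner thread.
final class ThreadAccessRecorder {

    private final Thread owner;
    // Thread-safe list, since violations are by definition added from other threads
    private final List<Throwable> violations = new CopyOnWriteArrayList<>();

    ThreadAccessRecorder(Thread owner) {
        this.owner = owner;
    }

    // Call this from a listener; 'what' names the property that changed.
    void check(String what) {
        Thread current = Thread.currentThread();
        if (current != owner) {
            // The throwable is only created, never thrown, so its stack
            // trace still points at the offending call site.
            violations.add(new IllegalStateException(
                    what + " touched on thread " + current.getName()
                            + ", expected " + owner.getName()));
        }
    }

    List<Throwable> violations() {
        return violations;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadAccessRecorder recorder = new ThreadAccessRecorder(Thread.currentThread());
        Thread worker = new Thread(() -> recorder.check("node.visible"), "worker");
        worker.start();
        worker.join();
        recorder.violations().forEach(Throwable::printStackTrace);
    }
}
```

In a JavaFX application the owner comparison would presumably be replaced by Platform.isFxApplicationThread(), and the recorded list could be dumped periodically or on shutdown.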
On 28/03/2025 21:06, Martin Fox wrote:
> This isn’t an area of the code that I’m familiar with. Searching for
> updateCachedBounds in the bug database shows that there’s some history
> here so maybe someone with more experience can chime in.
>
>> On Mar 28, 2025, at 11:06 AM, Christopher Schnick
>> <crschnick at xpipe.io> wrote:
>>
>> So I tried various things to reproduce it without the
>> StackOverflow, but with no success so far. But I can definitely tell
>> you from many user issue reports that this issue happens frequently.
>> The logs from those occurrences show no other exceptions being
>> reported.
>>
>> However, it doesn't leave the node in a bad state in most cases. In
>> production this exception usually occurs only once, without the same
>> exception recurring in later pulses. A loop of pulse exceptions
>> like the one that accompanied the JVM crash is rarer. It does break
>> the layout, though, so a restart is required.
>>
>> I would already be happy with a simple index check to not throw an
>> OOB exception in the implementation, I don't think there's any harm
>> in that. While the StackOverflow is a very made-up case, I think even
>> for that it would be good if it wouldn't throw exceptions in later
>> pulses if you're looking for a justification on why to implement an
>> index check.
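
To illustrate the kind of index check being asked for, here is a minimal sketch. It is not the actual Parent.updateCachedBounds code, whose internals are not shown in this thread. The class BoundedDirtyScan, its methods, and the scratch array are all hypothetical. The only idea taken from the discussion is that a dirty-children counter can exceed the real number of children, so it is clamped before being used as a loop bound.

```java
import java.util.List;

// Hypothetical illustration only; not taken from the JavaFX sources.
final class BoundedDirtyScan {

    // Clamps a possibly stale dirty counter into the range [0, childCount].
    static int safeDirtyCount(int dirtyChildrenCount, int childCount) {
        return Math.max(0, Math.min(dirtyChildrenCount, childCount));
    }

    // Copies at most 'dirtyChildrenCount' children into 'scratch' and
    // returns how many were copied. Neither the list nor the array is
    // ever indexed out of bounds, even if the counter is too large.
    static <T> int collectDirty(List<T> children, int dirtyChildrenCount, Object[] scratch) {
        int limit = Math.min(safeDirtyCount(dirtyChildrenCount, children.size()), scratch.length);
        for (int i = 0; i < limit; i++) {
            scratch[i] = children.get(i);
        }
        return limit;
    }

    public static void main(String[] args) {
        Object[] scratch = new Object[2];
        // The counter claims 5 dirty children, but only 3 exist and 2 fit.
        int copied = collectDirty(List.of("a", "b", "c"), 5, scratch);
        System.out.println("copied " + copied + " entries");
    }
}
```

A clamp like this hides the inconsistent counter rather than explaining it, so it would make sense to log whenever the clamp actually changes the value.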
>>
>> On 28/03/2025 17:26, Martin Fox wrote:
>>> I’ve been able to reproduce this inside a debugger on my Mac every
>>> eighth try or so.
>>>
>>> I’m not sure what I’m seeing is all that helpful. Your reproducing
>>> case is inducing a stack overflow exception. If the exception occurs
>>> while Parent.updateCachedBounds is executing the StackPane will be
>>> left in a bad state. This leads to the dirtyChildrenCount exceeding
>>> the number of children and then Parent.updateCachedBounds will start
>>> throwing the same AIOOBE on every layout pulse.
>>>
>>> At least in my debug runs it’s all about the timing of the stack
>>> overflow. That probably doesn’t explain why your production app is
>>> getting into the same bad state.
>>>
>>> And you’re right, this has nothing to do with the Alert. I was
>>> confused by the gap between when the exception occurs and when it’s
>>> reported.
>>>
>>> Martin
>>>
>>>> On Mar 26, 2025, at 9:20 PM, Christopher Schnick
>>>> <crschnick at xpipe.io> wrote:
>>>>
>>>> Interesting, that exception does not happen on my macOS 15.3
>>>> system. The reproducer somehow also doesn't seem to trigger the
>>>> IndexOutOfBoundsExceptions on macOS for me, only on Windows so far.
>>>> On Windows, the large alert is shown as a broken stage with no
>>>> content and controls for me, which I guess is slightly better than
>>>> an exception, but also not ideal. So it seems like the reproducer
>>>> behavior depends a lot on the specific system.
>>>>
>>>> On 26/03/2025 19:35, Martin Fox wrote:
>>>>> Christopher,
>>>>>
>>>>> Yes, there might be more than one issue here. On the Mac the call
>>>>> to Stage.showAndWait is making its way into the Mac glass code
>>>>> where an exception is being thrown leading to another call to
>>>>> Stage.showAndWait. I’ve attached the repeating block below. I
>>>>> don’t see that pattern in the Windows stack trace you provided.
>>>>>
>>>>> Martin
>>>>>
>>>>> at ParentBoundsBug.lambda$start$0(ParentBoundsBug.java:25)
>>>>> at java.base/java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:663)
>>>>> at java.base/java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:658)
>>>>> at javafx.graphics@25-internal/com.sun.glass.ui.Application.reportException(Application.java:452)
>>>>> at javafx.graphics@25-internal/com.sun.glass.ui.mac.MacWindow._setBounds2(Native Method)
>>>>> at javafx.graphics@25-internal/com.sun.glass.ui.mac.MacWindow._setBounds(MacWindow.java:70)
>>>>> at javafx.graphics@25-internal/com.sun.glass.ui.Window.setBounds(Window.java:589)
>>>>> at javafx.graphics@25-internal/com.sun.javafx.tk.quantum.WindowStage.setBounds(WindowStage.java:304)
>>>>> at javafx.graphics@25-internal/javafx.stage.Window$TKBoundsConfigurator.apply(Window.java:1566)
>>>>> at javafx.graphics@25-internal/javafx.stage.Window.applyBounds(Window.java:1424)
>>>>> at javafx.graphics@25-internal/javafx.stage.Window.adjustSize(Window.java:327)
>>>>> at javafx.graphics@25-internal/javafx.stage.Window.sizeToScene(Window.java:284)
>>>>> at javafx.graphics@25-internal/javafx.stage.Window$12.invalidated(Window.java:1215)
>>>>> at javafx.base@25-internal/javafx.beans.property.BooleanPropertyBase.markInvalid(BooleanPropertyBase.java:110)
>>>>> at javafx.base@25-internal/javafx.beans.property.BooleanPropertyBase.set(BooleanPropertyBase.java:145)
>>>>> at javafx.graphics@25-internal/javafx.stage.Window.setShowing(Window.java:1235)
>>>>> at javafx.graphics@25-internal/javafx.stage.Window.show(Window.java:1250)
>>>>> at javafx.graphics@25-internal/javafx.stage.Stage.show(Stage.java:272)
>>>>> at javafx.graphics@25-internal/javafx.stage.Stage.showAndWait(Stage.java:427)
>>>>> at javafx.controls@25-internal/javafx.scene.control.HeavyweightDialog.showAndWait(HeavyweightDialog.java:162)
>>>>> at javafx.controls@25-internal/javafx.scene.control.Dialog.showAndWait(Dialog.java:347)
>>>>> at ParentBoundsBug.lambda$start$0(ParentBoundsBug.java:25)
>>>>> at java.base/java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:663)
>>>>> at java.base/java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:658)
>>>>>
>>>>>> On Mar 26, 2025, at 10:49 AM, Christopher Schnick
>>>>>> <crschnick at xpipe.io> wrote:
>>>>>>
>>>>>> Hey Martin,
>>>>>>
>>>>>> thank you for looking into this. The initial StackOverflow is a
>>>>>> result of me forcing the bounds IndexOutOfBoundsException to
>>>>>> reproduce. The StackOverflow can be ignored; it
>>>>>> was merely the best method I found to transition the scene graph
>>>>>> into a state where the IndexOutOfBoundsExceptions are thrown. The
>>>>>> OOBs are not thrown in every run though, it sometimes takes a few
>>>>>> tries. In our production application, the same
>>>>>> IndexOutOfBoundsExceptions also occur randomly without a previous
>>>>>> exception. You can probably also reproduce the
>>>>>> IndexOutOfBoundsExceptions without the StackOverflow, but
>>>>>> reproducing it was very fragile, so I didn't look into it more.
>>>>>>
>>>>>> I don't think it has necessarily something to do with the alert
>>>>>> bounds as the IndexOutOfBoundsException is also thrown if you
>>>>>> don't show an alert at all. The constant
>>>>>> IndexOutOfBoundsExceptions in combination with the alert
>>>>>> showAndWait was how our application entered the original crashing
>>>>>> state. So the reproducer is more like a two-in-one.
>>>>>>
>>>>>> Best
>>>>>> Christopher Schnick
>>>>>>
>>>>>> On 26/03/2025 18:33, Martin Fox wrote:
>>>>>>> Yes, thank you Christopher for providing a reproducible test case!
>>>>>>>
>>>>>>> I was able to trigger the problem on my Mac on the first try.
>>>>>>> Since I’m using a modified version of JavaFX the system didn’t
>>>>>>> crash but instead hit a Java stack overflow error and produced a
>>>>>>> very long stack trace.
>>>>>>>
>>>>>>> At least on the Mac the problem seems to be that you’re trying
>>>>>>> to pop an Alert containing a long stack trace. While trying to
>>>>>>> adjust the Alert’s bounds JavaFX is throwing another exception
>>>>>>> but I’m not sure why. I’ll continue to look into it.
>>>>>>>
>>>>>>> Thanks again,
>>>>>>> Martin
>>>>>>>
>>>>>>>> On Mar 25, 2025, at 12:16 PM, Andy Goryachev
>>>>>>>> <andy.goryachev at oracle.com> wrote:
>>>>>>>>
>>>>>>>> Thank you, Christopher, for clarification!
>>>>>>>> Personally, I would consider this to be a problem with the
>>>>>>>> application design: the code should limit the number of alerts
>>>>>>>> shown to the user. Do you really want the user to click
>>>>>>>> through hundreds of alerts?
>>>>>>>> Nevertheless, you are right about the need for the platform to
>>>>>>>> gracefully handle the case of too many nested event loops - by
>>>>>>>> throwing an exception with a meaningful message, as Martin
>>>>>>>> proposed in https://github.com/openjdk/jfx/pull/1741
>>>>>>>> Cheers,
>>>>>>>> -andy
>>>>>>>>
>>>>>>>> From: Christopher Schnick <crschnick at xpipe.io>
>>>>>>>> Date: Tuesday, March 25, 2025 at 11:52
>>>>>>>> To: Andy Goryachev <andy.goryachev at oracle.com>
>>>>>>>> Cc: OpenJFX <openjfx-dev at openjdk.org>
>>>>>>>> Subject: Re: [External] : Re: JVM crashes on macOS when
>>>>>>>> entering too many nested event loops
>>>>>>>>
>>>>>>>> Hey Andy,
>>>>>>>>
>>>>>>>> so I think I was able to reproduce this issue for our application.
>>>>>>>>
>>>>>>>> Two main factors combine to make this happen:
>>>>>>>> - We use an alert-based error reporter, meaning that we have a
>>>>>>>> default uncaught exception handler set for all threads which
>>>>>>>> will showAndWait an Alert with the exception message
>>>>>>>> - As I reported yesterday with
>>>>>>>> https://mail.openjdk.org/pipermail/openjfx-dev/2025-March/052963.html,
>>>>>>>> there are some rare exceptions that can occur in a normal event
>>>>>>>> loop without interference of the application, probably because
>>>>>>>> of a small bug in the bounds calculation code
>>>>>>>>
>>>>>>>> If you combine these two factors, you will end up with an
>>>>>>>> infinite loop of the showAndWait entering a nested event loop,
>>>>>>>> the event loop throwing an internal exception, and the uncaught
>>>>>>>> exception handler starting the same loop with another alert. I
>>>>>>>> don't think this is a bad implementation from our side, the
>>>>>>>> only thing that we can improve is to maybe check how deep the
>>>>>>>> uncaught exception loop is in to prevent this from occurring
>>>>>>>> indefinitely. But I would argue this can happen to any
>>>>>>>> application. Here is a sample code, based on the reproducer
>>>>>>>> from the OutOfBounds report from yesterday:
>>>>>>>>
>>>>>>>> import javafx.application.Application;
>>>>>>>> import javafx.application.Platform;
>>>>>>>> import javafx.scene.Scene;
>>>>>>>> import javafx.scene.control.Alert;
>>>>>>>> import javafx.scene.control.Button;
>>>>>>>> import javafx.scene.layout.Region;
>>>>>>>> import javafx.scene.layout.StackPane;
>>>>>>>> import javafx.scene.layout.VBox;
>>>>>>>> import javafx.stage.Stage;
>>>>>>>> import java.io.IOException;
>>>>>>>> import java.util.Arrays;
>>>>>>>> public class ParentBoundsBug extends Application {
>>>>>>>>     @Override
>>>>>>>>     public void start(Stage stage) throws IOException {
>>>>>>>>         Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
>>>>>>>>             throwable.printStackTrace();
>>>>>>>>             if (Platform.isFxApplicationThread()) {
>>>>>>>>                 var alert = new Alert(Alert.AlertType.ERROR);
>>>>>>>>                 alert.setHeaderText(throwable.getMessage());
>>>>>>>>                 alert.setContentText(Arrays.toString(throwable.getStackTrace()));
>>>>>>>>                 alert.showAndWait();
>>>>>>>>             } else {
>>>>>>>>                 // Do some other error handling for non-platform threads
>>>>>>>>                 // Probably just show the alert with a runLater()
>>>>>>>>                 // For this example, there are no exceptions outside the platform thread
>>>>>>>>             }
>>>>>>>>         });
>>>>>>>>         // Run delayed as Application::reportException will only be called for exceptions
>>>>>>>>         // after the application has started
>>>>>>>>         Platform.runLater(() -> {
>>>>>>>>             Scene scene = new Scene(createContent(), 640, 480);
>>>>>>>>             stage.setScene(scene);
>>>>>>>>             stage.show();
>>>>>>>>             stage.centerOnScreen();
>>>>>>>>         });
>>>>>>>>     }
>>>>>>>>     private Region createContent() {
>>>>>>>>         var b1 = new Button("Click me!");
>>>>>>>>         var b2 = new Button("Click me!");
>>>>>>>>         var vbox = new VBox(b1, b2);
>>>>>>>>         b1.boundsInParentProperty().addListener((observable, oldValue, newValue) -> {
>>>>>>>>             vbox.setVisible(!vbox.isVisible());
>>>>>>>>         });
>>>>>>>>         b2.boundsInParentProperty().addListener((observable, oldValue, newValue) -> {
>>>>>>>>             vbox.setVisible(!vbox.isVisible());
>>>>>>>>         });
>>>>>>>>         vbox.boundsInParentProperty().addListener((observable, oldValue, newValue) -> {
>>>>>>>>             vbox.setVisible(!vbox.isVisible());
>>>>>>>>         });
>>>>>>>>         var stack = new StackPane(vbox, new StackPane());
>>>>>>>>         stack.boundsInParentProperty().addListener((observable, oldValue, newValue) -> {
>>>>>>>>             vbox.setVisible(!vbox.isVisible());
>>>>>>>>         });
>>>>>>>>         return stack;
>>>>>>>>     }
>>>>>>>>     public static void main(String[] args) {
>>>>>>>>         launch();
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> If the same OutOfBounds exception from the report I linked
>>>>>>>> happens in the bounds calculation, which is the case in roughly
>>>>>>>> 1 out of 5 runs for me, this application will enter new event
>>>>>>>> loops until it crashes. If the OutOfBounds doesn't trigger, it
>>>>>>>> will just throw a StackOverflow but won't continue the infinite
>>>>>>>> loop of nested event loops. So for the reproducer it is important
>>>>>>>> to try a few times until you get the described OutOfBounds.
>>>>>>>>
>>>>>>>> I attached the stacktrace of how this fails. The initial
>>>>>>>> StackOverflow causes infinitely many following exceptions in
>>>>>>>> the nested event loop.
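
The mitigation mentioned earlier in this message, checking how deep the uncaught exception loop already is, could look roughly like the following sketch. It is JavaFX-free so that it runs on its own. GuardedErrorReporter, maxDepth, and the two Consumer callbacks are hypothetical names. In a real application the interactive callback would presumably be the code that builds the Alert and calls showAndWait, and the fallback would only log.

```java
import java.util.function.Consumer;

// Hypothetical sketch: an uncaught exception handler that stops opening
// blocking dialogs once it has been re-entered too many times on one thread.
final class GuardedErrorReporter implements Thread.UncaughtExceptionHandler {

    private final int maxDepth;
    private final Consumer<Throwable> interactive; // e.g. show a blocking dialog
    private final Consumer<Throwable> fallback;    // e.g. log only
    // Nesting depth per thread; re-entry happens on the thread whose
    // handler call is still blocked inside 'interactive'.
    private final ThreadLocal<int[]> depth = ThreadLocal.withInitial(() -> new int[1]);

    GuardedErrorReporter(int maxDepth, Consumer<Throwable> interactive, Consumer<Throwable> fallback) {
        this.maxDepth = maxDepth;
        this.interactive = interactive;
        this.fallback = fallback;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable throwable) {
        int[] d = depth.get();
        if (d[0] >= maxDepth) {
            // Too many handlers are already active: do not nest further.
            fallback.accept(throwable);
            return;
        }
        d[0]++;
        try {
            interactive.accept(throwable);
        } finally {
            d[0]--;
        }
    }

    public static void main(String[] args) {
        GuardedErrorReporter[] self = new GuardedErrorReporter[1];
        self[0] = new GuardedErrorReporter(3,
                // Simulates a dialog whose nested loop immediately fails again
                t -> self[0].uncaughtException(Thread.currentThread(), t),
                t -> System.err.println("giving up: " + t.getMessage()));
        self[0].uncaughtException(Thread.currentThread(), new RuntimeException("boom"));
    }
}
```

This only limits how far the application itself nests. It does not address the underlying bounds exception, and it does not replace a limit enforced by the platform.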
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Christopher Schnick
>>>>>>>>
>>>>>>>> On 25/03/2025 18:28, Andy Goryachev wrote:
>>>>>>>>
>>>>>>>> Dear Christopher:
>>>>>>>> Were you able to root cause why your application enters
>>>>>>>> that many nested event loops?
>>>>>>>> I believe a well-behaved application should never
>>>>>>>> experience that, unless there is some design flaw or a bug.
>>>>>>>> -andy
>>>>>>>>
>>>>>>>> From: Christopher Schnick <crschnick at xpipe.io>
>>>>>>>> Date: Monday, March 10, 2025 at 19:45
>>>>>>>> To: Andy Goryachev <andy.goryachev at oracle.com>
>>>>>>>> Subject: [External] : Re: JVM crashes on macOS when
>>>>>>>> entering too many nested event loops
>>>>>>>>
>>>>>>>> Our code and some libraries do enter some nested event
>>>>>>>> loops at a few places when it makes sense, but we didn't do
>>>>>>>> anything to explicitly provoke this, this occurred
>>>>>>>> naturally in our application. So it would be nice if JavaFX
>>>>>>>> could somehow guard against this, especially since crashing
>>>>>>>> the JVM is probably the worst thing that can happen.
>>>>>>>>
>>>>>>>> I looked at the documentation, but it seems like the public
>>>>>>>> API at Platform::enterNestedEventLoop does not mention this.
>>>>>>>> From my understanding, the method
>>>>>>>> Platform::canStartNestedEventLoop is potentially the right
>>>>>>>> method to indicate to the caller that the limit is close by
>>>>>>>> returning false.
>>>>>>>> And even if something like an exception is thrown when a
>>>>>>>> nested event loop is started while it is close to the
>>>>>>>> limit, that would still be much better than a direct crash.
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Christopher Schnick
>>>>>>>>
>>>>>>>> On 10/03/2025 18:51, Andy Goryachev wrote:
>>>>>>>>
>>>>>>>> This looks to me like it might be hitting the (native)
>>>>>>>> thread stack size limit.
>>>>>>>> c.s.glass.ui.Application::enterNestedEventLoop() even
>>>>>>>> warns about it:
>>>>>>>> * An application may enter several nested loops recursively. There's no
>>>>>>>> * limit of recursion other than that imposed by the native stack size.
>>>>>>>> -andy
>>>>>>>>
>>>>>>>> From: openjfx-dev <openjfx-dev-retn at openjdk.org> on behalf of
>>>>>>>> Martin Fox <martinfox656 at gmail.com>
>>>>>>>> Date: Monday, March 10, 2025 at 10:10
>>>>>>>> To: Christopher Schnick <crschnick at xpipe.io>
>>>>>>>> Cc: OpenJFX <openjfx-dev at openjdk.org>
>>>>>>>> Subject: Re: JVM crashes on macOS when entering too
>>>>>>>> many nested event loops
>>>>>>>>
>>>>>>>> Hi Christopher,
>>>>>>>>
>>>>>>>> I was able to reproduce this crash. I wrote a small
>>>>>>>> routine that recursively calls itself in a runLater
>>>>>>>> block and then enters a nested event loop. The program
>>>>>>>> crashes when creating loop 254. I’m not sure where that
>>>>>>>> limit comes from so it’s possible that consuming some
>>>>>>>> other system resource could lower it. I couldn’t see
>>>>>>>> any good way to determine how many loops are active by
>>>>>>>> looking at the crash report since it doesn’t show the
>>>>>>>> entire call stack.
>>>>>>>> I did a quick trial on Linux and was able to create a
>>>>>>>> lot more loops (over 600) but then started seeing
>>>>>>>> erratic behavior and errors coming from the Java VM.
>>>>>>>> The behavior was variable unlike on the Mac which
>>>>>>>> always crashes when creating loop 254.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> > On Mar 7, 2025, at 6:24 AM, Christopher Schnick
>>>>>>>> > <crschnick at xpipe.io> wrote:
>>>>>>>> >
>>>>>>>> > Hello,
>>>>>>>> >
>>>>>>>> > I have attached a JVM fatal error log that seemingly
>>>>>>>> > was caused by our JavaFX application entering too many
>>>>>>>> > nested event loops, which macOS apparently doesn't like.
>>>>>>>> >
>>>>>>>> > As far as I know, there is no upper limit defined on
>>>>>>>> > how often an event loop can be nested, so I think this
>>>>>>>> > is a bug that can occur in rare situations.
>>>>>>>> >
>>>>>>>> > Best
>>>>>>>> > Christopher Schnick
>>>>>>>> > <hs_err_pid.txt>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>