RFR: 8340405: JavaFX shutdown hook can hang preventing app from exiting

Ambarish Rapte arapte at openjdk.org
Fri Sep 20 06:12:42 UTC 2024


On Thu, 19 Sep 2024 23:17:31 GMT, Andy Goryachev <angorya at openjdk.org> wrote:

>> This PR fixes a hang on exit which can happen if QuantumToolkit.dispose hangs when called from the QuantumToolkit shutdown hook. A shutdown hook should never run indefinitely, so the fix is to call dispose from a background thread and wait for up to 5 seconds for it to finish normally, and then throw an Error so that the application will be able to exit anyway.
>> 
>> The QuantumToolkit registers a "Glass/Prism Shutdown Hook" that calls the QuantumToolkit.dispose method to gracefully shutdown the toolkit and renderer. If dispose hangs or deadlocks, this will prevent the application from exiting, requiring the process to be killed. This has been observed on macOS, due to [JDK-8238505](https://bugs.openjdk.org/browse/JDK-8238505), which is is an intermittent deadlock that occurs during shutdown. We ought to eventually fix that intermittent deadlock, but even in the presence of a deadlock or hang, we don't want the shutdown hook to hang.
>> 
>> This can happen at any time, but is especially a problem during an automated test run, which will then hang until the Jenkins job is either killed or its global timer (currently set to 12 hours) kills the job. Even in that case, processes can be left lying around which can affect subsequent runs.
>> 
>> I discovered this while testing [JDK-8328629](https://bugs.openjdk.org/browse/JDK-8328629), which adds a default Unit 5 timeout, and without fixing the deadlock, the timeout will be unable to abort the hanging test.
>> 
>> ### Testing instructions
>> 
>> I tested this in connection with PR #1575 / [JDK-8328629](https://bugs.openjdk.org/browse/JDK-8328629) using my [test-timeout-and-shutdown-hang](https://github.com/kevinrushforth/jfx/tree/test-timeout-and-shutdown-hang) branch. That branch has the fixes from both this PR and PR #1575 as well as tests that will hang indefinitely without the fixes and fail due to the expected timeout with the fixes.
>
> modules/javafx.graphics/src/main/java/com/sun/javafx/tk/quantum/QuantumToolkit.java line 276:
> 
>> 274:                 try {
>> 275:                     if (!disposeLatch.await(5, TimeUnit.SECONDS)) {
>> 276:                         throw new InternalError("dispose timed out");
> 
> could this happen on a real system?  if so, should the message be more clear and user-friendly?
> 
> also, if it can be encountered in the field, it might be better to declare a constant and add a javadoc to it explaining what it is.
> 
> what do you think?

I think javadoc and a property might be an overkill at this moment. The root problem is the deadlock and is very intermittent, and it hinders with the jenkins jobs. The change seems good means to avoid such situations where process keep running unchecked for hours. And if in case the frequency of issue increases or we could find a definite way of reproducing the issue, then we should try to fix the deadlock.
Approving, I shall re-approve if updated.

-------------

PR Review Comment: https://git.openjdk.org/jfx/pull/1574#discussion_r1768038274


More information about the openjfx-dev mailing list