RFR: 8325397: sun/java2d/Disposer/TestDisposerRace.java fails in linux-aarch64

Tue Sep 3 09:16:31 UTC 2024

On Thu, 15 Aug 2024 18:22:54 GMT, Viktor Klang <vklang at openjdk.org> wrote:

> My suspicion is that Condition::await() throws before having successfully reacquired the lock, and this exception is swallowed because Lock::unlock() then throws when invoke with an IllegalMonitorStateException as the current thread was not reestablished as the owner.
> 
> So this PR is intended to harden the reacquisition of await-ing methods in AQS and AQLS so that instead of throwing they spin-loop trying to reacquire the lock—at least a thread dump would show where the Thread is stuck trying to reacquire.
> 
> There's also the option of attempting to log a JFR event on the first reacquisition failure, but that might need to be done with a pre-created event as the cause of the failing acquisition may be an OOME.
> 
> I also needed to harden the TestDisposerRace itself, as it currently tries to provoke OOMEs but then proceeds to want to allocate objects used for the test itself. I'm not super-happy about the changes there, but they should be safer than before.
> 
> The first RuntimeException/Error encountered will be rethrown upon success to facilitate debugging.

@DougLea @AlanBateman I'm currently trying to provoke the test to fail with this new implementation.

@DougLea @AlanBateman This seems to fix TestDisposerRace.java

Ready for review

Closing this PR and opening a new one, to see if that helps.

Gah, so it had nothing to do with the commit message. There was, for some reason, a tab in the JBS title which was copy-pasted in as the PR title.

src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 286:

> 284:      */
> 285:     private final void reacquire(Node node, long arg) {
> 286:         for (long nanos = 1L;;) {

@AlanBateman @DougLea If we're concerned about SOE due to adding another stackframe by invoking reacquire we might consider ForceInline, but I was weary about doing so.

src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 290:

> 288:                 acquire(node, arg, false, false, false, 0L);
> 289:                 break;
> 290:             } catch (Error | RuntimeException ex) {

@DougLea @AlanBateman The first question is if this set of exception are the way to go, or we should go with Throwable

src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 294:

> 292:                 if (first) {
> 293:                     first = false;
> 294:                     // Log JFR event?

@DougLea @AlanBateman The second question is whether we should declare a JFR event for this use-case or not.

src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 312:

> 310:             }
> 311:         }
> 312:     }

@DougLea @AlanBateman So I've added such that the first exception/error is rethrown after subsequent successful reacquisition. See above.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2292008244
PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2293131396
PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2296564889
PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2325963172
PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2326008569
PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1719777654
PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1718867228
PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1718867694
PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1731486424