RFR: 8325397: sun/java2d/Disposer/TestDisposerRace.java fails in linux-aarch64
Viktor Klang
vklang at openjdk.org
Tue Sep 3 09:16:31 UTC 2024
On Thu, 15 Aug 2024 18:22:54 GMT, Viktor Klang <vklang at openjdk.org> wrote:
> My suspicion is that Condition::await() throws before having successfully reacquired the lock, and this exception is swallowed because Lock::unlock() then throws when invoke with an IllegalMonitorStateException as the current thread was not reestablished as the owner.
>
> So this PR is intended to harden the reacquisition of await-ing methods in AQS and AQLS so that instead of throwing they spin-loop trying to reacquire the lock—at least a thread dump would show where the Thread is stuck trying to reacquire.
>
> There's also the option of attempting to log a JFR event on the first reacquisition failure, but that might need to be done with a pre-created event as the cause of the failing acquisition may be an OOME.
>
> I also needed to harden the TestDisposerRace itself, as it currently tries to provoke OOMEs but then proceeds to want to allocate objects used for the test itself. I'm not super-happy about the changes there, but they should be safer than before.
>
> The first RuntimeException/Error encountered will be rethrown upon success to facilitate debugging.
@DougLea @AlanBateman I'm currently trying to provoke the test to fail with this new implementation.
@DougLea @AlanBateman This seems to fix TestDisposerRace.java
Ready for review
Closing this PR and opening a new one, to see if that helps.
Gah, so it had nothing to do with the commit message. There was, for some reason, a tab in the JBS title which was copy-pasted in as the PR title.
src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 286:
> 284: */
> 285: private final void reacquire(Node node, long arg) {
> 286: for (long nanos = 1L;;) {
@AlanBateman @DougLea If we're concerned about SOE due to adding another stackframe by invoking reacquire we might consider ForceInline, but I was weary about doing so.
src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 290:
> 288: acquire(node, arg, false, false, false, 0L);
> 289: break;
> 290: } catch (Error | RuntimeException ex) {
@DougLea @AlanBateman The first question is if this set of exception are the way to go, or we should go with Throwable
src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 294:
> 292: if (first) {
> 293: first = false;
> 294: // Log JFR event?
@DougLea @AlanBateman The second question is whether we should declare a JFR event for this use-case or not.
src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedLongSynchronizer.java line 312:
> 310: }
> 311: }
> 312: }
@DougLea @AlanBateman So I've added such that the first exception/error is rethrown after subsequent successful reacquisition. See above.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2292008244
PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2293131396
PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2296564889
PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2325963172
PR Comment: https://git.openjdk.org/jdk/pull/20602#issuecomment-2326008569
PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1719777654
PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1718867228
PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1718867694
PR Review Comment: https://git.openjdk.org/jdk/pull/20602#discussion_r1731486424
More information about the core-libs-dev
mailing list