RFR: JDK-8265298: Hard VM crash when deadlock between "access" and higher ranked lock is detected

Patricio Chilano Mateo pchilanomate at openjdk.java.net
Thu Apr 15 19:02:37 UTC 2021


On Thu, 15 Apr 2021 16:56:31 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> I stumbled upon this when doing some Shenandoah work. The development code tried to lock the `leaf` lock while already holding the `access` lock. Normally the VM would have detected this, but instead we tried to recursively acquire `tty_lock` for `Thread::print_owned_locks`. But that `tty` lock is still ranked higher than `access`, so deadlock detection triggers over and over again until we run out of stack and crash hard. `tty` and `access` are ranked this way because of [JDK-8214315](https://bugs.openjdk.java.net/browse/JDK-8214315). Read the rest in the bug.
> 
> I believe the way out is to only enter `Thread::print_owned_locks` when we know deadlock detection code would not run in circles. New test for `access` and `leaf` shows the original failure. Another new test checks `tty` and `special` to verify that the check should be `> tty`, not `>= tty`.
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug tier1
>  - [x] New regression tests (fail without the patch, pass with it)

Hi Aleksey,

Fix looks good to me. We could even avoid that print altogether, since we already have the offending locks in the assert message and the hs_err file will show the owners of all locks anyway. But I guess it doesn't hurt to try to print that when we know it is safe.
My only comments are on the added test cases.
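
To illustrate why the guard has to be `> tty` rather than `>= tty`, here is a toy model of the rank check. It is only a sketch and not the HotSpot code: `ToyThread`, `Guard`, the `k...` rank constants, their numeric values and `kMaxReentries` (a stand-in for running out of stack) are all made up. The only things taken from the discussion are the relative order access < tty < special < leaf and the fact that printing the owned locks needs the `tty` lock.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy ranks, lowest first. Only the relative order comes from the thread;
// the numeric values are made up for this sketch.
enum ToyRank { kAccess = 1, kTty = 3, kSpecial = 6, kLeaf = 10 };

// Which condition guards the call to print_owned_locks().
enum Guard { kNoGuard, kGreaterOrEqualTty, kGreaterThanTty };

// Stand-in for running out of stack: stop after this many failed checks.
const int kMaxReentries = 100;

struct ToyThread {
  std::vector<int> owned;  // ranks of the locks this thread holds
  Guard guard;
  int reentries;           // how often the rank check reported a violation
  bool printed;            // whether the owned locks were actually printed

  explicit ToyThread(Guard g) : guard(g), reentries(0), printed(false) {}

  int least_rank() const {
    assert(!owned.empty());
    return *std::min_element(owned.begin(), owned.end());
  }

  bool safe_to_print() const {
    if (guard == kNoGuard) return true;
    if (guard == kGreaterOrEqualTty) return least_rank() >= kTty;
    return least_rank() > kTty;
  }

  // Printing takes the tty lock, so it goes through the rank check as well.
  void print_owned_locks() {
    if (lock(kTty)) {
      printed = true;
      owned.pop_back();  // release the tty lock again
    }
  }

  // Returns true if rank r could be taken in order; false stands for the assert.
  bool lock(int r) {
    if (owned.empty() || r < least_rank()) {
      owned.push_back(r);
      return true;
    }
    if (++reentries >= kMaxReentries) return false;  // "stack exhausted"
    if (safe_to_print()) print_owned_locks();
    return false;
  }
};
```

In this model, a thread holding `kAccess` that locks `kLeaf` reports one violation with `kGreaterThanTty`, but keeps re-entering the check until the cap with `kNoGuard`. A thread holding `kTty` that locks `kSpecial` behaves the same way with `kGreaterOrEqualTty`, which is the case the second new test is meant to cover.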

Thanks,
Patricio

test/hotspot/gtest/runtime/test_mutex_rank.cpp line 107:

> 105: }
> 106: 
> 107: TEST_VM_ASSERT_MSG(MutexRank, mutex_wait_access_leaf,

s/mutex_wait/mutex_lock

test/hotspot/gtest/runtime/test_mutex_rank.cpp line 122:

> 120: }
> 121: 
> 122: TEST_VM_ASSERT_MSG(MutexRank, mutex_wait_tty_special,

s/mutex_wait/mutex_lock

test/hotspot/gtest/runtime/test_mutex_rank.cpp line 210:

> 208:   monitor_rank_access->lock_without_safepoint_check();
> 209:   monitor_rank_leaf->lock_without_safepoint_check();
> 210:   monitor_rank_leaf->wait_without_safepoint_check(1);

If you want to exercise the wait() error case you should lock the leaf lock first and then the access one. Then you will get the assert at wait_without_safepoint_check(1). You will have to fix the error message that we check for above to be the one used in the wait case.
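
The difference between the two orders can be seen in a small stand-alone model of the rank rules. This is a sketch under assumptions, not the HotSpot implementation: `ToyOwner`, `Check` and the `k...` constants are hypothetical, the numeric rank values are made up, and the model assumes one lock per rank. It assumes that a lock must be ranked below every lock already held, and that a wait is rejected when any other held lock is ranked at or below the monitor being waited on.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy ranks, lowest first; the numeric values are made up for this sketch.
enum ToyRank { kAccess = 1, kTty = 3, kSpecial = 6, kLeaf = 10 };

enum Check { kOk, kLockOutOfOrder, kWaitOutOfOrder };

struct ToyOwner {
  std::vector<int> owned;  // ranks of the locks currently held

  // A new lock must be ranked strictly below everything already held.
  Check lock(int r) {
    for (int held : owned) {
      if (held <= r) return kLockOutOfOrder;
    }
    owned.push_back(r);
    return kOk;
  }

  // Waiting on r is rejected if some other held lock is ranked at or below r.
  Check wait(int r) {
    assert(std::find(owned.begin(), owned.end(), r) != owned.end());
    for (int held : owned) {
      if (held != r && held <= r) return kWaitOutOfOrder;
    }
    return kOk;
  }
};
```

With access locked first, `lock(kLeaf)` already returns `kLockOutOfOrder`, so the wait is never reached. With leaf locked first and access second, both locks succeed and `wait(kLeaf)` returns `kWaitOutOfOrder`, which is the case the test name promises.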

test/hotspot/gtest/runtime/test_mutex_rank.cpp line 226:

> 224:   monitor_rank_tty->lock_without_safepoint_check();
> 225:   monitor_rank_special->lock_without_safepoint_check();
> 226:   monitor_rank_special->wait_without_safepoint_check(1);

Same as above: you should lock the special lock first and then the tty one.

-------------

Marked as reviewed by pchilanomate (Committer).

PR: https://git.openjdk.java.net/jdk/pull/3524


More information about the hotspot-runtime-dev mailing list