RFR: 8309637: runtime/handshake/HandshakeTimeoutTest.java fails with "has not cleared handshake op" and SIGILL [v2]
Patricio Chilano Mateo
pchilanomate at openjdk.org
Thu Jul 6 16:06:15 UTC 2023
> Please review the following fix. The test checks the correct behavior of the HandshakeTimeout flag by spawning a child VM and verifying that it crashes due to a timeout during a handshake operation. The test sometimes times out, though, because the child VM deadlocks during error reporting. The issue is that the JavaThread doing the error reporting deadlocks trying to acquire a lock it already owns, and the Watcher thread never kicks in to shut down the VM because it hasn't been created yet and never will be: the "main" JavaThread is blocked in Threads::create_vm(), somewhere before creating the Watcher thread, waiting to acquire a lock that the error reporting thread owns. There are more details about the specific resources involved in the deadlocks in the bug comments.
>
> I moved the start of the Watcher thread further up in the initialization steps so that it is there even before the VMThread is created. Ideally it should be one of the first things we create in Threads::create_vm(), but unfortunately there are dependencies. I found the earliest we can create it is at the same place the AsyncLogWriter thread is created, since the barrier set needs to have been created already. I didn't move the creation there but just after that call to init_globals(). To keep the current behavior where enrolled tasks are only processed after the VM has been fully created, I added a first loop in WatcherThread::run() where we only check for error reporting hangs. Most of the other changes in the patch are factoring out code.
>
> The test timeout can be reproduced by adding a delay in ThreadCritical() (one of the locks involved in the deadlock). I verified the fix by running the test with that extra delay and confirming that it doesn't time out anymore. I also ran tiers1-3 in mach5. I'll run the upper tiers too.
>
> Thanks,
> Patricio
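
The two-loop structure described for WatcherThread::run() can be illustrated with a small standalone sketch. This is not the HotSpot code: the Watchdog class, its method names, the polling interval, and the on_hang callback are all hypothetical stand-ins chosen for illustration. It only shows the idea of starting the watchdog early, checking solely for error reporting hangs until initialization completes, and running periodic tasks afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical watchdog, loosely modeled on the description above.
// All names here are illustrative assumptions, not HotSpot identifiers.
class Watchdog {
public:
  using Clock = std::chrono::steady_clock;

  Watchdog(std::chrono::milliseconds error_timeout, std::function<void()> on_hang)
      : error_timeout_(error_timeout), on_hang_(std::move(on_hang)) {}

  ~Watchdog() { stop(); }

  // Start the watchdog thread; intended to be called early during startup.
  void start() {
    assert(!thread_.joinable() && "watchdog already started");
    thread_ = std::thread([this] { run(); });
  }

  // Called once startup has finished; periodic tasks may run from now on.
  void notify_init_complete() { init_complete_.store(true); }

  // Called by the thread that begins error reporting.
  void notify_error_reporting_started() { error_start_ns_.store(now_ns()); }

  void stop() {
    stop_.store(true);
    if (thread_.joinable()) thread_.join();
  }

  int tasks_run() const { return tasks_run_.load(); }

private:
  static long long now_ns() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               Clock::now().time_since_epoch()).count();
  }

  // True if error reporting started and has exceeded the timeout.
  bool error_reporting_hung() const {
    const long long start = error_start_ns_.load();
    if (start < 0) return false;  // no error reporting in progress
    const long long limit =
        std::chrono::duration_cast<std::chrono::nanoseconds>(error_timeout_).count();
    return now_ns() - start > limit;
  }

  void run() {
    const auto poll = std::chrono::milliseconds(5);
    // First loop: before initialization completes, only watch for hangs.
    while (!stop_.load() && !init_complete_.load()) {
      if (error_reporting_hung()) { on_hang_(); return; }
      std::this_thread::sleep_for(poll);
    }
    // Second loop: keep watching for hangs and also run periodic work.
    while (!stop_.load()) {
      if (error_reporting_hung()) { on_hang_(); return; }
      tasks_run_.fetch_add(1);  // placeholder for processing enrolled tasks
      std::this_thread::sleep_for(poll);
    }
  }

  const std::chrono::milliseconds error_timeout_;
  const std::function<void()> on_hang_;
  std::atomic<bool> stop_{false};
  std::atomic<bool> init_complete_{false};
  std::atomic<long long> error_start_ns_{-1};
  std::atomic<int> tasks_run_{0};
  std::thread thread_;
};
```

In this sketch, if error reporting starts and stalls while initialization is still blocked, the first loop still invokes on_hang after the timeout, and no periodic task runs before notify_init_complete() is called. In a real VM the callback would correspond to terminating the process; here it is left to the caller.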
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
simplify code
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/14777/files
- new: https://git.openjdk.org/jdk/pull/14777/files/8cb63637..8848ed99
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=14777&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=14777&range=00-01
Stats: 108 lines in 2 files changed: 46 ins; 58 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/14777.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14777/head:pull/14777
PR: https://git.openjdk.org/jdk/pull/14777