RFR: 8309637: runtime/handshake/HandshakeTimeoutTest.java fails with "has not cleared handshake op" and SIGILL [v2]
David Holmes
dholmes at openjdk.org
Fri Jul 7 02:26:56 UTC 2023
On Thu, 6 Jul 2023 16:06:15 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:
>> Please review the following fix. The test is checking the correct behavior of flag HandshakeTimeout by spawning a child VM and verifying that it crashes due to a timeout during a handshake operation. The test sometimes times out though because the child VM deadlocks during error reporting. The issue is that the JavaThread doing the error reporting deadlocks trying to acquire a lock it already owns, and the Watcher thread never kicks in to shutdown the VM because it hasn't been created yet and will never be: the "main" JavaThread is blocked in Threads::create_vm(), somewhere before creating the Watcher thread, waiting to acquire a lock that the error reporting thread owns. There are more details about the specifics resources involved in the deadlocks in the bug comments.
>>
>> I moved the start of the Watcher thread further up during the initialization steps so that it is there even before creating the VMThread. Ideally it should be one of the first things we create in Threads::create_vm() but unfortunately there are dependencies. I found the earliest we can create it is at the same place the AsyncLogWriter thread is created since the barrier set needs to have been created already. I didn't move the creation there but just after that call to init_globals(). To keep the current behavior where enrolled tasks are only processed after the VM has been fully created I added a first loop in WatcherThread::run() where we only check for error reporting hangs. Most of the other changes in the patch are factoring out code.
>>
>> The test timeout can be reproduced by adding a delay in ThreadCritical() (one of the locks involved in the deadlock). I verified the fix by running the test with that extra delay and verified it doesn't time out anymore. I also run tiers1-3 in mach5. I'll run the upper tiers too.
>>
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>
> simplify code
Looks good. I'd missed the need to adjust `sleep` too but this version still seems simpler - thanks for making the changes.
-------------
Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14777#pullrequestreview-1517658903
More information about the hotspot-runtime-dev
mailing list