RFR 8230594: Allow direct handshakes without VMThread intervention

Mon Jan 13 16:25:32 UTC 2020

Hi all,

The following patch adds the ability to execute direct handshakes 
between JavaThreads without VMThread intervention, and enables this 
functionality for biased locking revocations.

The current handshake mechanism, which uses the VMThread to handshake 
either one JavaThread or all of them, is still the default unless you 
specify otherwise when calling Handshake::execute(). To avoid adding 
overhead to the VMThread path (especially the one that handshakes all 
JavaThreads), I added a new HandshakeOperation pointer to the 
HandshakeState class, _operation_direct. It is used only for direct 
handshakes, and access to it is serialized between JavaThreads with a 
semaphore. Thus, only one direct handshake per target JavaThread is 
allowed at any given time. Upon completion the semaphore is signaled so 
that the next handshaker, if any, can proceed. This way the old 
_operation is still used only by the VMThread and can be accessed 
without synchronization. When checking for a pending handshake, the 
handshakee now checks whether either _operation or _operation_direct is 
set, and HandshakeState::process_self_inner() tries to execute both. 
Execution of the handshake's ThreadClosure, direct or not, is still 
protected by a semaphore, which I renamed to _processing_sem.
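
A minimal C++ sketch of the serialization scheme described above, under 
simplifying assumptions: TargetState, SimpleSemaphore, execute_direct() 
and process_self() are hypothetical names, not the HotSpot classes, and 
std::function stands in for the ThreadClosure. The sketch omits the 
case where the handshaker executes the closure on behalf of a blocked 
target under _processing_sem.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Minimal counting semaphore (hypothetical helper; HotSpot has its own
// Semaphore class, which is not reproduced here).
class SimpleSemaphore {
 public:
  explicit SimpleSemaphore(int count) : count_(count) {}

  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return count_ > 0; });
    --count_;
  }

  void signal() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++count_;
    }
    cv_.notify_one();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int count_;
};

// std::function stands in for the handshake's ThreadClosure.
using Operation = std::function<void()>;

// Hypothetical per-target state, loosely modeled on HandshakeState.
class TargetState {
 public:
  // Written only by the VMThread, so handshakers never race on it.
  std::atomic<Operation*> operation{nullptr};
  // Shared by JavaThread handshakers; direct_turn admits one at a time.
  std::atomic<Operation*> operation_direct{nullptr};
  SimpleSemaphore direct_turn{1};

  // Called by a handshaker JavaThread targeting this thread.
  void execute_direct(Operation* op) {
    direct_turn.wait();          // wait for our turn on this target
    operation_direct.store(op);  // publish the operation
    while (operation_direct.load() != nullptr) {
      std::this_thread::yield(); // wait for the target to run it
    }
    direct_turn.signal();        // let the next handshaker, if any, proceed
  }

  // What the target checks at a poll point: either slot may be set.
  bool has_pending() const {
    return operation.load() != nullptr || operation_direct.load() != nullptr;
  }

  // Called by the target itself: try to execute both kinds of operation.
  void process_self() {
    run_and_clear(operation);
    run_and_clear(operation_direct);
  }

 private:
  static void run_and_clear(std::atomic<Operation*>& slot) {
    if (Operation* op = slot.load()) {
      (*op)();
      slot.store(nullptr);  // completion becomes visible to the handshaker
    }
  }
};
```

Because the semaphore is acquired before _operation_direct is written, 
a second handshaker can never overwrite an operation that is still in 
flight, while the VMThread's slot stays free of that extra cost.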

I converted the semaphore _done in HandshakeOperation into a plain 
atomic counter because of bug 
https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I actually 
hit once!). Since the semaphore can no longer be static, because there 
may be more than one HandshakeOperation at a time, the handshakee could 
try to access the nwaiters field of an already destroyed semaphore when 
signaling it. In any case, nobody was waiting on that semaphore (we were 
not using its kernel functionality), so an atomic counter seems more 
appropriate.

To avoid disarming a JavaThread that should still be armed for a 
handshake or safepoint, each JavaThread now always disarms its own 
polling page.
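
A minimal C++ sketch of the "only the owner disarms" rule. It assumes a 
single atomic flag stands in for the polling page, and PollState and 
disarm_self() are hypothetical names. The re-check after clearing is an 
assumption about how such a scheme avoids losing a request posted 
concurrently; it does not describe the patch itself.

```cpp
#include <atomic>

// Hypothetical per-thread poll state. Any thread may arm it, but only
// the owning thread ever disarms it.
class PollState {
 public:
  void arm() { armed_.store(true); }
  bool is_armed() const { return armed_.load(); }

  // Called only by the owning thread at a poll point. `has_pending`
  // reports whether a handshake or safepoint still needs this thread.
  template <typename Pred>
  void disarm_self(Pred has_pending) {
    armed_.store(false);
    // Re-check after clearing: if another thread posted a request in
    // the meantime, arm again instead of losing it.
    if (has_pending()) arm();
  }

 private:
  std::atomic<bool> armed_{false};
};
```

Since other threads only ever set the flag, a handshaker that finishes 
early cannot clear an arm that a safepoint or another handshake still 
relies on.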

I also added a new test, HandshakeDirectTest.java, which stresses direct 
handshakes combined with biased locking revocations.

In terms of performance, I measured no difference in the execution time 
of a single handshake. As expected, the difference shows up when several 
handshakes are executed at the same time. For example, on Linux on an 
Intel Xeon 8167M CPU, HandshakeDirectTest.java (which executes 50000 
handshakes between 32 threads) runs in around 340ms with direct 
handshakes and in around 5.6 seconds without them. For a modified 
version of that test that executes only 128 handshakes between the 32 
threads and avoids any suspend-resume, the test takes around 12ms with 
direct handshakes and 19ms without them.
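
For a rough sense of scale, here is simple arithmetic on the figures 
quoted above. These are wall-clock ratios amortized over the whole test 
run, not per-handshake latencies.

```latex
\frac{5.6\,\mathrm{s}}{340\,\mathrm{ms}} \approx 16.5\times,
\qquad
\frac{340\,\mathrm{ms}}{50000} = 6.8\,\mu\mathrm{s}
\;\text{vs.}\;
\frac{5.6\,\mathrm{s}}{50000} = 112\,\mu\mathrm{s}\ \text{per handshake},
\qquad
\frac{19\,\mathrm{ms}}{12\,\mathrm{ms}} \approx 1.6\times
```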

Tested with mach5, tiers 1-6.

Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
Webrev: http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/

Thanks!
Patricio