RFR 8230594: Allow direct handshakes without VMThread intervention

Tue Jan 14 19:17:43 UTC 2020

Hi David,

On 1/14/20 4:13 AM, David Holmes wrote:
> Hi Patricio,
>
> Thanks for tackling this. I've taken an initial look through trying to 
> understand the general approach, but I don't claim to understand all 
> the subtle details, so there are some queries below to help with my 
> overall understanding.
>
> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>> Hi all,
>>
>> The following patch adds the ability to execute direct handshakes 
>> between JavaThreads without the VMThread intervention, and enables 
>> this functionality for biased locking revocations.
>> The current handshake mechanism that uses the VMThread, either to 
>> handshake one JavaThread or all of them, is still the default unless 
>> you specify otherwise when calling Handshake::execute().
>
> Can I suggest that rather than using an overloaded:
>
>  static bool execute(HandshakeClosure* hs_cl, JavaThread* target, bool 
> is_direct_handshake = false);
>
> you instead add an execute_direct method for clarity?
Ok, added execute_direct() and removed overloading.

>> In order to avoid adding additional overhead to this path that uses 
>> the VMThread (especially the one that handshakes all JavaThreads) I 
>> added a new HandshakeOperation pointer in the HandshakeState class, 
>> _operation_direct, to be used for the direct handshake cases only and 
>> whose access is serialized between JavaThreads by using a semaphore. 
>> Thus, one direct handshake will be allowed at any given time, and 
>> upon completion the semaphore will be signaled to allow the next 
>> handshaker if any to proceed. In this way the old _operation can 
>> still be used only by the VMThread without the need for 
>> synchronization to access it. The handshakee will now check if any of 
>> _operation or _operation_direct is set when checking for a pending 
>> handshake and will try to execute both in 
>> HandshakeState::process_self_inner(). The execution of the 
>> handshake’s ThreadClosure, either direct handshake or not, is still 
>> protected by a semaphore, which I renamed to _processing_sem.
>> I converted the semaphore _done in HandshakeOperation to be just an 
>> atomic counter because of bug 
>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I 
>> actually hit once!). Since the semaphore could not be static anymore 
>> due to possibly having more than one HandshakeOperation at a time, 
>> the handshakee could try to access the nwaiters field of an already 
>> destroyed semaphore when signaling it. In any case nobody was waiting 
>> on that semaphore (we were not using kernel functionality), so just 
>> using an atomic counter seems more appropriate.
>> In order to avoid issues due to disarming a JavaThread that should 
>> still be armed for a handshake or safepoint, each JavaThread will now 
>> always disarm its own polling page.
>> I also added a new test, HandshakeDirectTest.java, which tries to 
>> stress the use of direct handshakes with revocations.
>> In terms of performance, I measured no difference in the execution 
>> time of one individual handshake. The difference can be seen when 
>> several handshakes at a time are executed as expected. So for example 
>> on Linux running on an Intel Xeon 8167M cpu, test 
>> HandshakeDirectTest.java (which executes 50000 handshakes between 32 
>> threads) executes in around 340ms using direct handshakes and in 
>> around 5.6 seconds without it. For a modified version of that test 
>> that only executes 128 handshakes between the 32 threads and avoids 
>> any suspend-resume, the test takes around 12ms with direct handshakes 
>> and 19ms without it.
>> Tested with mach5, tiers1-6.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>
> src/hotspot/share/runtime/handshake.hpp
>
>  36 // A handshake closure is a callback that is executed for each 
> JavaThread
>   37 // while that thread is in a safepoint safe state. The callback 
> is executed
>   38 // either by the thread itself or by the VM thread while keeping 
> the thread
>   39 // in a blocked state. A handshake can be performed with a single
>   40 // JavaThread as well.
>
> Does this comment block need updating for the direct case?
>
>  61 // the operation is only done by either VM thread on behalf of the 
> JavaThread
>  62 // or the JavaThread itself.
>
> Again does this need an update?
Yes, I forgot to update those comments in handshake.hpp. Fixed.

> src/hotspot/share/runtime/handshake.hpp
>
>  41 class HandshakeOperation: public StackObj {
>
> Would it be clearer to have HandShakeOperation subclassed by 
> DirectHandShakeOperation, rather than it being a property of the 
> current op instance?
But I would still need to access the is_direct field for non-direct 
handshakes in HandshakeState::try_process() anyways.

>  41 class HandshakeOperation: public StackObj {
>  42   HandshakeClosure* _handshake_cl;
>  43   int64_t _pending_threads;
>  ...
>  53   void add_target_count(int count) { 
> Atomic::add(&_pending_threads, count); }
>
> Can you clarify the lifecycle of this StackObj please. You obviously 
> expose it to other threads - hence the atomic update - but that means 
> it needs to be guaranteed to be live when that update occurs.
The HandshakeOperation object is created in Handshake::execute() (or 
Handshake::execute_direct() now) by the Handshaker. The only other 
threads that could access this object are the VMThread(for non-direct 
case only) and the Handshakee(in both cases). Seeing that it's safe for 
the VMThread to decrement the counter is straightforward since the 
Handshaker will be waiting on the VMOperationRequest_lock waiting for 
the operation to finish (this is just as always has been).
As for the Handshakee, we can see that the HandshakeOperation object 
will be alive as long as the VMThread(for the non-direct case) or 
Handshaker(for the direct case) keep seeing that the operation is not 
done by calling HandshakeOperation::is_completed(). Once that returns 
true, for the non-direct case the VMThread will finish the operation and 
wake up the Handshaker who will return from Handshake::execute() and the 
object will be destroyed, and for the direct case the Handshaker will 
just return from execute_direct() and the object will be destroyed. 
Since is_completed() will only return true when _pending_threads reaches 
zero, that means the decrement operation had to be safe. Just as an 
observation, for the HandshakeAllThreads operation, _pending_threads 
could go to zero and even negative before executing add_target_count(), 
but that's okay because the VMThread doesn't call is_completed() before 
that and the load of _pending_threads cannot float above 
add_target_count() since the atomic add operation provides a memory fence.
And finally the Handshakee cannot try to execute an operation that has 
already being processed by the VMThread/Handshaker, because they clear 
it first and then signal the _processing_sem semaphore in 
HandshakeState::try_process(). That worked the same before this change.

> -- 
>
>  349     HandshakeOperation* op = Atomic::load_acquire(&_operation);
>  350     if (op == NULL) {
>  351       op = Atomic::load_acquire(&_operation_direct);
>
> This gives preference to the non-direct op. Is it possible for the 
> direct-op to be indefinitely delayed if there is a series of 
> non-direct ops?
Yes, it's possible, although I think that's an unlikely scenario. That 
would mean the Handshakee would be stuck in that while loop with the 
VMThread constantly setting new operations in _operation. But I can 
modify the loop and make it try to execute both before going into the 
next interation, something like (more on load_acquire next):

     if (has_operation()) {
       HandleMark hm(_thread);
       CautiouslyPreserveExceptionMark pem(_thread);
       HandshakeOperation * op = _operation;
       if (op != NULL) {
         // Disarm before execute the operation
         clear_handshake(false);
         op->do_handshake(_thread);
       }
       op = _operation_direct;
       if (op != NULL) {
         // Disarm before execute the operation
         clear_handshake(true);
         op->do_handshake(_thread);
       }
     }

what do you think?

> Also I don't see the release_stores that these load_acquire would pair 
> with. I'm assuming it should be here:
>
>  319     _operation = op;
>  323     _operation_direct = op;
>
> But I would also have expected all necessary memory synchronization to 
> already be present via the semaphore operations and/or the rest of the 
> handshake mechanism. ??
Yes, the release is in SafepointMechanism::arm_local_poll_release() 
after setting those fields. But it's true that those load_acquire are 
not needed since the semaphore already has acquire semantics. I actually 
wanted to do a normal load but ended up just copying how we were loading 
_operation. I''ll change them and retest.

> 396 if ((!is_direct && _operation != NULL) || (is_direct && 
> _operation_direct != NULL)){
> 406   if ((!is_direct && _operation == NULL) || (is_direct && 
> _operation_direct == NULL)){
>
> Can this not be abstracted back into a "has_operation" method that 
> takes a "direct" parameter? e.g
>
>   bool has_operation() const { return _operation != NULL || 
> _operation_direct != NULL; }
> + bool has_specific_operation(bool direct) {
> +   return direct ? _operation_direct != NULL : _operation != NULL;
> + }
Done.

> ---
>
> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>
> Please add:
>
> import java.util.concurrent.Semaphore;
>
> so you can just refer to Sempahore.
Done.

> Style nit: _working_threads
>
> Java style is to not use leading underscore and to use camelCase for 
> variables ie workingThreads.
Right, I changed them to camelCase.

> 44 static Thread _suspendresume_thread = new Thread();
>
> The above is dead code.
Removed.

> 53 if (_is_biased[me] == false) {
>
> Style nit: use "if (!_is_biased[me]) {"
Changed.

> 80             } catch(InterruptedException ie) {
>   81             }
>
> I suggest inserting "throw new Error("Unexpected interrupt");" for 
> good measure.
>
>  111                     _working_threads[i].suspend();
Done.

> Thread suspension has been deprecated-for-removal as of JDK 14 so 
> could be gone in JDK 15 (but more likely 16). If the suspend/resume is 
> important for this test then you will need to switch to using JVM TI 
> suspend/resume; or else perhaps introduce a WhiteBox method to do 
> whatever you need in the VM.
It's not really needed for this test. I just wanted to mixed 
suspend-resume with handshakes because I know there could be some buggy 
interactions between them. But since it has been deprecated then maybe 
there is no point in stressing handshakes with them and I can remove 
that part.

>  123 // Wait until the desired number of direct handshakes is reached
>  124         while (_handshake_count.get() < DIRECT_HANDSHAKES_MARK) {
>  125             Thread.sleep(10); // sleep for 10ms
>  126         }
>
> You could just do a join() on one of the worker threads.
Right, didn't thought about that. Changed.

Thanks for looking at this David! I'll retest with the above changes in 
the meantime.

Patricio
> Thanks,
> David
> -----
>
>> Thanks!
>> Patricio
>>