RFR 8230594: Allow direct handshakes without VMThread intervention

Fri Jan 17 23:06:25 UTC 2020

Hi Dan,

On 1/16/20 7:23 PM, Daniel D. Daugherty wrote:
>> Here is v3:
>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/ 
>
> src/hotspot/share/runtime/handshake.hpp
>     L34: class HandshakeOperation;
>         nit: please move before L33 for alpha sort order.
Done.

> L38: // either by the thread itself or by the VM thread while keeping 
> the thread
>         nit: s/VM thread/VMThread/
>
>         Not your typo, but for consistency with your comment addition.
>         (Also, I prefer "VMThread" to "VM thread".)
Done.

> L41: // thread itself or, depending on whether the operation is a 
> direct handshake
>         Perhaps:
>             s/thread/target thread/
>         or:
>             s/thread/target JavaThread/
>
>         since only JavaThreads can be the target of a handshake.
Done.

> L42: // or not, by the JavaThread that requested the handshake or the 
> VMThread
>         Can handshakes be requested by non-JavaThreads?
I wrote the code with the assumption that direct handshakes will be 
between JavaThreads only, since at least for now this is going to be 
used only for biased locking revocations. Changing it to allow 
non-JavaThreads to request direct handshakes should be straightforward 
though. We just wouldn't need to call SafepointMechanism::should_block() 
to check for pending handshakes and we wouldn't need to wait with 
safepoint check in set_operation(). I wouldn't be able to test it though 
since we don't have those kind of direct handshakes yet. Maybe we can 
change it later when we have a real case?

> Perhaps this for the comment block:
>
>     L36: // A handshake closure is a callback that is executed for 
> each JavaThread
>     L37: // while that thread is in a safepoint safe state. The 
> callback is executed
>     L38: // either by the target JavaThread itself or by the VMThread 
> while keeping
>     L39: // the target thread in a blocked state. A handshake can be 
> performed with a
>     L40: // single JavaThread as well. In that case, the callback is 
> executed either
>     L41: // by the target JavaThread itself or, depending on whether 
> the operation is
>     L42: // a direct handshake or not, by the thread that requested 
> the handshake or
>     L43: // the VMThread respectively.
Done. If you are okay with the previous point I changed the last 
"thread" to "JavaThread".

> L65: // the JavaThread or the JavaThread itself.
>         nit: s/or the/or by the target/
Done.

> src/hotspot/share/runtime/handshake.cpp
>     L174:     _op->add_target_count(number_of_threads_issued - 1);
>         Perhaps add a comment above the line:
>           // _op was created with a count == 1 so don't double count:
Done.

> L285:   JavaThread *self = (JavaThread*)Thread::current();
>     L325: 
> _handshake_turn_sem.wait_with_safepoint_check((JavaThread*)Thread::current());
>         If the caller is supposed to be a JavaThread, then use
>         JavaThread::current() instead of casting.
Done.

> L286:   HandshakeOperation op(thread_cl, true);
>         Please prefix literal booleans with /*is_direct*/ so we know what
>         the parameter is.
Done.

> L341:   assert(Thread::current() == _thread, "should call from thread");
>         This should probably be JavaThread::current().
I read the discussion you had with David and I think you both agree on 
keeping Thread::current() so I didn't change it.

> L356:         // Disarm before execute the operation
>     L362:         // Disarm before execute the operation
>         nit: s/execute/executing/
Done.

> L357:         clear_handshake(false);
>     L363:         clear_handshake(true);
>         Please prefix literal booleans with /*is_direct*/ so we know what
>         the parameter is.
Done.

> L430:   if ( (is_direct && op != _operation_direct)) {
>     L431:     _processing_sem.signal();
>         Why is the signal() call needed here (and not in the other
>         'return false' cases above)?
Because we only successfully decremented the semaphore if 
claim_handshake() returns true. In all the previous "return false" cases 
we either did not try to decrement it or we failed to do so.

> src/hotspot/share/runtime/biasedLocking.cpp
>     L671:   assert(Thread::current() == biased_locker ...
>         Can you add:
>
>           Thread* cur = Thread::current();
>
>         above this line? Then switch all the uses of "Thread::current()"
>         in this function to "cur".
Done.

> src/hotspot/share/runtime/mutexLocker.cpp
>         L191 void assert_locked_or_safepoint_or_handshake(const Mutex* 
> lock, const JavaThread* thread) {
>     old L192:   if (Thread::current()->is_VM_thread() && 
> thread->is_vmthread_processing_handshake()) return;
>     new L192:   if (Thread::current() == 
> thread->get_active_handshaker()) return;
>         I had to really think about logical equivalence here.
>         I think I've convinced myself that this is okay...
Without this patch, only the VMThread could be processing a handshake 
closure for a given JavaThread. But now it can also be executed by a 
JavaThread. Hence I changed the code to check whether the current thread 
is the one executing the handshake closure on behalf of the JavaThread 
passed as a parameter to assert_locked_or_safepoint_or_handshake().

> src/hotspot/share/runtime/safepoint.cpp
>     No comments (see next file).
>
> src/hotspot/share/runtime/safepointMechanism.inline.hpp
>     The reason for the deletion of
>     SafepointMechanism::disarm_if_needed() is not obvious.
Before, only the VMThread would arm a JavaThread, either for a safepoint 
or a handshake. The disarm could be made by either of them. The disarm 
by the VMThread was always safe, since it would do it after the 
operation was completed. For the JavaThread not quite. To avoid 
disarming a still pending safepoint/handshake the JavaThread immediately 
rechecks if there is a pending operation after doing the disarm, and in 
case there is, it arms itself again (that code is in 
SafepointMechanism::block_if_requested_slow()).
If we now allow JavaThreads doing direct handshakes or the VMThread to 
disarm the polling page of a given JavaThread then we would need to 
protect access to the polling page with some synchronization primitive 
to avoid disarming a current pending operation (VMThread disarming when 
there is still a pending direct handshake, or JavaThread disarming when 
there is still a pending safepoint/handshake). To avoid adding 
additional overhead in accessing the polling page we can make each 
JavaThread disarm itself.

> src/hotspot/share/runtime/thread.hpp
>     No comments (nice encapsulation cleanup).
>
> src/hotspot/share/runtime/thread.cpp
>     No comments (nice encapsulation cleanup).
>
> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>     L107:         workingThreads[0].join();
>         By only join()ing with the 0th worker thread, you won't
>         detect if any of the worker threads is stuck due to a bug.
Right, I changed it and now I do join() with all the workers.

Here is v4 addressing also Coleen's comments. Hope I didn't miss anything:

Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v04/webrev/
Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v04/inc/webrev/

Thanks!

Patricio
> Dan
>
>
> On 1/15/20 10:56 PM, Patricio Chilano wrote:
>> Hi David,
>>
>> On 1/15/20 8:29 PM, David Holmes wrote:
>>> Hi Patricio,
>>>
>>> On 16/01/2020 6:06 am, Patricio Chilano wrote:
>>>> Hi David,
>>>>
>>>> Here is v2:
>>>>
>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
>>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/
>>>
>>> src/hotspot/share/runtime/handshake.cpp
>>>
>>> why is it that execute_direct doesn't check for 
>>> uses_thread_local_poll() ?
>> Sorry, I thought for a moment that thread-local handshakes for arm32 
>> was already implemented and all platforms where using it. I added 
>> that check back, and by testing that path I had to also add back the 
>> VMThread in the assert in walk_stack_and_revoke().
>>
>>> src/hotspot/share/runtime/handshake.hpp
>>>
>>> typo: wether -> whether
>> Fixed.
>>
>> Here is v3:
>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/
>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/inc/webrev/
>>
>> Thanks!
>>
>> Patricio
>>> Otherwise updates look fine.
>>>
>>> Thanks,
>>> David
>>>
>>>> Passed one round of t1-6, and running more rounds now.
>>>>
>>>> Thanks,
>>>> Patricio
>>>>
>>>> On 1/14/20 7:57 PM, David Holmes wrote:
>>>>> Hi Patricio,
>>>>>
>>>>> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 1/14/20 4:13 AM, David Holmes wrote:
>>>>>>> Hi Patricio,
>>>>>>>
>>>>>>> Thanks for tackling this. I've taken an initial look through 
>>>>>>> trying to understand the general approach, but I don't claim to 
>>>>>>> understand all the subtle details, so there are some queries 
>>>>>>> below to help with my overall understanding.
>>>>>>>
>>>>>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> The following patch adds the ability to execute direct 
>>>>>>>> handshakes between JavaThreads without the VMThread 
>>>>>>>> intervention, and enables this functionality for biased locking 
>>>>>>>> revocations.
>>>>>>>> The current handshake mechanism that uses the VMThread, either 
>>>>>>>> to handshake one JavaThread or all of them, is still the 
>>>>>>>> default unless you specify otherwise when calling 
>>>>>>>> Handshake::execute().
>>>>>>>
>>>>>>> Can I suggest that rather than using an overloaded:
>>>>>>>
>>>>>>>  static bool execute(HandshakeClosure* hs_cl, JavaThread* 
>>>>>>> target, bool is_direct_handshake = false);
>>>>>>>
>>>>>>> you instead add an execute_direct method for clarity?
>>>>>> Ok, added execute_direct() and removed overloading.
>>>>>>
>>>>>>>> In order to avoid adding additional overhead to this path that 
>>>>>>>> uses the VMThread (especially the one that handshakes all 
>>>>>>>> JavaThreads) I added a new HandshakeOperation pointer in the 
>>>>>>>> HandshakeState class, _operation_direct, to be used for the 
>>>>>>>> direct handshake cases only and whose access is serialized 
>>>>>>>> between JavaThreads by using a semaphore. Thus, one direct 
>>>>>>>> handshake will be allowed at any given time, and upon 
>>>>>>>> completion the semaphore will be signaled to allow the next 
>>>>>>>> handshaker if any to proceed. In this way the old _operation 
>>>>>>>> can still be used only by the VMThread without the need for 
>>>>>>>> synchronization to access it. The handshakee will now check if 
>>>>>>>> any of _operation or _operation_direct is set when checking for 
>>>>>>>> a pending handshake and will try to execute both in 
>>>>>>>> HandshakeState::process_self_inner(). The execution of the 
>>>>>>>> handshake’s ThreadClosure, either direct handshake or not, is 
>>>>>>>> still protected by a semaphore, which I renamed to 
>>>>>>>> _processing_sem.
>>>>>>>> I converted the semaphore _done in HandshakeOperation to be 
>>>>>>>> just an atomic counter because of bug 
>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I 
>>>>>>>> actually hit once!). Since the semaphore could not be static 
>>>>>>>> anymore due to possibly having more than one HandshakeOperation 
>>>>>>>> at a time, the handshakee could try to access the nwaiters 
>>>>>>>> field of an already destroyed semaphore when signaling it. In 
>>>>>>>> any case nobody was waiting on that semaphore (we were not 
>>>>>>>> using kernel functionality), so just using an atomic counter 
>>>>>>>> seems more appropriate.
>>>>>>>> In order to avoid issues due to disarming a JavaThread that 
>>>>>>>> should still be armed for a handshake or safepoint, each 
>>>>>>>> JavaThread will now always disarm its own polling page.
>>>>>>>> I also added a new test, HandshakeDirectTest.java, which tries 
>>>>>>>> to stress the use of direct handshakes with revocations.
>>>>>>>> In terms of performance, I measured no difference in the 
>>>>>>>> execution time of one individual handshake. The difference can 
>>>>>>>> be seen when several handshakes at a time are executed as 
>>>>>>>> expected. So for example on Linux running on an Intel Xeon 
>>>>>>>> 8167M cpu, test HandshakeDirectTest.java (which executes 50000 
>>>>>>>> handshakes between 32 threads) executes in around 340ms using 
>>>>>>>> direct handshakes and in around 5.6 seconds without it. For a 
>>>>>>>> modified version of that test that only executes 128 handshakes 
>>>>>>>> between the 32 threads and avoids any suspend-resume, the test 
>>>>>>>> takes around 12ms with direct handshakes and 19ms without it.
>>>>>>>> Tested with mach5, tiers1-6.
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>>>>>> Webrev: 
>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>>>>>
>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>
>>>>>>>  36 // A handshake closure is a callback that is executed for 
>>>>>>> each JavaThread
>>>>>>>   37 // while that thread is in a safepoint safe state. The 
>>>>>>> callback is executed
>>>>>>>   38 // either by the thread itself or by the VM thread while 
>>>>>>> keeping the thread
>>>>>>>   39 // in a blocked state. A handshake can be performed with a 
>>>>>>> single
>>>>>>>   40 // JavaThread as well.
>>>>>>>
>>>>>>> Does this comment block need updating for the direct case?
>>>>>>>
>>>>>>>  61 // the operation is only done by either VM thread on behalf 
>>>>>>> of the JavaThread
>>>>>>>  62 // or the JavaThread itself.
>>>>>>>
>>>>>>> Again does this need an update?
>>>>>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>>>>>
>>>>>>
>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>
>>>>>>>  41 class HandshakeOperation: public StackObj {
>>>>>>>
>>>>>>> Would it be clearer to have HandShakeOperation subclassed by 
>>>>>>> DirectHandShakeOperation, rather than it being a property of the 
>>>>>>> current op instance?
>>>>>> But I would still need to access the is_direct field for 
>>>>>> non-direct handshakes in HandshakeState::try_process() anyways.
>>>>>
>>>>> Okay.
>>>>>
>>>>>>
>>>>>>>  41 class HandshakeOperation: public StackObj {
>>>>>>>  42   HandshakeClosure* _handshake_cl;
>>>>>>>  43   int64_t _pending_threads;
>>>>>>>  ...
>>>>>>>  53   void add_target_count(int count) { 
>>>>>>> Atomic::add(&_pending_threads, count); }
>>>>>>>
>>>>>>> Can you clarify the lifecycle of this StackObj please. You 
>>>>>>> obviously expose it to other threads - hence the atomic update - 
>>>>>>> but that means it needs to be guaranteed to be live when that 
>>>>>>> update occurs.
>>>>>> The HandshakeOperation object is created in Handshake::execute() 
>>>>>> (or Handshake::execute_direct() now) by the Handshaker. The only 
>>>>>> other threads that could access this object are the VMThread(for 
>>>>>> non-direct case only) and the Handshakee(in both cases). Seeing 
>>>>>> that it's safe for the VMThread to decrement the counter is 
>>>>>> straightforward since the Handshaker will be waiting on the 
>>>>>> VMOperationRequest_lock waiting for the operation to finish (this 
>>>>>> is just as always has been).
>>>>>> As for the Handshakee, we can see that the HandshakeOperation 
>>>>>> object will be alive as long as the VMThread(for the non-direct 
>>>>>> case) or Handshaker(for the direct case) keep seeing that the 
>>>>>> operation is not done by calling 
>>>>>> HandshakeOperation::is_completed(). Once that returns true, for 
>>>>>> the non-direct case the VMThread will finish the operation and 
>>>>>> wake up the Handshaker who will return from Handshake::execute() 
>>>>>> and the object will be destroyed, and for the direct case the 
>>>>>> Handshaker will just return from execute_direct() and the object 
>>>>>> will be destroyed. Since is_completed() will only return true 
>>>>>> when _pending_threads reaches zero, that means the decrement 
>>>>>> operation had to be safe. Just as an observation, for the 
>>>>>> HandshakeAllThreads operation, _pending_threads could go to zero 
>>>>>> and even negative before executing add_target_count(), but that's 
>>>>>> okay because the VMThread doesn't call is_completed() before that 
>>>>>> and the load of _pending_threads cannot float above 
>>>>>> add_target_count() since the atomic add operation provides a 
>>>>>> memory fence.
>>>>>> And finally the Handshakee cannot try to execute an operation 
>>>>>> that has already being processed by the VMThread/Handshaker, 
>>>>>> because they clear it first and then signal the _processing_sem 
>>>>>> semaphore in HandshakeState::try_process(). That worked the same 
>>>>>> before this change.
>>>>>
>>>>> Thanks for the detailed explanation!
>>>>>
>>>>>>
>>>>>>> -- 
>>>>>>>
>>>>>>>  349     HandshakeOperation* op = 
>>>>>>> Atomic::load_acquire(&_operation);
>>>>>>>  350     if (op == NULL) {
>>>>>>>  351       op = Atomic::load_acquire(&_operation_direct);
>>>>>>>
>>>>>>> This gives preference to the non-direct op. Is it possible for 
>>>>>>> the direct-op to be indefinitely delayed if there is a series of 
>>>>>>> non-direct ops?
>>>>>> Yes, it's possible, although I think that's an unlikely scenario. 
>>>>>> That would mean the Handshakee would be stuck in that while loop 
>>>>>> with the VMThread constantly setting new operations in 
>>>>>> _operation. But I can modify the loop and make it try to execute 
>>>>>> both before going into the next interation, something like (more 
>>>>>> on load_acquire next):
>>>>>>
>>>>>>      if (has_operation()) {
>>>>>>        HandleMark hm(_thread);
>>>>>>        CautiouslyPreserveExceptionMark pem(_thread);
>>>>>>        HandshakeOperation * op = _operation;
>>>>>>        if (op != NULL) {
>>>>>>          // Disarm before execute the operation
>>>>>>          clear_handshake(false);
>>>>>>          op->do_handshake(_thread);
>>>>>>        }
>>>>>>        op = _operation_direct;
>>>>>>        if (op != NULL) {
>>>>>>          // Disarm before execute the operation
>>>>>>          clear_handshake(true);
>>>>>>          op->do_handshake(_thread);
>>>>>>        }
>>>>>>      }
>>>>>>
>>>>>> what do you think?
>>>>>
>>>>> That seems better to me.
>>>>>
>>>>>>
>>>>>>> Also I don't see the release_stores that these load_acquire 
>>>>>>> would pair with. I'm assuming it should be here:
>>>>>>>
>>>>>>>  319     _operation = op;
>>>>>>>  323     _operation_direct = op;
>>>>>>>
>>>>>>> But I would also have expected all necessary memory 
>>>>>>> synchronization to already be present via the semaphore 
>>>>>>> operations and/or the rest of the handshake mechanism. ??
>>>>>> Yes, the release is in 
>>>>>> SafepointMechanism::arm_local_poll_release() after setting those 
>>>>>> fields. But it's true that those load_acquire are not needed 
>>>>>> since the semaphore already has acquire semantics. I actually 
>>>>>> wanted to do a normal load but ended up just copying how we were 
>>>>>> loading _operation. I''ll change them and retest.
>>>>>
>>>>> Okay.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -------
>>>>>
>>>>>>
>>>>>>> 396 if ((!is_direct && _operation != NULL) || (is_direct && 
>>>>>>> _operation_direct != NULL)){
>>>>>>> 406   if ((!is_direct && _operation == NULL) || (is_direct && 
>>>>>>> _operation_direct == NULL)){
>>>>>>>
>>>>>>> Can this not be abstracted back into a "has_operation" method 
>>>>>>> that takes a "direct" parameter? e.g
>>>>>>>
>>>>>>>   bool has_operation() const { return _operation != NULL || 
>>>>>>> _operation_direct != NULL; }
>>>>>>> + bool has_specific_operation(bool direct) {
>>>>>>> +   return direct ? _operation_direct != NULL : _operation != NULL;
>>>>>>> + }
>>>>>> Done.
>>>>>>
>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>>>>>
>>>>>>> Please add:
>>>>>>>
>>>>>>> import java.util.concurrent.Semaphore;
>>>>>>>
>>>>>>> so you can just refer to Sempahore.
>>>>>> Done.
>>>>>>
>>>>>>
>>>>>>> Style nit: _working_threads
>>>>>>>
>>>>>>> Java style is to not use leading underscore and to use camelCase 
>>>>>>> for variables ie workingThreads.
>>>>>> Right, I changed them to camelCase.
>>>>>>
>>>>>>
>>>>>>> 44 static Thread _suspendresume_thread = new Thread();
>>>>>>>
>>>>>>> The above is dead code.
>>>>>> Removed.
>>>>>>
>>>>>>
>>>>>>> 53 if (_is_biased[me] == false) {
>>>>>>>
>>>>>>> Style nit: use "if (!_is_biased[me]) {"
>>>>>> Changed.
>>>>>>
>>>>>>
>>>>>>> 80             } catch(InterruptedException ie) {
>>>>>>>   81             }
>>>>>>>
>>>>>>> I suggest inserting "throw new Error("Unexpected interrupt");" 
>>>>>>> for good measure.
>>>>>>>
>>>>>>>  111 _working_threads[i].suspend();
>>>>>> Done.
>>>>>>
>>>>>>
>>>>>>> Thread suspension has been deprecated-for-removal as of JDK 14 
>>>>>>> so could be gone in JDK 15 (but more likely 16). If the 
>>>>>>> suspend/resume is important for this test then you will need to 
>>>>>>> switch to using JVM TI suspend/resume; or else perhaps introduce 
>>>>>>> a WhiteBox method to do whatever you need in the VM.
>>>>>> It's not really needed for this test. I just wanted to mixed 
>>>>>> suspend-resume with handshakes because I know there could be some 
>>>>>> buggy interactions between them. But since it has been deprecated 
>>>>>> then maybe there is no point in stressing handshakes with them 
>>>>>> and I can remove that part.
>>>>>>
>>>>>>
>>>>>>>  123 // Wait until the desired number of direct handshakes is 
>>>>>>> reached
>>>>>>>  124         while (_handshake_count.get() < 
>>>>>>> DIRECT_HANDSHAKES_MARK) {
>>>>>>>  125             Thread.sleep(10); // sleep for 10ms
>>>>>>>  126         }
>>>>>>>
>>>>>>> You could just do a join() on one of the worker threads.
>>>>>> Right, didn't thought about that. Changed.
>>>>>>
>>>>>>
>>>>>> Thanks for looking at this David! I'll retest with the above 
>>>>>> changes in the meantime.
>>>>>>
>>>>>>
>>>>>> Patricio
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Patricio
>>>>>>>>
>>>>>>
>>>>
>>
>