RFR 8230594: Allow direct handshakes without VMThread intervention

David Holmes david.holmes at oracle.com
Thu Jan 16 22:40:40 UTC 2020


Hi Dan,

Can I pick up on one suggestion you made:

     L341:   assert(Thread::current() == _thread, "should call from 
thread");
         This should probably be JavaThread::current().

If we get the wrong thread such that the assert fails, and that thread 
is not even a JavaThread, then we will get an assertion failure inside 
JavaThread::current() instead of failing the L341 assertion. So I 
suggest that Thread::current() is always the best method to use in an 
assertion involving thread identity.

Cheers,
David

On 17/01/2020 8:23 am, Daniel D. Daugherty wrote:
>> Here is v3:
>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/ 
> 
> src/hotspot/share/runtime/handshake.hpp
>      L34: class HandshakeOperation;
>          nit: please move before L33 for alpha sort order.
> 
>      L38: // either by the thread itself or by the VM thread while 
> keeping the thread
>          nit: s/VM thread/VMThread/
> 
>          Not your typo, but for consistency with your comment addition.
>          (Also, I prefer "VMThread" to "VM thread".)
> 
>      L41: // thread itself or, depending on whether the operation is a 
> direct handshake
>          Perhaps:
>              s/thread/target thread/
>          or:
>              s/thread/target JavaThread/
> 
>          since only JavaThreads can be the target of a handshake.
> 
>      L42: // or not, by the JavaThread that requested the handshake or 
> the VMThread
>          Can handshakes be requested by non-JavaThreads?
> 
>      Perhaps this for the comment block:
> 
>      L36: // A handshake closure is a callback that is executed for each 
> JavaThread
>      L37: // while that thread is in a safepoint safe state. The 
> callback is executed
>      L38: // either by the target JavaThread itself or by the VMThread 
> while keeping
>      L39: // the target thread in a blocked state. A handshake can be 
> performed with a
>      L40: // single JavaThread as well. In that case, the callback is 
> executed either
>      L41: // by the target JavaThread itself or, depending on whether 
> the operation is
>      L42: // a direct handshake or not, by the thread that requested the 
> handshake or
>      L43: // the VMThread respectively.
> 
> 
>      L65: // the JavaThread or the JavaThread itself.
>          nit: s/or the/or by the target/
> 
> src/hotspot/share/runtime/handshake.cpp
>      L174:     _op->add_target_count(number_of_threads_issued - 1);
>          Perhaps add a comment above the line:
>            // _op was created with a count == 1 so don't double count:
> 
>      L285:   JavaThread *self = (JavaThread*)Thread::current();
>      L325: 
> _handshake_turn_sem.wait_with_safepoint_check((JavaThread*)Thread::current()); 
> 
>          If the caller is supposed to be a JavaThread, then use
>          JavaThread::current() instead of casting.
> 
>      L286:   HandshakeOperation op(thread_cl, true);
>          Please prefix literal booleans with /*is_direct*/ so we know what
>          the parameter is.
> 
>      L341:   assert(Thread::current() == _thread, "should call from 
> thread");
>          This should probably be JavaThread::current().
> 
>      L356:         // Disarm before execute the operation
>      L362:         // Disarm before execute the operation
>          nit: s/execute/executing/
> 
>      L357:         clear_handshake(false);
>      L363:         clear_handshake(true);
>          Please prefix literal booleans with /*is_direct*/ so we know what
>          the parameter is.
> 
>      L430:   if ( (is_direct && op != _operation_direct)) {
>      L431:     _processing_sem.signal();
>          Why is the signal() call needed here (and not in the other
>          'return false' cases above)?
> 
> src/hotspot/share/runtime/biasedLocking.cpp
>      L671:   assert(Thread::current() == biased_locker ...
>          Can you add:
> 
>            Thread* cur = Thread::current();
> 
>          above this line? Then switch all the uses of "Thread::current()"
>          in this function to "cur".
> 
> src/hotspot/share/runtime/mutexLocker.cpp
>          L191 void assert_locked_or_safepoint_or_handshake(const Mutex* 
> lock, const JavaThread* thread) {
>      old L192:   if (Thread::current()->is_VM_thread() && 
> thread->is_vmthread_processing_handshake()) return;
>      new L192:   if (Thread::current() == 
> thread->get_active_handshaker()) return;
>          I had to really think about logical equivalence here.
>          I think I've convinced myself that this is okay...
> 
> src/hotspot/share/runtime/safepoint.cpp
>      No comments (see next file).
> 
> src/hotspot/share/runtime/safepointMechanism.inline.hpp
>      The reason for the deletion of
>      SafepointMechanism::disarm_if_needed() is not obvious.
> 
> src/hotspot/share/runtime/thread.hpp
>      No comments (nice encapsulation cleanup).
> 
> src/hotspot/share/runtime/thread.cpp
>      No comments (nice encapsulation cleanup).
> 
> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>      L107:         workingThreads[0].join();
>          By only join()ing with the 0th worker thread, you won't
>          detect if any of the worker threads is stuck due to a bug.
> 
> Dan
> 
> 
> On 1/15/20 10:56 PM, Patricio Chilano wrote:
>> Hi David,
>>
>> On 1/15/20 8:29 PM, David Holmes wrote:
>>> Hi Patricio,
>>>
>>> On 16/01/2020 6:06 am, Patricio Chilano wrote:
>>>> Hi David,
>>>>
>>>> Here is v2:
>>>>
>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
>>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/
>>>
>>> src/hotspot/share/runtime/handshake.cpp
>>>
>>> why is it that execute_direct doesn't check for 
>>> uses_thread_local_poll() ?
>> Sorry, I thought for a moment that thread-local handshakes for arm32 
>> was already implemented and all platforms where using it. I added that 
>> check back, and by testing that path I had to also add back the 
>> VMThread in the assert in walk_stack_and_revoke().
>>
>>> src/hotspot/share/runtime/handshake.hpp
>>>
>>> typo: wether -> whether
>> Fixed.
>>
>> Here is v3:
>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/
>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/inc/webrev/
>>
>> Thanks!
>>
>> Patricio
>>> Otherwise updates look fine.
>>>
>>> Thanks,
>>> David
>>>
>>>> Passed one round of t1-6, and running more rounds now.
>>>>
>>>> Thanks,
>>>> Patricio
>>>>
>>>> On 1/14/20 7:57 PM, David Holmes wrote:
>>>>> Hi Patricio,
>>>>>
>>>>> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 1/14/20 4:13 AM, David Holmes wrote:
>>>>>>> Hi Patricio,
>>>>>>>
>>>>>>> Thanks for tackling this. I've taken an initial look through 
>>>>>>> trying to understand the general approach, but I don't claim to 
>>>>>>> understand all the subtle details, so there are some queries 
>>>>>>> below to help with my overall understanding.
>>>>>>>
>>>>>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> The following patch adds the ability to execute direct 
>>>>>>>> handshakes between JavaThreads without the VMThread 
>>>>>>>> intervention, and enables this functionality for biased locking 
>>>>>>>> revocations.
>>>>>>>> The current handshake mechanism that uses the VMThread, either 
>>>>>>>> to handshake one JavaThread or all of them, is still the default 
>>>>>>>> unless you specify otherwise when calling Handshake::execute().
>>>>>>>
>>>>>>> Can I suggest that rather than using an overloaded:
>>>>>>>
>>>>>>>  static bool execute(HandshakeClosure* hs_cl, JavaThread* target, 
>>>>>>> bool is_direct_handshake = false);
>>>>>>>
>>>>>>> you instead add an execute_direct method for clarity?
>>>>>> Ok, added execute_direct() and removed overloading.
>>>>>>
>>>>>>>> In order to avoid adding additional overhead to this path that 
>>>>>>>> uses the VMThread (especially the one that handshakes all 
>>>>>>>> JavaThreads) I added a new HandshakeOperation pointer in the 
>>>>>>>> HandshakeState class, _operation_direct, to be used for the 
>>>>>>>> direct handshake cases only and whose access is serialized 
>>>>>>>> between JavaThreads by using a semaphore. Thus, one direct 
>>>>>>>> handshake will be allowed at any given time, and upon completion 
>>>>>>>> the semaphore will be signaled to allow the next handshaker if 
>>>>>>>> any to proceed. In this way the old _operation can still be used 
>>>>>>>> only by the VMThread without the need for synchronization to 
>>>>>>>> access it. The handshakee will now check if any of _operation or 
>>>>>>>> _operation_direct is set when checking for a pending handshake 
>>>>>>>> and will try to execute both in 
>>>>>>>> HandshakeState::process_self_inner(). The execution of the 
>>>>>>>> handshake’s ThreadClosure, either direct handshake or not, is 
>>>>>>>> still protected by a semaphore, which I renamed to _processing_sem.
>>>>>>>> I converted the semaphore _done in HandshakeOperation to be just 
>>>>>>>> an atomic counter because of bug 
>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I 
>>>>>>>> actually hit once!). Since the semaphore could not be static 
>>>>>>>> anymore due to possibly having more than one HandshakeOperation 
>>>>>>>> at a time, the handshakee could try to access the nwaiters field 
>>>>>>>> of an already destroyed semaphore when signaling it. In any case 
>>>>>>>> nobody was waiting on that semaphore (we were not using kernel 
>>>>>>>> functionality), so just using an atomic counter seems more 
>>>>>>>> appropriate.
>>>>>>>> In order to avoid issues due to disarming a JavaThread that 
>>>>>>>> should still be armed for a handshake or safepoint, each 
>>>>>>>> JavaThread will now always disarm its own polling page.
>>>>>>>> I also added a new test, HandshakeDirectTest.java, which tries 
>>>>>>>> to stress the use of direct handshakes with revocations.
>>>>>>>> In terms of performance, I measured no difference in the 
>>>>>>>> execution time of one individual handshake. The difference can 
>>>>>>>> be seen when several handshakes at a time are executed as 
>>>>>>>> expected. So for example on Linux running on an Intel Xeon 8167M 
>>>>>>>> cpu, test HandshakeDirectTest.java (which executes 50000 
>>>>>>>> handshakes between 32 threads) executes in around 340ms using 
>>>>>>>> direct handshakes and in around 5.6 seconds without it. For a 
>>>>>>>> modified version of that test that only executes 128 handshakes 
>>>>>>>> between the 32 threads and avoids any suspend-resume, the test 
>>>>>>>> takes around 12ms with direct handshakes and 19ms without it.
>>>>>>>> Tested with mach5, tiers1-6.
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>>>>>> Webrev: 
>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>>>>>
>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>
>>>>>>>  36 // A handshake closure is a callback that is executed for 
>>>>>>> each JavaThread
>>>>>>>   37 // while that thread is in a safepoint safe state. The 
>>>>>>> callback is executed
>>>>>>>   38 // either by the thread itself or by the VM thread while 
>>>>>>> keeping the thread
>>>>>>>   39 // in a blocked state. A handshake can be performed with a 
>>>>>>> single
>>>>>>>   40 // JavaThread as well.
>>>>>>>
>>>>>>> Does this comment block need updating for the direct case?
>>>>>>>
>>>>>>>  61 // the operation is only done by either VM thread on behalf 
>>>>>>> of the JavaThread
>>>>>>>  62 // or the JavaThread itself.
>>>>>>>
>>>>>>> Again does this need an update?
>>>>>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>>>>>
>>>>>>
>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>
>>>>>>>  41 class HandshakeOperation: public StackObj {
>>>>>>>
>>>>>>> Would it be clearer to have HandShakeOperation subclassed by 
>>>>>>> DirectHandShakeOperation, rather than it being a property of the 
>>>>>>> current op instance?
>>>>>> But I would still need to access the is_direct field for 
>>>>>> non-direct handshakes in HandshakeState::try_process() anyways.
>>>>>
>>>>> Okay.
>>>>>
>>>>>>
>>>>>>>  41 class HandshakeOperation: public StackObj {
>>>>>>>  42   HandshakeClosure* _handshake_cl;
>>>>>>>  43   int64_t _pending_threads;
>>>>>>>  ...
>>>>>>>  53   void add_target_count(int count) { 
>>>>>>> Atomic::add(&_pending_threads, count); }
>>>>>>>
>>>>>>> Can you clarify the lifecycle of this StackObj please. You 
>>>>>>> obviously expose it to other threads - hence the atomic update - 
>>>>>>> but that means it needs to be guaranteed to be live when that 
>>>>>>> update occurs.
>>>>>> The HandshakeOperation object is created in Handshake::execute() 
>>>>>> (or Handshake::execute_direct() now) by the Handshaker. The only 
>>>>>> other threads that could access this object are the VMThread(for 
>>>>>> non-direct case only) and the Handshakee(in both cases). Seeing 
>>>>>> that it's safe for the VMThread to decrement the counter is 
>>>>>> straightforward since the Handshaker will be waiting on the 
>>>>>> VMOperationRequest_lock waiting for the operation to finish (this 
>>>>>> is just as always has been).
>>>>>> As for the Handshakee, we can see that the HandshakeOperation 
>>>>>> object will be alive as long as the VMThread(for the non-direct 
>>>>>> case) or Handshaker(for the direct case) keep seeing that the 
>>>>>> operation is not done by calling 
>>>>>> HandshakeOperation::is_completed(). Once that returns true, for 
>>>>>> the non-direct case the VMThread will finish the operation and 
>>>>>> wake up the Handshaker who will return from Handshake::execute() 
>>>>>> and the object will be destroyed, and for the direct case the 
>>>>>> Handshaker will just return from execute_direct() and the object 
>>>>>> will be destroyed. Since is_completed() will only return true when 
>>>>>> _pending_threads reaches zero, that means the decrement operation 
>>>>>> had to be safe. Just as an observation, for the 
>>>>>> HandshakeAllThreads operation, _pending_threads could go to zero 
>>>>>> and even negative before executing add_target_count(), but that's 
>>>>>> okay because the VMThread doesn't call is_completed() before that 
>>>>>> and the load of _pending_threads cannot float above 
>>>>>> add_target_count() since the atomic add operation provides a 
>>>>>> memory fence.
>>>>>> And finally the Handshakee cannot try to execute an operation that 
>>>>>> has already being processed by the VMThread/Handshaker, because 
>>>>>> they clear it first and then signal the _processing_sem semaphore 
>>>>>> in HandshakeState::try_process(). That worked the same before this 
>>>>>> change.
>>>>>
>>>>> Thanks for the detailed explanation!
>>>>>
>>>>>>
>>>>>>> -- 
>>>>>>>
>>>>>>>  349     HandshakeOperation* op = Atomic::load_acquire(&_operation);
>>>>>>>  350     if (op == NULL) {
>>>>>>>  351       op = Atomic::load_acquire(&_operation_direct);
>>>>>>>
>>>>>>> This gives preference to the non-direct op. Is it possible for 
>>>>>>> the direct-op to be indefinitely delayed if there is a series of 
>>>>>>> non-direct ops?
>>>>>> Yes, it's possible, although I think that's an unlikely scenario. 
>>>>>> That would mean the Handshakee would be stuck in that while loop 
>>>>>> with the VMThread constantly setting new operations in _operation. 
>>>>>> But I can modify the loop and make it try to execute both before 
>>>>>> going into the next interation, something like (more on 
>>>>>> load_acquire next):
>>>>>>
>>>>>>      if (has_operation()) {
>>>>>>        HandleMark hm(_thread);
>>>>>>        CautiouslyPreserveExceptionMark pem(_thread);
>>>>>>        HandshakeOperation * op = _operation;
>>>>>>        if (op != NULL) {
>>>>>>          // Disarm before execute the operation
>>>>>>          clear_handshake(false);
>>>>>>          op->do_handshake(_thread);
>>>>>>        }
>>>>>>        op = _operation_direct;
>>>>>>        if (op != NULL) {
>>>>>>          // Disarm before execute the operation
>>>>>>          clear_handshake(true);
>>>>>>          op->do_handshake(_thread);
>>>>>>        }
>>>>>>      }
>>>>>>
>>>>>> what do you think?
>>>>>
>>>>> That seems better to me.
>>>>>
>>>>>>
>>>>>>> Also I don't see the release_stores that these load_acquire would 
>>>>>>> pair with. I'm assuming it should be here:
>>>>>>>
>>>>>>>  319     _operation = op;
>>>>>>>  323     _operation_direct = op;
>>>>>>>
>>>>>>> But I would also have expected all necessary memory 
>>>>>>> synchronization to already be present via the semaphore 
>>>>>>> operations and/or the rest of the handshake mechanism. ??
>>>>>> Yes, the release is in 
>>>>>> SafepointMechanism::arm_local_poll_release() after setting those 
>>>>>> fields. But it's true that those load_acquire are not needed since 
>>>>>> the semaphore already has acquire semantics. I actually wanted to 
>>>>>> do a normal load but ended up just copying how we were loading 
>>>>>> _operation. I''ll change them and retest.
>>>>>
>>>>> Okay.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -------
>>>>>
>>>>>>
>>>>>>> 396 if ((!is_direct && _operation != NULL) || (is_direct && 
>>>>>>> _operation_direct != NULL)){
>>>>>>> 406   if ((!is_direct && _operation == NULL) || (is_direct && 
>>>>>>> _operation_direct == NULL)){
>>>>>>>
>>>>>>> Can this not be abstracted back into a "has_operation" method 
>>>>>>> that takes a "direct" parameter? e.g
>>>>>>>
>>>>>>>   bool has_operation() const { return _operation != NULL || 
>>>>>>> _operation_direct != NULL; }
>>>>>>> + bool has_specific_operation(bool direct) {
>>>>>>> +   return direct ? _operation_direct != NULL : _operation != NULL;
>>>>>>> + }
>>>>>> Done.
>>>>>>
>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>>>>>
>>>>>>> Please add:
>>>>>>>
>>>>>>> import java.util.concurrent.Semaphore;
>>>>>>>
>>>>>>> so you can just refer to Sempahore.
>>>>>> Done.
>>>>>>
>>>>>>
>>>>>>> Style nit: _working_threads
>>>>>>>
>>>>>>> Java style is to not use leading underscore and to use camelCase 
>>>>>>> for variables ie workingThreads.
>>>>>> Right, I changed them to camelCase.
>>>>>>
>>>>>>
>>>>>>> 44 static Thread _suspendresume_thread = new Thread();
>>>>>>>
>>>>>>> The above is dead code.
>>>>>> Removed.
>>>>>>
>>>>>>
>>>>>>> 53 if (_is_biased[me] == false) {
>>>>>>>
>>>>>>> Style nit: use "if (!_is_biased[me]) {"
>>>>>> Changed.
>>>>>>
>>>>>>
>>>>>>> 80             } catch(InterruptedException ie) {
>>>>>>>   81             }
>>>>>>>
>>>>>>> I suggest inserting "throw new Error("Unexpected interrupt");" 
>>>>>>> for good measure.
>>>>>>>
>>>>>>>  111                     _working_threads[i].suspend();
>>>>>> Done.
>>>>>>
>>>>>>
>>>>>>> Thread suspension has been deprecated-for-removal as of JDK 14 so 
>>>>>>> could be gone in JDK 15 (but more likely 16). If the 
>>>>>>> suspend/resume is important for this test then you will need to 
>>>>>>> switch to using JVM TI suspend/resume; or else perhaps introduce 
>>>>>>> a WhiteBox method to do whatever you need in the VM.
>>>>>> It's not really needed for this test. I just wanted to mixed 
>>>>>> suspend-resume with handshakes because I know there could be some 
>>>>>> buggy interactions between them. But since it has been deprecated 
>>>>>> then maybe there is no point in stressing handshakes with them and 
>>>>>> I can remove that part.
>>>>>>
>>>>>>
>>>>>>>  123 // Wait until the desired number of direct handshakes is 
>>>>>>> reached
>>>>>>>  124         while (_handshake_count.get() < 
>>>>>>> DIRECT_HANDSHAKES_MARK) {
>>>>>>>  125             Thread.sleep(10); // sleep for 10ms
>>>>>>>  126         }
>>>>>>>
>>>>>>> You could just do a join() on one of the worker threads.
>>>>>> Right, didn't thought about that. Changed.
>>>>>>
>>>>>>
>>>>>> Thanks for looking at this David! I'll retest with the above 
>>>>>> changes in the meantime.
>>>>>>
>>>>>>
>>>>>> Patricio
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Patricio
>>>>>>>>
>>>>>>
>>>>
>>
> 


More information about the hotspot-runtime-dev mailing list