RFR 8230594: Allow direct handshakes without VMThread intervention
David Holmes
david.holmes at oracle.com
Fri Jan 17 23:17:25 UTC 2020
Hi Patricio,
Your v4 reintroduced the suspend/resume in the test! ??
Cheers,
David
On 18/01/2020 9:06 am, Patricio Chilano wrote:
> Hi Dan,
>
> On 1/16/20 7:23 PM, Daniel D. Daugherty wrote:
>>> Here is v3:
>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/
>>
>> src/hotspot/share/runtime/handshake.hpp
>> L34: class HandshakeOperation;
>> nit: please move before L33 for alpha sort order.
> Done.
>
>> L38: // either by the thread itself or by the VM thread while keeping
>> the thread
>> nit: s/VM thread/VMThread/
>>
>> Not your typo, but for consistency with your comment addition.
>> (Also, I prefer "VMThread" to "VM thread".)
> Done.
>
>> L41: // thread itself or, depending on whether the operation is a
>> direct handshake
>> Perhaps:
>> s/thread/target thread/
>> or:
>> s/thread/target JavaThread/
>>
>> since only JavaThreads can be the target of a handshake.
> Done.
>
>> L42: // or not, by the JavaThread that requested the handshake or the
>> VMThread
>> Can handshakes be requested by non-JavaThreads?
> I wrote the code with the assumption that direct handshakes will be
> between JavaThreads only, since at least for now this is going to be
> used only for biased locking revocations. Changing it to allow
> non-JavaThreads to request direct handshakes should be straightforward
> though. We just wouldn't need to call SafepointMechanism::should_block()
> to check for pending handshakes and we wouldn't need to wait with
> safepoint check in set_operation(). I wouldn't be able to test it though
> since we don't have those kind of direct handshakes yet. Maybe we can
> change it later when we have a real case?
>
>> Perhaps this for the comment block:
>>
>> L36: // A handshake closure is a callback that is executed for
>> each JavaThread
>> L37: // while that thread is in a safepoint safe state. The
>> callback is executed
>> L38: // either by the target JavaThread itself or by the VMThread
>> while keeping
>> L39: // the target thread in a blocked state. A handshake can be
>> performed with a
>> L40: // single JavaThread as well. In that case, the callback is
>> executed either
>> L41: // by the target JavaThread itself or, depending on whether
>> the operation is
>> L42: // a direct handshake or not, by the thread that requested
>> the handshake or
>> L43: // the VMThread respectively.
> Done. If you are okay with the previous point I changed the last
> "thread" to "JavaThread".
>
>> L65: // the JavaThread or the JavaThread itself.
>> nit: s/or the/or by the target/
> Done.
>
>> src/hotspot/share/runtime/handshake.cpp
>> L174: _op->add_target_count(number_of_threads_issued - 1);
>> Perhaps add a comment above the line:
>> // _op was created with a count == 1 so don't double count:
> Done.
>
>> L285: JavaThread *self = (JavaThread*)Thread::current();
>> L325:
>> _handshake_turn_sem.wait_with_safepoint_check((JavaThread*)Thread::current());
>>
>> If the caller is supposed to be a JavaThread, then use
>> JavaThread::current() instead of casting.
> Done.
>
>> L286: HandshakeOperation op(thread_cl, true);
>> Please prefix literal booleans with /*is_direct*/ so we know what
>> the parameter is.
> Done.
>
>> L341: assert(Thread::current() == _thread, "should call from thread");
>> This should probably be JavaThread::current().
> I read the discussion you had with David and I think you both agree on
> keeping Thread::current() so I didn't change it.
>
>> L356: // Disarm before execute the operation
>> L362: // Disarm before execute the operation
>> nit: s/execute/executing/
> Done.
>
>> L357: clear_handshake(false);
>> L363: clear_handshake(true);
>> Please prefix literal booleans with /*is_direct*/ so we know what
>> the parameter is.
> Done.
>
>> L430: if ( (is_direct && op != _operation_direct)) {
>> L431: _processing_sem.signal();
>> Why is the signal() call needed here (and not in the other
>> 'return false' cases above)?
> Because we only successfully decremented the semaphore if
> claim_handshake() returns true. In all the previous "return false" cases
> we either did not try to decrement it or we failed to do so.
>
>> src/hotspot/share/runtime/biasedLocking.cpp
>> L671: assert(Thread::current() == biased_locker ...
>> Can you add:
>>
>> Thread* cur = Thread::current();
>>
>> above this line? Then switch all the uses of "Thread::current()"
>> in this function to "cur".
> Done.
>
>> src/hotspot/share/runtime/mutexLocker.cpp
>> L191 void assert_locked_or_safepoint_or_handshake(const Mutex*
>> lock, const JavaThread* thread) {
>> old L192: if (Thread::current()->is_VM_thread() &&
>> thread->is_vmthread_processing_handshake()) return;
>> new L192: if (Thread::current() ==
>> thread->get_active_handshaker()) return;
>> I had to really think about logical equivalence here.
>> I think I've convinced myself that this is okay...
> Without this patch, only the VMThread could be processing a handshake
> closure for a given JavaThread. But now it can also be executed by a
> JavaThread. Hence I changed the code to check whether the current thread
> is the one executing the handshake closure on behalf of the JavaThread
> passed as a parameter to assert_locked_or_safepoint_or_handshake().
>
>> src/hotspot/share/runtime/safepoint.cpp
>> No comments (see next file).
>>
>> src/hotspot/share/runtime/safepointMechanism.inline.hpp
>> The reason for the deletion of
>> SafepointMechanism::disarm_if_needed() is not obvious.
> Before, only the VMThread would arm a JavaThread, either for a safepoint
> or a handshake. The disarm could be made by either of them. The disarm
> by the VMThread was always safe, since it would do it after the
> operation was completed. For the JavaThread not quite. To avoid
> disarming a still pending safepoint/handshake the JavaThread immediately
> rechecks if there is a pending operation after doing the disarm, and in
> case there is, it arms itself again (that code is in
> SafepointMechanism::block_if_requested_slow()).
> If we now allow JavaThreads doing direct handshakes or the VMThread to
> disarm the polling page of a given JavaThread then we would need to
> protect access to the polling page with some synchronization primitive
> to avoid disarming a current pending operation (VMThread disarming when
> there is still a pending direct handshake, or JavaThread disarming when
> there is still a pending safepoint/handshake). To avoid adding
> additional overhead in accessing the polling page we can make each
> JavaThread disarm itself.
>
>> src/hotspot/share/runtime/thread.hpp
>> No comments (nice encapsulation cleanup).
>>
>> src/hotspot/share/runtime/thread.cpp
>> No comments (nice encapsulation cleanup).
>>
>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>> L107: workingThreads[0].join();
>> By only join()ing with the 0th worker thread, you won't
>> detect if any of the worker threads is stuck due to a bug.
> Right, I changed it and now I do join() with all the workers.
>
> Here is v4 addressing also Coleen's comments. Hope I didn't miss anything:
>
> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v04/webrev/
> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v04/inc/webrev/
>
> Thanks!
>
> Patricio
>> Dan
>>
>>
>> On 1/15/20 10:56 PM, Patricio Chilano wrote:
>>> Hi David,
>>>
>>> On 1/15/20 8:29 PM, David Holmes wrote:
>>>> Hi Patricio,
>>>>
>>>> On 16/01/2020 6:06 am, Patricio Chilano wrote:
>>>>> Hi David,
>>>>>
>>>>> Here is v2:
>>>>>
>>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
>>>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/
>>>>
>>>> src/hotspot/share/runtime/handshake.cpp
>>>>
>>>> why is it that execute_direct doesn't check for
>>>> uses_thread_local_poll() ?
>>> Sorry, I thought for a moment that thread-local handshakes for arm32
>>> was already implemented and all platforms where using it. I added
>>> that check back, and by testing that path I had to also add back the
>>> VMThread in the assert in walk_stack_and_revoke().
>>>
>>>> src/hotspot/share/runtime/handshake.hpp
>>>>
>>>> typo: wether -> whether
>>> Fixed.
>>>
>>> Here is v3:
>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/
>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/inc/webrev/
>>>
>>> Thanks!
>>>
>>> Patricio
>>>> Otherwise updates look fine.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Passed one round of t1-6, and running more rounds now.
>>>>>
>>>>> Thanks,
>>>>> Patricio
>>>>>
>>>>> On 1/14/20 7:57 PM, David Holmes wrote:
>>>>>> Hi Patricio,
>>>>>>
>>>>>> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> On 1/14/20 4:13 AM, David Holmes wrote:
>>>>>>>> Hi Patricio,
>>>>>>>>
>>>>>>>> Thanks for tackling this. I've taken an initial look through
>>>>>>>> trying to understand the general approach, but I don't claim to
>>>>>>>> understand all the subtle details, so there are some queries
>>>>>>>> below to help with my overall understanding.
>>>>>>>>
>>>>>>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> The following patch adds the ability to execute direct
>>>>>>>>> handshakes between JavaThreads without the VMThread
>>>>>>>>> intervention, and enables this functionality for biased locking
>>>>>>>>> revocations.
>>>>>>>>> The current handshake mechanism that uses the VMThread, either
>>>>>>>>> to handshake one JavaThread or all of them, is still the
>>>>>>>>> default unless you specify otherwise when calling
>>>>>>>>> Handshake::execute().
>>>>>>>>
>>>>>>>> Can I suggest that rather than using an overloaded:
>>>>>>>>
>>>>>>>> static bool execute(HandshakeClosure* hs_cl, JavaThread*
>>>>>>>> target, bool is_direct_handshake = false);
>>>>>>>>
>>>>>>>> you instead add an execute_direct method for clarity?
>>>>>>> Ok, added execute_direct() and removed overloading.
>>>>>>>
>>>>>>>>> In order to avoid adding additional overhead to this path that
>>>>>>>>> uses the VMThread (especially the one that handshakes all
>>>>>>>>> JavaThreads) I added a new HandshakeOperation pointer in the
>>>>>>>>> HandshakeState class, _operation_direct, to be used for the
>>>>>>>>> direct handshake cases only and whose access is serialized
>>>>>>>>> between JavaThreads by using a semaphore. Thus, one direct
>>>>>>>>> handshake will be allowed at any given time, and upon
>>>>>>>>> completion the semaphore will be signaled to allow the next
>>>>>>>>> handshaker if any to proceed. In this way the old _operation
>>>>>>>>> can still be used only by the VMThread without the need for
>>>>>>>>> synchronization to access it. The handshakee will now check if
>>>>>>>>> any of _operation or _operation_direct is set when checking for
>>>>>>>>> a pending handshake and will try to execute both in
>>>>>>>>> HandshakeState::process_self_inner(). The execution of the
>>>>>>>>> handshake’s ThreadClosure, either direct handshake or not, is
>>>>>>>>> still protected by a semaphore, which I renamed to
>>>>>>>>> _processing_sem.
>>>>>>>>> I converted the semaphore _done in HandshakeOperation to be
>>>>>>>>> just an atomic counter because of bug
>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I
>>>>>>>>> actually hit once!). Since the semaphore could not be static
>>>>>>>>> anymore due to possibly having more than one HandshakeOperation
>>>>>>>>> at a time, the handshakee could try to access the nwaiters
>>>>>>>>> field of an already destroyed semaphore when signaling it. In
>>>>>>>>> any case nobody was waiting on that semaphore (we were not
>>>>>>>>> using kernel functionality), so just using an atomic counter
>>>>>>>>> seems more appropriate.
>>>>>>>>> In order to avoid issues due to disarming a JavaThread that
>>>>>>>>> should still be armed for a handshake or safepoint, each
>>>>>>>>> JavaThread will now always disarm its own polling page.
>>>>>>>>> I also added a new test, HandshakeDirectTest.java, which tries
>>>>>>>>> to stress the use of direct handshakes with revocations.
>>>>>>>>> In terms of performance, I measured no difference in the
>>>>>>>>> execution time of one individual handshake. The difference can
>>>>>>>>> be seen when several handshakes at a time are executed as
>>>>>>>>> expected. So for example on Linux running on an Intel Xeon
>>>>>>>>> 8167M cpu, test HandshakeDirectTest.java (which executes 50000
>>>>>>>>> handshakes between 32 threads) executes in around 340ms using
>>>>>>>>> direct handshakes and in around 5.6 seconds without it. For a
>>>>>>>>> modified version of that test that only executes 128 handshakes
>>>>>>>>> between the 32 threads and avoids any suspend-resume, the test
>>>>>>>>> takes around 12ms with direct handshakes and 19ms without it.
>>>>>>>>> Tested with mach5, tiers1-6.
>>>>>>>>>
>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>>>>>>> Webrev:
>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>>
>>>>>>>> 36 // A handshake closure is a callback that is executed for
>>>>>>>> each JavaThread
>>>>>>>> 37 // while that thread is in a safepoint safe state. The
>>>>>>>> callback is executed
>>>>>>>> 38 // either by the thread itself or by the VM thread while
>>>>>>>> keeping the thread
>>>>>>>> 39 // in a blocked state. A handshake can be performed with a
>>>>>>>> single
>>>>>>>> 40 // JavaThread as well.
>>>>>>>>
>>>>>>>> Does this comment block need updating for the direct case?
>>>>>>>>
>>>>>>>> 61 // the operation is only done by either VM thread on behalf
>>>>>>>> of the JavaThread
>>>>>>>> 62 // or the JavaThread itself.
>>>>>>>>
>>>>>>>> Again does this need an update?
>>>>>>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>>>>>>
>>>>>>>
>>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>>
>>>>>>>> 41 class HandshakeOperation: public StackObj {
>>>>>>>>
>>>>>>>> Would it be clearer to have HandShakeOperation subclassed by
>>>>>>>> DirectHandShakeOperation, rather than it being a property of the
>>>>>>>> current op instance?
>>>>>>> But I would still need to access the is_direct field for
>>>>>>> non-direct handshakes in HandshakeState::try_process() anyways.
>>>>>>
>>>>>> Okay.
>>>>>>
>>>>>>>
>>>>>>>> 41 class HandshakeOperation: public StackObj {
>>>>>>>> 42 HandshakeClosure* _handshake_cl;
>>>>>>>> 43 int64_t _pending_threads;
>>>>>>>> ...
>>>>>>>> 53 void add_target_count(int count) {
>>>>>>>> Atomic::add(&_pending_threads, count); }
>>>>>>>>
>>>>>>>> Can you clarify the lifecycle of this StackObj please. You
>>>>>>>> obviously expose it to other threads - hence the atomic update -
>>>>>>>> but that means it needs to be guaranteed to be live when that
>>>>>>>> update occurs.
>>>>>>> The HandshakeOperation object is created in Handshake::execute()
>>>>>>> (or Handshake::execute_direct() now) by the Handshaker. The only
>>>>>>> other threads that could access this object are the VMThread(for
>>>>>>> non-direct case only) and the Handshakee(in both cases). Seeing
>>>>>>> that it's safe for the VMThread to decrement the counter is
>>>>>>> straightforward since the Handshaker will be waiting on the
>>>>>>> VMOperationRequest_lock waiting for the operation to finish (this
>>>>>>> is just as always has been).
>>>>>>> As for the Handshakee, we can see that the HandshakeOperation
>>>>>>> object will be alive as long as the VMThread(for the non-direct
>>>>>>> case) or Handshaker(for the direct case) keep seeing that the
>>>>>>> operation is not done by calling
>>>>>>> HandshakeOperation::is_completed(). Once that returns true, for
>>>>>>> the non-direct case the VMThread will finish the operation and
>>>>>>> wake up the Handshaker who will return from Handshake::execute()
>>>>>>> and the object will be destroyed, and for the direct case the
>>>>>>> Handshaker will just return from execute_direct() and the object
>>>>>>> will be destroyed. Since is_completed() will only return true
>>>>>>> when _pending_threads reaches zero, that means the decrement
>>>>>>> operation had to be safe. Just as an observation, for the
>>>>>>> HandshakeAllThreads operation, _pending_threads could go to zero
>>>>>>> and even negative before executing add_target_count(), but that's
>>>>>>> okay because the VMThread doesn't call is_completed() before that
>>>>>>> and the load of _pending_threads cannot float above
>>>>>>> add_target_count() since the atomic add operation provides a
>>>>>>> memory fence.
>>>>>>> And finally the Handshakee cannot try to execute an operation
>>>>>>> that has already being processed by the VMThread/Handshaker,
>>>>>>> because they clear it first and then signal the _processing_sem
>>>>>>> semaphore in HandshakeState::try_process(). That worked the same
>>>>>>> before this change.
>>>>>>
>>>>>> Thanks for the detailed explanation!
>>>>>>
>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> 349 HandshakeOperation* op =
>>>>>>>> Atomic::load_acquire(&_operation);
>>>>>>>> 350 if (op == NULL) {
>>>>>>>> 351 op = Atomic::load_acquire(&_operation_direct);
>>>>>>>>
>>>>>>>> This gives preference to the non-direct op. Is it possible for
>>>>>>>> the direct-op to be indefinitely delayed if there is a series of
>>>>>>>> non-direct ops?
>>>>>>> Yes, it's possible, although I think that's an unlikely scenario.
>>>>>>> That would mean the Handshakee would be stuck in that while loop
>>>>>>> with the VMThread constantly setting new operations in
>>>>>>> _operation. But I can modify the loop and make it try to execute
>>>>>>> both before going into the next interation, something like (more
>>>>>>> on load_acquire next):
>>>>>>>
>>>>>>> if (has_operation()) {
>>>>>>> HandleMark hm(_thread);
>>>>>>> CautiouslyPreserveExceptionMark pem(_thread);
>>>>>>> HandshakeOperation * op = _operation;
>>>>>>> if (op != NULL) {
>>>>>>> // Disarm before execute the operation
>>>>>>> clear_handshake(false);
>>>>>>> op->do_handshake(_thread);
>>>>>>> }
>>>>>>> op = _operation_direct;
>>>>>>> if (op != NULL) {
>>>>>>> // Disarm before execute the operation
>>>>>>> clear_handshake(true);
>>>>>>> op->do_handshake(_thread);
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> what do you think?
>>>>>>
>>>>>> That seems better to me.
>>>>>>
>>>>>>>
>>>>>>>> Also I don't see the release_stores that these load_acquire
>>>>>>>> would pair with. I'm assuming it should be here:
>>>>>>>>
>>>>>>>> 319 _operation = op;
>>>>>>>> 323 _operation_direct = op;
>>>>>>>>
>>>>>>>> But I would also have expected all necessary memory
>>>>>>>> synchronization to already be present via the semaphore
>>>>>>>> operations and/or the rest of the handshake mechanism. ??
>>>>>>> Yes, the release is in
>>>>>>> SafepointMechanism::arm_local_poll_release() after setting those
>>>>>>> fields. But it's true that those load_acquire are not needed
>>>>>>> since the semaphore already has acquire semantics. I actually
>>>>>>> wanted to do a normal load but ended up just copying how we were
>>>>>>> loading _operation. I''ll change them and retest.
>>>>>>
>>>>>> Okay.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -------
>>>>>>
>>>>>>>
>>>>>>>> 396 if ((!is_direct && _operation != NULL) || (is_direct &&
>>>>>>>> _operation_direct != NULL)){
>>>>>>>> 406 if ((!is_direct && _operation == NULL) || (is_direct &&
>>>>>>>> _operation_direct == NULL)){
>>>>>>>>
>>>>>>>> Can this not be abstracted back into a "has_operation" method
>>>>>>>> that takes a "direct" parameter? e.g
>>>>>>>>
>>>>>>>> bool has_operation() const { return _operation != NULL ||
>>>>>>>> _operation_direct != NULL; }
>>>>>>>> + bool has_specific_operation(bool direct) {
>>>>>>>> + return direct ? _operation_direct != NULL : _operation != NULL;
>>>>>>>> + }
>>>>>>> Done.
>>>>>>>
>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>>>>>>
>>>>>>>> Please add:
>>>>>>>>
>>>>>>>> import java.util.concurrent.Semaphore;
>>>>>>>>
>>>>>>>> so you can just refer to Sempahore.
>>>>>>> Done.
>>>>>>>
>>>>>>>
>>>>>>>> Style nit: _working_threads
>>>>>>>>
>>>>>>>> Java style is to not use leading underscore and to use camelCase
>>>>>>>> for variables ie workingThreads.
>>>>>>> Right, I changed them to camelCase.
>>>>>>>
>>>>>>>
>>>>>>>> 44 static Thread _suspendresume_thread = new Thread();
>>>>>>>>
>>>>>>>> The above is dead code.
>>>>>>> Removed.
>>>>>>>
>>>>>>>
>>>>>>>> 53 if (_is_biased[me] == false) {
>>>>>>>>
>>>>>>>> Style nit: use "if (!_is_biased[me]) {"
>>>>>>> Changed.
>>>>>>>
>>>>>>>
>>>>>>>> 80 } catch(InterruptedException ie) {
>>>>>>>> 81 }
>>>>>>>>
>>>>>>>> I suggest inserting "throw new Error("Unexpected interrupt");"
>>>>>>>> for good measure.
>>>>>>>>
>>>>>>>> 111 _working_threads[i].suspend();
>>>>>>> Done.
>>>>>>>
>>>>>>>
>>>>>>>> Thread suspension has been deprecated-for-removal as of JDK 14
>>>>>>>> so could be gone in JDK 15 (but more likely 16). If the
>>>>>>>> suspend/resume is important for this test then you will need to
>>>>>>>> switch to using JVM TI suspend/resume; or else perhaps introduce
>>>>>>>> a WhiteBox method to do whatever you need in the VM.
>>>>>>> It's not really needed for this test. I just wanted to mixed
>>>>>>> suspend-resume with handshakes because I know there could be some
>>>>>>> buggy interactions between them. But since it has been deprecated
>>>>>>> then maybe there is no point in stressing handshakes with them
>>>>>>> and I can remove that part.
>>>>>>>
>>>>>>>
>>>>>>>> 123 // Wait until the desired number of direct handshakes is
>>>>>>>> reached
>>>>>>>> 124 while (_handshake_count.get() <
>>>>>>>> DIRECT_HANDSHAKES_MARK) {
>>>>>>>> 125 Thread.sleep(10); // sleep for 10ms
>>>>>>>> 126 }
>>>>>>>>
>>>>>>>> You could just do a join() on one of the worker threads.
>>>>>>> Right, didn't thought about that. Changed.
>>>>>>>
>>>>>>>
>>>>>>> Thanks for looking at this David! I'll retest with the above
>>>>>>> changes in the meantime.
>>>>>>>
>>>>>>>
>>>>>>> Patricio
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Patricio
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list