RFR 8230594: Allow direct handshakes without VMThread intervention
Robbin Ehn
robbin.ehn at oracle.com
Wed Jan 22 08:58:21 UTC 2020
> Here is v5, which contains also style changes mentioned by Dan in his last review:
>
> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v05/webrev/
> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v05/inc/webrev/
Ship it, thanks!
/Robbin
>
> Thanks!
>
> Patricio
>> Thanks for fixing, Robbin
>>
>>>
>>> Thanks!
>>>
>>> Patricio
>>>> Dan
>>>>
>>>>
>>>> On 1/15/20 10:56 PM, Patricio Chilano wrote:
>>>>> Hi David,
>>>>>
>>>>> On 1/15/20 8:29 PM, David Holmes wrote:
>>>>>> Hi Patricio,
>>>>>>
>>>>>> On 16/01/2020 6:06 am, Patricio Chilano wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Here is v2:
>>>>>>>
>>>>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
>>>>>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/
>>>>>>
>>>>>> src/hotspot/share/runtime/handshake.cpp
>>>>>>
>>>>>> why is it that execute_direct doesn't check for uses_thread_local_poll() ?
>>>>> Sorry, I thought for a moment that thread-local handshakes for arm32 was
>>>>> already implemented and all platforms where using it. I added that check
>>>>> back, and by testing that path I had to also add back the VMThread in the
>>>>> assert in walk_stack_and_revoke().
>>>>>
>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>
>>>>>> typo: wether -> whether
>>>>> Fixed.
>>>>>
>>>>> Here is v3:
>>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/
>>>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/inc/webrev/
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Patricio
>>>>>> Otherwise updates look fine.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Passed one round of t1-6, and running more rounds now.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Patricio
>>>>>>>
>>>>>>> On 1/14/20 7:57 PM, David Holmes wrote:
>>>>>>>> Hi Patricio,
>>>>>>>>
>>>>>>>> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> On 1/14/20 4:13 AM, David Holmes wrote:
>>>>>>>>>> Hi Patricio,
>>>>>>>>>>
>>>>>>>>>> Thanks for tackling this. I've taken an initial look through trying to
>>>>>>>>>> understand the general approach, but I don't claim to understand all
>>>>>>>>>> the subtle details, so there are some queries below to help with my
>>>>>>>>>> overall understanding.
>>>>>>>>>>
>>>>>>>>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> The following patch adds the ability to execute direct handshakes
>>>>>>>>>>> between JavaThreads without the VMThread intervention, and enables
>>>>>>>>>>> this functionality for biased locking revocations.
>>>>>>>>>>> The current handshake mechanism that uses the VMThread, either to
>>>>>>>>>>> handshake one JavaThread or all of them, is still the default unless
>>>>>>>>>>> you specify otherwise when calling Handshake::execute().
>>>>>>>>>>
>>>>>>>>>> Can I suggest that rather than using an overloaded:
>>>>>>>>>>
>>>>>>>>>> static bool execute(HandshakeClosure* hs_cl, JavaThread* target, bool
>>>>>>>>>> is_direct_handshake = false);
>>>>>>>>>>
>>>>>>>>>> you instead add an execute_direct method for clarity?
>>>>>>>>> Ok, added execute_direct() and removed overloading.
>>>>>>>>>
>>>>>>>>>>> In order to avoid adding additional overhead to this path that uses
>>>>>>>>>>> the VMThread (especially the one that handshakes all JavaThreads) I
>>>>>>>>>>> added a new HandshakeOperation pointer in the HandshakeState class,
>>>>>>>>>>> _operation_direct, to be used for the direct handshake cases only and
>>>>>>>>>>> whose access is serialized between JavaThreads by using a semaphore.
>>>>>>>>>>> Thus, one direct handshake will be allowed at any given time, and
>>>>>>>>>>> upon completion the semaphore will be signaled to allow the next
>>>>>>>>>>> handshaker if any to proceed. In this way the old _operation can
>>>>>>>>>>> still be used only by the VMThread without the need for
>>>>>>>>>>> synchronization to access it. The handshakee will now check if any of
>>>>>>>>>>> _operation or _operation_direct is set when checking for a pending
>>>>>>>>>>> handshake and will try to execute both in
>>>>>>>>>>> HandshakeState::process_self_inner(). The execution of the
>>>>>>>>>>> handshake’s ThreadClosure, either direct handshake or not, is still
>>>>>>>>>>> protected by a semaphore, which I renamed to _processing_sem.
>>>>>>>>>>> I converted the semaphore _done in HandshakeOperation to be just an
>>>>>>>>>>> atomic counter because of bug
>>>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I
>>>>>>>>>>> actually hit once!). Since the semaphore could not be static anymore
>>>>>>>>>>> due to possibly having more than one HandshakeOperation at a time,
>>>>>>>>>>> the handshakee could try to access the nwaiters field of an already
>>>>>>>>>>> destroyed semaphore when signaling it. In any case nobody was waiting
>>>>>>>>>>> on that semaphore (we were not using kernel functionality), so just
>>>>>>>>>>> using an atomic counter seems more appropriate.
>>>>>>>>>>> In order to avoid issues due to disarming a JavaThread that should
>>>>>>>>>>> still be armed for a handshake or safepoint, each JavaThread will now
>>>>>>>>>>> always disarm its own polling page.
>>>>>>>>>>> I also added a new test, HandshakeDirectTest.java, which tries to
>>>>>>>>>>> stress the use of direct handshakes with revocations.
>>>>>>>>>>> In terms of performance, I measured no difference in the execution
>>>>>>>>>>> time of one individual handshake. The difference can be seen when
>>>>>>>>>>> several handshakes at a time are executed as expected. So for example
>>>>>>>>>>> on Linux running on an Intel Xeon 8167M cpu, test
>>>>>>>>>>> HandshakeDirectTest.java (which executes 50000 handshakes between 32
>>>>>>>>>>> threads) executes in around 340ms using direct handshakes and in
>>>>>>>>>>> around 5.6 seconds without it. For a modified version of that test
>>>>>>>>>>> that only executes 128 handshakes between the 32 threads and avoids
>>>>>>>>>>> any suspend-resume, the test takes around 12ms with direct handshakes
>>>>>>>>>>> and 19ms without it.
>>>>>>>>>>> Tested with mach5, tiers1-6.
>>>>>>>>>>>
>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>>>>>>>>
>>>>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>>>>
>>>>>>>>>> 36 // A handshake closure is a callback that is executed for each
>>>>>>>>>> JavaThread
>>>>>>>>>> 37 // while that thread is in a safepoint safe state. The callback
>>>>>>>>>> is executed
>>>>>>>>>> 38 // either by the thread itself or by the VM thread while keeping
>>>>>>>>>> the thread
>>>>>>>>>> 39 // in a blocked state. A handshake can be performed with a single
>>>>>>>>>> 40 // JavaThread as well.
>>>>>>>>>>
>>>>>>>>>> Does this comment block need updating for the direct case?
>>>>>>>>>>
>>>>>>>>>> 61 // the operation is only done by either VM thread on behalf of the
>>>>>>>>>> JavaThread
>>>>>>>>>> 62 // or the JavaThread itself.
>>>>>>>>>>
>>>>>>>>>> Again does this need an update?
>>>>>>>>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>>>>
>>>>>>>>>> 41 class HandshakeOperation: public StackObj {
>>>>>>>>>>
>>>>>>>>>> Would it be clearer to have HandShakeOperation subclassed by
>>>>>>>>>> DirectHandShakeOperation, rather than it being a property of the
>>>>>>>>>> current op instance?
>>>>>>>>> But I would still need to access the is_direct field for non-direct
>>>>>>>>> handshakes in HandshakeState::try_process() anyways.
>>>>>>>>
>>>>>>>> Okay.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 41 class HandshakeOperation: public StackObj {
>>>>>>>>>> 42 HandshakeClosure* _handshake_cl;
>>>>>>>>>> 43 int64_t _pending_threads;
>>>>>>>>>> ...
>>>>>>>>>> 53 void add_target_count(int count) {
>>>>>>>>>> Atomic::add(&_pending_threads, count); }
>>>>>>>>>>
>>>>>>>>>> Can you clarify the lifecycle of this StackObj please. You obviously
>>>>>>>>>> expose it to other threads - hence the atomic update - but that means
>>>>>>>>>> it needs to be guaranteed to be live when that update occurs.
>>>>>>>>> The HandshakeOperation object is created in Handshake::execute() (or
>>>>>>>>> Handshake::execute_direct() now) by the Handshaker. The only other
>>>>>>>>> threads that could access this object are the VMThread(for non-direct
>>>>>>>>> case only) and the Handshakee(in both cases). Seeing that it's safe for
>>>>>>>>> the VMThread to decrement the counter is straightforward since the
>>>>>>>>> Handshaker will be waiting on the VMOperationRequest_lock waiting for
>>>>>>>>> the operation to finish (this is just as always has been).
>>>>>>>>> As for the Handshakee, we can see that the HandshakeOperation object
>>>>>>>>> will be alive as long as the VMThread(for the non-direct case) or
>>>>>>>>> Handshaker(for the direct case) keep seeing that the operation is not
>>>>>>>>> done by calling HandshakeOperation::is_completed(). Once that returns
>>>>>>>>> true, for the non-direct case the VMThread will finish the operation
>>>>>>>>> and wake up the Handshaker who will return from Handshake::execute()
>>>>>>>>> and the object will be destroyed, and for the direct case the
>>>>>>>>> Handshaker will just return from execute_direct() and the object will
>>>>>>>>> be destroyed. Since is_completed() will only return true when
>>>>>>>>> _pending_threads reaches zero, that means the decrement operation had
>>>>>>>>> to be safe. Just as an observation, for the HandshakeAllThreads
>>>>>>>>> operation, _pending_threads could go to zero and even negative before
>>>>>>>>> executing add_target_count(), but that's okay because the VMThread
>>>>>>>>> doesn't call is_completed() before that and the load of
>>>>>>>>> _pending_threads cannot float above add_target_count() since the atomic
>>>>>>>>> add operation provides a memory fence.
>>>>>>>>> And finally the Handshakee cannot try to execute an operation that has
>>>>>>>>> already being processed by the VMThread/Handshaker, because they clear
>>>>>>>>> it first and then signal the _processing_sem semaphore in
>>>>>>>>> HandshakeState::try_process(). That worked the same before this change.
>>>>>>>>
>>>>>>>> Thanks for the detailed explanation!
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> 349 HandshakeOperation* op = Atomic::load_acquire(&_operation);
>>>>>>>>>> 350 if (op == NULL) {
>>>>>>>>>> 351 op = Atomic::load_acquire(&_operation_direct);
>>>>>>>>>>
>>>>>>>>>> This gives preference to the non-direct op. Is it possible for the
>>>>>>>>>> direct-op to be indefinitely delayed if there is a series of
>>>>>>>>>> non-direct ops?
>>>>>>>>> Yes, it's possible, although I think that's an unlikely scenario. That
>>>>>>>>> would mean the Handshakee would be stuck in that while loop with the
>>>>>>>>> VMThread constantly setting new operations in _operation. But I can
>>>>>>>>> modify the loop and make it try to execute both before going into the
>>>>>>>>> next interation, something like (more on load_acquire next):
>>>>>>>>>
>>>>>>>>> if (has_operation()) {
>>>>>>>>> HandleMark hm(_thread);
>>>>>>>>> CautiouslyPreserveExceptionMark pem(_thread);
>>>>>>>>> HandshakeOperation * op = _operation;
>>>>>>>>> if (op != NULL) {
>>>>>>>>> // Disarm before execute the operation
>>>>>>>>> clear_handshake(false);
>>>>>>>>> op->do_handshake(_thread);
>>>>>>>>> }
>>>>>>>>> op = _operation_direct;
>>>>>>>>> if (op != NULL) {
>>>>>>>>> // Disarm before execute the operation
>>>>>>>>> clear_handshake(true);
>>>>>>>>> op->do_handshake(_thread);
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> what do you think?
>>>>>>>>
>>>>>>>> That seems better to me.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Also I don't see the release_stores that these load_acquire would pair
>>>>>>>>>> with. I'm assuming it should be here:
>>>>>>>>>>
>>>>>>>>>> 319 _operation = op;
>>>>>>>>>> 323 _operation_direct = op;
>>>>>>>>>>
>>>>>>>>>> But I would also have expected all necessary memory synchronization to
>>>>>>>>>> already be present via the semaphore operations and/or the rest of the
>>>>>>>>>> handshake mechanism. ??
>>>>>>>>> Yes, the release is in SafepointMechanism::arm_local_poll_release()
>>>>>>>>> after setting those fields. But it's true that those load_acquire are
>>>>>>>>> not needed since the semaphore already has acquire semantics. I
>>>>>>>>> actually wanted to do a normal load but ended up just copying how we
>>>>>>>>> were loading _operation. I''ll change them and retest.
>>>>>>>>
>>>>>>>> Okay.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -------
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 396 if ((!is_direct && _operation != NULL) || (is_direct &&
>>>>>>>>>> _operation_direct != NULL)){
>>>>>>>>>> 406 if ((!is_direct && _operation == NULL) || (is_direct &&
>>>>>>>>>> _operation_direct == NULL)){
>>>>>>>>>>
>>>>>>>>>> Can this not be abstracted back into a "has_operation" method that
>>>>>>>>>> takes a "direct" parameter? e.g
>>>>>>>>>>
>>>>>>>>>> bool has_operation() const { return _operation != NULL ||
>>>>>>>>>> _operation_direct != NULL; }
>>>>>>>>>> + bool has_specific_operation(bool direct) {
>>>>>>>>>> + return direct ? _operation_direct != NULL : _operation != NULL;
>>>>>>>>>> + }
>>>>>>>>> Done.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>>>>>>>>
>>>>>>>>>> Please add:
>>>>>>>>>>
>>>>>>>>>> import java.util.concurrent.Semaphore;
>>>>>>>>>>
>>>>>>>>>> so you can just refer to Sempahore.
>>>>>>>>> Done.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Style nit: _working_threads
>>>>>>>>>>
>>>>>>>>>> Java style is to not use leading underscore and to use camelCase for
>>>>>>>>>> variables ie workingThreads.
>>>>>>>>> Right, I changed them to camelCase.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 44 static Thread _suspendresume_thread = new Thread();
>>>>>>>>>>
>>>>>>>>>> The above is dead code.
>>>>>>>>> Removed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 53 if (_is_biased[me] == false) {
>>>>>>>>>>
>>>>>>>>>> Style nit: use "if (!_is_biased[me]) {"
>>>>>>>>> Changed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 80 } catch(InterruptedException ie) {
>>>>>>>>>> 81 }
>>>>>>>>>>
>>>>>>>>>> I suggest inserting "throw new Error("Unexpected interrupt");" for
>>>>>>>>>> good measure.
>>>>>>>>>>
>>>>>>>>>> 111 _working_threads[i].suspend();
>>>>>>>>> Done.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thread suspension has been deprecated-for-removal as of JDK 14 so
>>>>>>>>>> could be gone in JDK 15 (but more likely 16). If the suspend/resume is
>>>>>>>>>> important for this test then you will need to switch to using JVM TI
>>>>>>>>>> suspend/resume; or else perhaps introduce a WhiteBox method to do
>>>>>>>>>> whatever you need in the VM.
>>>>>>>>> It's not really needed for this test. I just wanted to mixed
>>>>>>>>> suspend-resume with handshakes because I know there could be some buggy
>>>>>>>>> interactions between them. But since it has been deprecated then maybe
>>>>>>>>> there is no point in stressing handshakes with them and I can remove
>>>>>>>>> that part.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 123 // Wait until the desired number of direct handshakes is reached
>>>>>>>>>> 124 while (_handshake_count.get() < DIRECT_HANDSHAKES_MARK) {
>>>>>>>>>> 125 Thread.sleep(10); // sleep for 10ms
>>>>>>>>>> 126 }
>>>>>>>>>>
>>>>>>>>>> You could just do a join() on one of the worker threads.
>>>>>>>>> Right, didn't thought about that. Changed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for looking at this David! I'll retest with the above changes in
>>>>>>>>> the meantime.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Patricio
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Patricio
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>
More information about the hotspot-runtime-dev
mailing list