RFR 8230594: Allow direct handshakes without VMThread intervention
Patricio Chilano
patricio.chilano.mateo at oracle.com
Thu Jan 16 03:56:16 UTC 2020
Hi David,
On 1/15/20 8:29 PM, David Holmes wrote:
> Hi Patricio,
>
> On 16/01/2020 6:06 am, Patricio Chilano wrote:
>> Hi David,
>>
>> Here is v2:
>>
>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/
>
> src/hotspot/share/runtime/handshake.cpp
>
> why is it that execute_direct doesn't check for
> uses_thread_local_poll() ?
Sorry, I thought for a moment that thread-local handshakes for arm32 was
already implemented and all platforms where using it. I added that check
back, and by testing that path I had to also add back the VMThread in
the assert in walk_stack_and_revoke().
> src/hotspot/share/runtime/handshake.hpp
>
> typo: wether -> whether
Fixed.
Here is v3:
Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/
Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/inc/webrev/
Thanks!
Patricio
> Otherwise updates look fine.
>
> Thanks,
> David
>
>> Passed one round of t1-6, and running more rounds now.
>>
>> Thanks,
>> Patricio
>>
>> On 1/14/20 7:57 PM, David Holmes wrote:
>>> Hi Patricio,
>>>
>>> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>>>> Hi David,
>>>>
>>>> On 1/14/20 4:13 AM, David Holmes wrote:
>>>>> Hi Patricio,
>>>>>
>>>>> Thanks for tackling this. I've taken an initial look through
>>>>> trying to understand the general approach, but I don't claim to
>>>>> understand all the subtle details, so there are some queries below
>>>>> to help with my overall understanding.
>>>>>
>>>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> The following patch adds the ability to execute direct handshakes
>>>>>> between JavaThreads without the VMThread intervention, and
>>>>>> enables this functionality for biased locking revocations.
>>>>>> The current handshake mechanism that uses the VMThread, either to
>>>>>> handshake one JavaThread or all of them, is still the default
>>>>>> unless you specify otherwise when calling Handshake::execute().
>>>>>
>>>>> Can I suggest that rather than using an overloaded:
>>>>>
>>>>> static bool execute(HandshakeClosure* hs_cl, JavaThread* target,
>>>>> bool is_direct_handshake = false);
>>>>>
>>>>> you instead add an execute_direct method for clarity?
>>>> Ok, added execute_direct() and removed overloading.
>>>>
>>>>>> In order to avoid adding additional overhead to this path that
>>>>>> uses the VMThread (especially the one that handshakes all
>>>>>> JavaThreads) I added a new HandshakeOperation pointer in the
>>>>>> HandshakeState class, _operation_direct, to be used for the
>>>>>> direct handshake cases only and whose access is serialized
>>>>>> between JavaThreads by using a semaphore. Thus, one direct
>>>>>> handshake will be allowed at any given time, and upon completion
>>>>>> the semaphore will be signaled to allow the next handshaker if
>>>>>> any to proceed. In this way the old _operation can still be used
>>>>>> only by the VMThread without the need for synchronization to
>>>>>> access it. The handshakee will now check if any of _operation or
>>>>>> _operation_direct is set when checking for a pending handshake
>>>>>> and will try to execute both in
>>>>>> HandshakeState::process_self_inner(). The execution of the
>>>>>> handshake’s ThreadClosure, either direct handshake or not, is
>>>>>> still protected by a semaphore, which I renamed to _processing_sem.
>>>>>> I converted the semaphore _done in HandshakeOperation to be just
>>>>>> an atomic counter because of bug
>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I
>>>>>> actually hit once!). Since the semaphore could not be static
>>>>>> anymore due to possibly having more than one HandshakeOperation
>>>>>> at a time, the handshakee could try to access the nwaiters field
>>>>>> of an already destroyed semaphore when signaling it. In any case
>>>>>> nobody was waiting on that semaphore (we were not using kernel
>>>>>> functionality), so just using an atomic counter seems more
>>>>>> appropriate.
>>>>>> In order to avoid issues due to disarming a JavaThread that
>>>>>> should still be armed for a handshake or safepoint, each
>>>>>> JavaThread will now always disarm its own polling page.
>>>>>> I also added a new test, HandshakeDirectTest.java, which tries to
>>>>>> stress the use of direct handshakes with revocations.
>>>>>> In terms of performance, I measured no difference in the
>>>>>> execution time of one individual handshake. The difference can be
>>>>>> seen when several handshakes at a time are executed as expected.
>>>>>> So for example on Linux running on an Intel Xeon 8167M cpu, test
>>>>>> HandshakeDirectTest.java (which executes 50000 handshakes between
>>>>>> 32 threads) executes in around 340ms using direct handshakes and
>>>>>> in around 5.6 seconds without it. For a modified version of that
>>>>>> test that only executes 128 handshakes between the 32 threads and
>>>>>> avoids any suspend-resume, the test takes around 12ms with direct
>>>>>> handshakes and 19ms without it.
>>>>>> Tested with mach5, tiers1-6.
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>>>
>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>
>>>>> 36 // A handshake closure is a callback that is executed for each
>>>>> JavaThread
>>>>> 37 // while that thread is in a safepoint safe state. The
>>>>> callback is executed
>>>>> 38 // either by the thread itself or by the VM thread while
>>>>> keeping the thread
>>>>> 39 // in a blocked state. A handshake can be performed with a
>>>>> single
>>>>> 40 // JavaThread as well.
>>>>>
>>>>> Does this comment block need updating for the direct case?
>>>>>
>>>>> 61 // the operation is only done by either VM thread on behalf of
>>>>> the JavaThread
>>>>> 62 // or the JavaThread itself.
>>>>>
>>>>> Again does this need an update?
>>>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>>>
>>>>
>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>
>>>>> 41 class HandshakeOperation: public StackObj {
>>>>>
>>>>> Would it be clearer to have HandShakeOperation subclassed by
>>>>> DirectHandShakeOperation, rather than it being a property of the
>>>>> current op instance?
>>>> But I would still need to access the is_direct field for non-direct
>>>> handshakes in HandshakeState::try_process() anyways.
>>>
>>> Okay.
>>>
>>>>
>>>>> 41 class HandshakeOperation: public StackObj {
>>>>> 42 HandshakeClosure* _handshake_cl;
>>>>> 43 int64_t _pending_threads;
>>>>> ...
>>>>> 53 void add_target_count(int count) {
>>>>> Atomic::add(&_pending_threads, count); }
>>>>>
>>>>> Can you clarify the lifecycle of this StackObj please. You
>>>>> obviously expose it to other threads - hence the atomic update -
>>>>> but that means it needs to be guaranteed to be live when that
>>>>> update occurs.
>>>> The HandshakeOperation object is created in Handshake::execute()
>>>> (or Handshake::execute_direct() now) by the Handshaker. The only
>>>> other threads that could access this object are the VMThread(for
>>>> non-direct case only) and the Handshakee(in both cases). Seeing
>>>> that it's safe for the VMThread to decrement the counter is
>>>> straightforward since the Handshaker will be waiting on the
>>>> VMOperationRequest_lock waiting for the operation to finish (this
>>>> is just as always has been).
>>>> As for the Handshakee, we can see that the HandshakeOperation
>>>> object will be alive as long as the VMThread(for the non-direct
>>>> case) or Handshaker(for the direct case) keep seeing that the
>>>> operation is not done by calling
>>>> HandshakeOperation::is_completed(). Once that returns true, for the
>>>> non-direct case the VMThread will finish the operation and wake up
>>>> the Handshaker who will return from Handshake::execute() and the
>>>> object will be destroyed, and for the direct case the Handshaker
>>>> will just return from execute_direct() and the object will be
>>>> destroyed. Since is_completed() will only return true when
>>>> _pending_threads reaches zero, that means the decrement operation
>>>> had to be safe. Just as an observation, for the HandshakeAllThreads
>>>> operation, _pending_threads could go to zero and even negative
>>>> before executing add_target_count(), but that's okay because the
>>>> VMThread doesn't call is_completed() before that and the load of
>>>> _pending_threads cannot float above add_target_count() since the
>>>> atomic add operation provides a memory fence.
>>>> And finally the Handshakee cannot try to execute an operation that
>>>> has already being processed by the VMThread/Handshaker, because
>>>> they clear it first and then signal the _processing_sem semaphore
>>>> in HandshakeState::try_process(). That worked the same before this
>>>> change.
>>>
>>> Thanks for the detailed explanation!
>>>
>>>>
>>>>> --
>>>>>
>>>>> 349 HandshakeOperation* op = Atomic::load_acquire(&_operation);
>>>>> 350 if (op == NULL) {
>>>>> 351 op = Atomic::load_acquire(&_operation_direct);
>>>>>
>>>>> This gives preference to the non-direct op. Is it possible for the
>>>>> direct-op to be indefinitely delayed if there is a series of
>>>>> non-direct ops?
>>>> Yes, it's possible, although I think that's an unlikely scenario.
>>>> That would mean the Handshakee would be stuck in that while loop
>>>> with the VMThread constantly setting new operations in _operation.
>>>> But I can modify the loop and make it try to execute both before
>>>> going into the next interation, something like (more on
>>>> load_acquire next):
>>>>
>>>> if (has_operation()) {
>>>> HandleMark hm(_thread);
>>>> CautiouslyPreserveExceptionMark pem(_thread);
>>>> HandshakeOperation * op = _operation;
>>>> if (op != NULL) {
>>>> // Disarm before execute the operation
>>>> clear_handshake(false);
>>>> op->do_handshake(_thread);
>>>> }
>>>> op = _operation_direct;
>>>> if (op != NULL) {
>>>> // Disarm before execute the operation
>>>> clear_handshake(true);
>>>> op->do_handshake(_thread);
>>>> }
>>>> }
>>>>
>>>> what do you think?
>>>
>>> That seems better to me.
>>>
>>>>
>>>>> Also I don't see the release_stores that these load_acquire would
>>>>> pair with. I'm assuming it should be here:
>>>>>
>>>>> 319 _operation = op;
>>>>> 323 _operation_direct = op;
>>>>>
>>>>> But I would also have expected all necessary memory
>>>>> synchronization to already be present via the semaphore operations
>>>>> and/or the rest of the handshake mechanism. ??
>>>> Yes, the release is in SafepointMechanism::arm_local_poll_release()
>>>> after setting those fields. But it's true that those load_acquire
>>>> are not needed since the semaphore already has acquire semantics. I
>>>> actually wanted to do a normal load but ended up just copying how
>>>> we were loading _operation. I''ll change them and retest.
>>>
>>> Okay.
>>>
>>> Thanks,
>>> David
>>> -------
>>>
>>>>
>>>>> 396 if ((!is_direct && _operation != NULL) || (is_direct &&
>>>>> _operation_direct != NULL)){
>>>>> 406 if ((!is_direct && _operation == NULL) || (is_direct &&
>>>>> _operation_direct == NULL)){
>>>>>
>>>>> Can this not be abstracted back into a "has_operation" method that
>>>>> takes a "direct" parameter? e.g
>>>>>
>>>>> bool has_operation() const { return _operation != NULL ||
>>>>> _operation_direct != NULL; }
>>>>> + bool has_specific_operation(bool direct) {
>>>>> + return direct ? _operation_direct != NULL : _operation != NULL;
>>>>> + }
>>>> Done.
>>>>
>>>>
>>>>> ---
>>>>>
>>>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>>>
>>>>> Please add:
>>>>>
>>>>> import java.util.concurrent.Semaphore;
>>>>>
>>>>> so you can just refer to Sempahore.
>>>> Done.
>>>>
>>>>
>>>>> Style nit: _working_threads
>>>>>
>>>>> Java style is to not use leading underscore and to use camelCase
>>>>> for variables ie workingThreads.
>>>> Right, I changed them to camelCase.
>>>>
>>>>
>>>>> 44 static Thread _suspendresume_thread = new Thread();
>>>>>
>>>>> The above is dead code.
>>>> Removed.
>>>>
>>>>
>>>>> 53 if (_is_biased[me] == false) {
>>>>>
>>>>> Style nit: use "if (!_is_biased[me]) {"
>>>> Changed.
>>>>
>>>>
>>>>> 80 } catch(InterruptedException ie) {
>>>>> 81 }
>>>>>
>>>>> I suggest inserting "throw new Error("Unexpected interrupt");" for
>>>>> good measure.
>>>>>
>>>>> 111 _working_threads[i].suspend();
>>>> Done.
>>>>
>>>>
>>>>> Thread suspension has been deprecated-for-removal as of JDK 14 so
>>>>> could be gone in JDK 15 (but more likely 16). If the
>>>>> suspend/resume is important for this test then you will need to
>>>>> switch to using JVM TI suspend/resume; or else perhaps introduce a
>>>>> WhiteBox method to do whatever you need in the VM.
>>>> It's not really needed for this test. I just wanted to mixed
>>>> suspend-resume with handshakes because I know there could be some
>>>> buggy interactions between them. But since it has been deprecated
>>>> then maybe there is no point in stressing handshakes with them and
>>>> I can remove that part.
>>>>
>>>>
>>>>> 123 // Wait until the desired number of direct handshakes is reached
>>>>> 124 while (_handshake_count.get() <
>>>>> DIRECT_HANDSHAKES_MARK) {
>>>>> 125 Thread.sleep(10); // sleep for 10ms
>>>>> 126 }
>>>>>
>>>>> You could just do a join() on one of the worker threads.
>>>> Right, didn't thought about that. Changed.
>>>>
>>>>
>>>> Thanks for looking at this David! I'll retest with the above
>>>> changes in the meantime.
>>>>
>>>>
>>>> Patricio
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Thanks!
>>>>>> Patricio
>>>>>>
>>>>
>>
More information about the hotspot-runtime-dev
mailing list