RFR 8230594: Allow direct handshakes without VMThread intervention
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Thu Jan 16 22:14:58 UTC 2020
Hi Patricio, Thank you for walking me through this code. I have a few
comments on the 03 version. Many are requests for comments and
descriptions.
http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/src/hotspot/share/runtime/handshake.cpp.frames.html
48 HandshakeOperation(HandshakeClosure* cl, bool is_direct) :
_handshake_cl(cl), _pending_threads(1), _executed(false),
_is_direct(is_direct) {}
You could remove the first constructor and have is_direct be a default
parameter = false.
52 return _pending_threads == 0;
Does pending_threads need a load_acquire?
316 HandshakeState::HandshakeState() : _operation(NULL),
_operation_direct(NULL), _handshake_turn_sem(1), _processing_sem(1),
_thread_in_process_handshake(false) {
Can you make this line shorter?
320 void HandshakeState::set_operation(HandshakeOperation* op) {
I get confused by which functions the handshaker calls vs. the
handshakee. I think this one is the handshaker. Can you add an
assert(is_VM_Thread or even (I think) assert(this !=
JavaThread::current()) in this function?
331 void HandshakeState::clear_handshake(bool is_direct) {
Is this function only called by the handshakee? Can you assert this ==
JavaThread::current() here if so?
or actually: assert (_thread == JavaThread::current()?
In the HandshakeState, _thread is the handshakee right? Maybe _thread
could be changed to _handshake_target_thread if so? That might make
many of the functions less confusing with regards to who calls them.
285 JavaThread *self = (JavaThread*)Thread::current();
Should be JavaThread::current() and not have the cast.
http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/src/hotspot/share/runtime/handshake.hpp.frames.html
62 // The HandshakeState keep tracks of an ongoing handshake for one JavaThread.
63 // VMThread/Handshaker and JavaThread are serialized with the
semaphore making
64 // sure the operation is only done by either VMThread/Handshaker on
behalf of
65 // the JavaThread or the JavaThread itself.
6
The comment only refers to one semaphore but there are two. Also on
line 62, can you substitute "this" for "one", so it says
// The HandshakeState keeps track of an ongoing handshake for this
JavaThread.
+ JavaThread *_thread; // or even _handshakee might be a better name where this appears in the code.
The star is is in the wrong place.
71 Semaphore _handshake_turn_sem;
72 Semaphore _processing_sem;
Can you add a sentence before each of these semaphores describing what
they do? The naming makes them seem similar. A description would help
disambiguate them.
For the test, I thought suspend/resume isn't going to be going away for
a long time and I thought it was useful as a stresser for your code,
especially to see the nontrivial interactions. I'm not sure that I
agree with removing it.
BTW. looks really good apart from these requests to make it easier to
understand. It's complicated.
Thanks!
Coleen
On 1/15/20 10:56 PM, Patricio Chilano wrote:
> Hi David,
>
> On 1/15/20 8:29 PM, David Holmes wrote:
>> Hi Patricio,
>>
>> On 16/01/2020 6:06 am, Patricio Chilano wrote:
>>> Hi David,
>>>
>>> Here is v2:
>>>
>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/
>>
>> src/hotspot/share/runtime/handshake.cpp
>>
>> why is it that execute_direct doesn't check for
>> uses_thread_local_poll() ?
> Sorry, I thought for a moment that thread-local handshakes for arm32
> was already implemented and all platforms where using it. I added that
> check back, and by testing that path I had to also add back the
> VMThread in the assert in walk_stack_and_revoke().
>
>> src/hotspot/share/runtime/handshake.hpp
>>
>> typo: wether -> whether
> Fixed.
>
> Here is v3:
> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/
> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/inc/webrev/
>
> Thanks!
>
> Patricio
>> Otherwise updates look fine.
>>
>> Thanks,
>> David
>>
>>> Passed one round of t1-6, and running more rounds now.
>>>
>>> Thanks,
>>> Patricio
>>>
>>> On 1/14/20 7:57 PM, David Holmes wrote:
>>>> Hi Patricio,
>>>>
>>>> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>>>>> Hi David,
>>>>>
>>>>> On 1/14/20 4:13 AM, David Holmes wrote:
>>>>>> Hi Patricio,
>>>>>>
>>>>>> Thanks for tackling this. I've taken an initial look through
>>>>>> trying to understand the general approach, but I don't claim to
>>>>>> understand all the subtle details, so there are some queries
>>>>>> below to help with my overall understanding.
>>>>>>
>>>>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> The following patch adds the ability to execute direct
>>>>>>> handshakes between JavaThreads without the VMThread
>>>>>>> intervention, and enables this functionality for biased locking
>>>>>>> revocations.
>>>>>>> The current handshake mechanism that uses the VMThread, either
>>>>>>> to handshake one JavaThread or all of them, is still the default
>>>>>>> unless you specify otherwise when calling Handshake::execute().
>>>>>>
>>>>>> Can I suggest that rather than using an overloaded:
>>>>>>
>>>>>> static bool execute(HandshakeClosure* hs_cl, JavaThread* target,
>>>>>> bool is_direct_handshake = false);
>>>>>>
>>>>>> you instead add an execute_direct method for clarity?
>>>>> Ok, added execute_direct() and removed overloading.
>>>>>
>>>>>>> In order to avoid adding additional overhead to this path that
>>>>>>> uses the VMThread (especially the one that handshakes all
>>>>>>> JavaThreads) I added a new HandshakeOperation pointer in the
>>>>>>> HandshakeState class, _operation_direct, to be used for the
>>>>>>> direct handshake cases only and whose access is serialized
>>>>>>> between JavaThreads by using a semaphore. Thus, one direct
>>>>>>> handshake will be allowed at any given time, and upon completion
>>>>>>> the semaphore will be signaled to allow the next handshaker if
>>>>>>> any to proceed. In this way the old _operation can still be used
>>>>>>> only by the VMThread without the need for synchronization to
>>>>>>> access it. The handshakee will now check if any of _operation or
>>>>>>> _operation_direct is set when checking for a pending handshake
>>>>>>> and will try to execute both in
>>>>>>> HandshakeState::process_self_inner(). The execution of the
>>>>>>> handshake’s ThreadClosure, either direct handshake or not, is
>>>>>>> still protected by a semaphore, which I renamed to _processing_sem.
>>>>>>> I converted the semaphore _done in HandshakeOperation to be just
>>>>>>> an atomic counter because of bug
>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I
>>>>>>> actually hit once!). Since the semaphore could not be static
>>>>>>> anymore due to possibly having more than one HandshakeOperation
>>>>>>> at a time, the handshakee could try to access the nwaiters field
>>>>>>> of an already destroyed semaphore when signaling it. In any case
>>>>>>> nobody was waiting on that semaphore (we were not using kernel
>>>>>>> functionality), so just using an atomic counter seems more
>>>>>>> appropriate.
>>>>>>> In order to avoid issues due to disarming a JavaThread that
>>>>>>> should still be armed for a handshake or safepoint, each
>>>>>>> JavaThread will now always disarm its own polling page.
>>>>>>> I also added a new test, HandshakeDirectTest.java, which tries
>>>>>>> to stress the use of direct handshakes with revocations.
>>>>>>> In terms of performance, I measured no difference in the
>>>>>>> execution time of one individual handshake. The difference can
>>>>>>> be seen when several handshakes at a time are executed as
>>>>>>> expected. So for example on Linux running on an Intel Xeon 8167M
>>>>>>> cpu, test HandshakeDirectTest.java (which executes 50000
>>>>>>> handshakes between 32 threads) executes in around 340ms using
>>>>>>> direct handshakes and in around 5.6 seconds without it. For a
>>>>>>> modified version of that test that only executes 128 handshakes
>>>>>>> between the 32 threads and avoids any suspend-resume, the test
>>>>>>> takes around 12ms with direct handshakes and 19ms without it.
>>>>>>> Tested with mach5, tiers1-6.
>>>>>>>
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>>>>
>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>
>>>>>> 36 // A handshake closure is a callback that is executed for
>>>>>> each JavaThread
>>>>>> 37 // while that thread is in a safepoint safe state. The
>>>>>> callback is executed
>>>>>> 38 // either by the thread itself or by the VM thread while
>>>>>> keeping the thread
>>>>>> 39 // in a blocked state. A handshake can be performed with a
>>>>>> single
>>>>>> 40 // JavaThread as well.
>>>>>>
>>>>>> Does this comment block need updating for the direct case?
>>>>>>
>>>>>> 61 // the operation is only done by either VM thread on behalf
>>>>>> of the JavaThread
>>>>>> 62 // or the JavaThread itself.
>>>>>>
>>>>>> Again does this need an update?
>>>>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>>>>
>>>>>
>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>
>>>>>> 41 class HandshakeOperation: public StackObj {
>>>>>>
>>>>>> Would it be clearer to have HandShakeOperation subclassed by
>>>>>> DirectHandShakeOperation, rather than it being a property of the
>>>>>> current op instance?
>>>>> But I would still need to access the is_direct field for
>>>>> non-direct handshakes in HandshakeState::try_process() anyways.
>>>>
>>>> Okay.
>>>>
>>>>>
>>>>>> 41 class HandshakeOperation: public StackObj {
>>>>>> 42 HandshakeClosure* _handshake_cl;
>>>>>> 43 int64_t _pending_threads;
>>>>>> ...
>>>>>> 53 void add_target_count(int count) {
>>>>>> Atomic::add(&_pending_threads, count); }
>>>>>>
>>>>>> Can you clarify the lifecycle of this StackObj please. You
>>>>>> obviously expose it to other threads - hence the atomic update -
>>>>>> but that means it needs to be guaranteed to be live when that
>>>>>> update occurs.
>>>>> The HandshakeOperation object is created in Handshake::execute()
>>>>> (or Handshake::execute_direct() now) by the Handshaker. The only
>>>>> other threads that could access this object are the VMThread(for
>>>>> non-direct case only) and the Handshakee(in both cases). Seeing
>>>>> that it's safe for the VMThread to decrement the counter is
>>>>> straightforward since the Handshaker will be waiting on the
>>>>> VMOperationRequest_lock waiting for the operation to finish (this
>>>>> is just as always has been).
>>>>> As for the Handshakee, we can see that the HandshakeOperation
>>>>> object will be alive as long as the VMThread(for the non-direct
>>>>> case) or Handshaker(for the direct case) keep seeing that the
>>>>> operation is not done by calling
>>>>> HandshakeOperation::is_completed(). Once that returns true, for
>>>>> the non-direct case the VMThread will finish the operation and
>>>>> wake up the Handshaker who will return from Handshake::execute()
>>>>> and the object will be destroyed, and for the direct case the
>>>>> Handshaker will just return from execute_direct() and the object
>>>>> will be destroyed. Since is_completed() will only return true when
>>>>> _pending_threads reaches zero, that means the decrement operation
>>>>> had to be safe. Just as an observation, for the
>>>>> HandshakeAllThreads operation, _pending_threads could go to zero
>>>>> and even negative before executing add_target_count(), but that's
>>>>> okay because the VMThread doesn't call is_completed() before that
>>>>> and the load of _pending_threads cannot float above
>>>>> add_target_count() since the atomic add operation provides a
>>>>> memory fence.
>>>>> And finally the Handshakee cannot try to execute an operation that
>>>>> has already being processed by the VMThread/Handshaker, because
>>>>> they clear it first and then signal the _processing_sem semaphore
>>>>> in HandshakeState::try_process(). That worked the same before this
>>>>> change.
>>>>
>>>> Thanks for the detailed explanation!
>>>>
>>>>>
>>>>>> --
>>>>>>
>>>>>> 349 HandshakeOperation* op = Atomic::load_acquire(&_operation);
>>>>>> 350 if (op == NULL) {
>>>>>> 351 op = Atomic::load_acquire(&_operation_direct);
>>>>>>
>>>>>> This gives preference to the non-direct op. Is it possible for
>>>>>> the direct-op to be indefinitely delayed if there is a series of
>>>>>> non-direct ops?
>>>>> Yes, it's possible, although I think that's an unlikely scenario.
>>>>> That would mean the Handshakee would be stuck in that while loop
>>>>> with the VMThread constantly setting new operations in _operation.
>>>>> But I can modify the loop and make it try to execute both before
>>>>> going into the next interation, something like (more on
>>>>> load_acquire next):
>>>>>
>>>>> if (has_operation()) {
>>>>> HandleMark hm(_thread);
>>>>> CautiouslyPreserveExceptionMark pem(_thread);
>>>>> HandshakeOperation * op = _operation;
>>>>> if (op != NULL) {
>>>>> // Disarm before execute the operation
>>>>> clear_handshake(false);
>>>>> op->do_handshake(_thread);
>>>>> }
>>>>> op = _operation_direct;
>>>>> if (op != NULL) {
>>>>> // Disarm before execute the operation
>>>>> clear_handshake(true);
>>>>> op->do_handshake(_thread);
>>>>> }
>>>>> }
>>>>>
>>>>> what do you think?
>>>>
>>>> That seems better to me.
>>>>
>>>>>
>>>>>> Also I don't see the release_stores that these load_acquire would
>>>>>> pair with. I'm assuming it should be here:
>>>>>>
>>>>>> 319 _operation = op;
>>>>>> 323 _operation_direct = op;
>>>>>>
>>>>>> But I would also have expected all necessary memory
>>>>>> synchronization to already be present via the semaphore
>>>>>> operations and/or the rest of the handshake mechanism. ??
>>>>> Yes, the release is in
>>>>> SafepointMechanism::arm_local_poll_release() after setting those
>>>>> fields. But it's true that those load_acquire are not needed since
>>>>> the semaphore already has acquire semantics. I actually wanted to
>>>>> do a normal load but ended up just copying how we were loading
>>>>> _operation. I''ll change them and retest.
>>>>
>>>> Okay.
>>>>
>>>> Thanks,
>>>> David
>>>> -------
>>>>
>>>>>
>>>>>> 396 if ((!is_direct && _operation != NULL) || (is_direct &&
>>>>>> _operation_direct != NULL)){
>>>>>> 406 if ((!is_direct && _operation == NULL) || (is_direct &&
>>>>>> _operation_direct == NULL)){
>>>>>>
>>>>>> Can this not be abstracted back into a "has_operation" method
>>>>>> that takes a "direct" parameter? e.g
>>>>>>
>>>>>> bool has_operation() const { return _operation != NULL ||
>>>>>> _operation_direct != NULL; }
>>>>>> + bool has_specific_operation(bool direct) {
>>>>>> + return direct ? _operation_direct != NULL : _operation != NULL;
>>>>>> + }
>>>>> Done.
>>>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>>>>
>>>>>> Please add:
>>>>>>
>>>>>> import java.util.concurrent.Semaphore;
>>>>>>
>>>>>> so you can just refer to Sempahore.
>>>>> Done.
>>>>>
>>>>>
>>>>>> Style nit: _working_threads
>>>>>>
>>>>>> Java style is to not use leading underscore and to use camelCase
>>>>>> for variables ie workingThreads.
>>>>> Right, I changed them to camelCase.
>>>>>
>>>>>
>>>>>> 44 static Thread _suspendresume_thread = new Thread();
>>>>>>
>>>>>> The above is dead code.
>>>>> Removed.
>>>>>
>>>>>
>>>>>> 53 if (_is_biased[me] == false) {
>>>>>>
>>>>>> Style nit: use "if (!_is_biased[me]) {"
>>>>> Changed.
>>>>>
>>>>>
>>>>>> 80 } catch(InterruptedException ie) {
>>>>>> 81 }
>>>>>>
>>>>>> I suggest inserting "throw new Error("Unexpected interrupt");"
>>>>>> for good measure.
>>>>>>
>>>>>> 111 _working_threads[i].suspend();
>>>>> Done.
>>>>>
>>>>>
>>>>>> Thread suspension has been deprecated-for-removal as of JDK 14 so
>>>>>> could be gone in JDK 15 (but more likely 16). If the
>>>>>> suspend/resume is important for this test then you will need to
>>>>>> switch to using JVM TI suspend/resume; or else perhaps introduce
>>>>>> a WhiteBox method to do whatever you need in the VM.
>>>>> It's not really needed for this test. I just wanted to mixed
>>>>> suspend-resume with handshakes because I know there could be some
>>>>> buggy interactions between them. But since it has been deprecated
>>>>> then maybe there is no point in stressing handshakes with them and
>>>>> I can remove that part.
>>>>>
>>>>>
>>>>>> 123 // Wait until the desired number of direct handshakes is
>>>>>> reached
>>>>>> 124 while (_handshake_count.get() <
>>>>>> DIRECT_HANDSHAKES_MARK) {
>>>>>> 125 Thread.sleep(10); // sleep for 10ms
>>>>>> 126 }
>>>>>>
>>>>>> You could just do a join() on one of the worker threads.
>>>>> Right, didn't thought about that. Changed.
>>>>>
>>>>>
>>>>> Thanks for looking at this David! I'll retest with the above
>>>>> changes in the meantime.
>>>>>
>>>>>
>>>>> Patricio
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Thanks!
>>>>>>> Patricio
>>>>>>>
>>>>>
>>>
>
More information about the hotspot-runtime-dev
mailing list