RFR 8230594: Allow direct handshakes without VMThread intervention

David Holmes david.holmes at oracle.com
Wed Jan 15 23:29:58 UTC 2020


Hi Patricio,

On 16/01/2020 6:06 am, Patricio Chilano wrote:
> Hi David,
> 
> Here is v2:
> 
> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/

src/hotspot/share/runtime/handshake.cpp

why is it that execute_direct() doesn't check for uses_thread_local_poll()?

src/hotspot/share/runtime/handshake.hpp

typo: wether -> whether

Otherwise updates look fine.

Thanks,
David

> Passed one round of t1-6, and running more rounds now.
> 
> Thanks,
> Patricio
> 
> On 1/14/20 7:57 PM, David Holmes wrote:
>> Hi Patricio,
>>
>> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>>> Hi David,
>>>
>>> On 1/14/20 4:13 AM, David Holmes wrote:
>>>> Hi Patricio,
>>>>
>>>> Thanks for tackling this. I've taken an initial look through trying 
>>>> to understand the general approach, but I don't claim to understand 
>>>> all the subtle details, so there are some queries below to help with 
>>>> my overall understanding.
>>>>
>>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>>> Hi all,
>>>>>
>>>>> The following patch adds the ability to execute direct handshakes 
>>>>> between JavaThreads without the VMThread intervention, and enables 
>>>>> this functionality for biased locking revocations.
>>>>> The current handshake mechanism that uses the VMThread, either to 
>>>>> handshake one JavaThread or all of them, is still the default 
>>>>> unless you specify otherwise when calling Handshake::execute().
>>>>
>>>> Can I suggest that rather than using an overloaded:
>>>>
>>>>  static bool execute(HandshakeClosure* hs_cl, JavaThread* target, 
>>>> bool is_direct_handshake = false);
>>>>
>>>> you instead add an execute_direct method for clarity?
>>> Ok, added execute_direct() and removed overloading.
>>>
>>>>> In order to avoid adding additional overhead to this path that uses 
>>>>> the VMThread (especially the one that handshakes all JavaThreads) I 
>>>>> added a new HandshakeOperation pointer in the HandshakeState class, 
>>>>> _operation_direct, to be used for the direct handshake cases only 
>>>>> and whose access is serialized between JavaThreads by using a 
>>>>> semaphore. Thus, one direct handshake will be allowed at any given 
>>>>> time, and upon completion the semaphore will be signaled to allow 
>>>>> the next handshaker if any to proceed. In this way the old 
>>>>> _operation can still be used only by the VMThread without the need 
>>>>> for synchronization to access it. The handshakee will now check if 
>>>>> any of _operation or _operation_direct is set when checking for a 
>>>>> pending handshake and will try to execute both in 
>>>>> HandshakeState::process_self_inner(). The execution of the 
>>>>> handshake’s ThreadClosure, either direct handshake or not, is still 
>>>>> protected by a semaphore, which I renamed to _processing_sem.
>>>>> I converted the semaphore _done in HandshakeOperation to be just an 
>>>>> atomic counter because of bug 
>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I 
>>>>> actually hit once!). Since the semaphore could not be static 
>>>>> anymore due to possibly having more than one HandshakeOperation at 
>>>>> a time, the handshakee could try to access the nwaiters field of an 
>>>>> already destroyed semaphore when signaling it. In any case nobody 
>>>>> was waiting on that semaphore (we were not using kernel 
>>>>> functionality), so just using an atomic counter seems more 
>>>>> appropriate.
>>>>> In order to avoid issues due to disarming a JavaThread that should 
>>>>> still be armed for a handshake or safepoint, each JavaThread will 
>>>>> now always disarm its own polling page.
>>>>> I also added a new test, HandshakeDirectTest.java, which tries to 
>>>>> stress the use of direct handshakes with revocations.
>>>>> In terms of performance, I measured no difference in the execution 
>>>>> time of a single handshake. As expected, the difference shows up 
>>>>> when several handshakes are executed at a time. For example, on 
>>>>> Linux running on an Intel Xeon 8167M cpu, test 
>>>>> HandshakeDirectTest.java (which executes 50000 handshakes between 
>>>>> 32 threads) runs in around 340ms with direct handshakes and in 
>>>>> around 5.6 seconds without them. A modified version of that test 
>>>>> that executes only 128 handshakes between the 32 threads and avoids 
>>>>> any suspend-resume takes around 12ms with direct handshakes and 
>>>>> 19ms without them.
>>>>> Tested with mach5, tiers1-6.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>>
>>>> src/hotspot/share/runtime/handshake.hpp
>>>>
>>>>  36 // A handshake closure is a callback that is executed for each 
>>>> JavaThread
>>>>   37 // while that thread is in a safepoint safe state. The callback 
>>>> is executed
>>>>   38 // either by the thread itself or by the VM thread while 
>>>> keeping the thread
>>>>   39 // in a blocked state. A handshake can be performed with a single
>>>>   40 // JavaThread as well.
>>>>
>>>> Does this comment block need updating for the direct case?
>>>>
>>>>  61 // the operation is only done by either VM thread on behalf of 
>>>> the JavaThread
>>>>  62 // or the JavaThread itself.
>>>>
>>>> Again does this need an update?
>>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>>
>>>
>>>> src/hotspot/share/runtime/handshake.hpp
>>>>
>>>>  41 class HandshakeOperation: public StackObj {
>>>>
>>>> Would it be clearer to have HandShakeOperation subclassed by 
>>>> DirectHandShakeOperation, rather than it being a property of the 
>>>> current op instance?
>>> But I would still need to access the is_direct field for non-direct 
>>> handshakes in HandshakeState::try_process() anyway.
>>
>> Okay.
>>
>>>
>>>>  41 class HandshakeOperation: public StackObj {
>>>>  42   HandshakeClosure* _handshake_cl;
>>>>  43   int64_t _pending_threads;
>>>>  ...
>>>>  53   void add_target_count(int count) { 
>>>> Atomic::add(&_pending_threads, count); }
>>>>
>>>> Can you clarify the lifecycle of this StackObj please. You obviously 
>>>> expose it to other threads - hence the atomic update - but that 
>>>> means it needs to be guaranteed to be live when that update occurs.
>>> The HandshakeOperation object is created in Handshake::execute() (or 
>>> Handshake::execute_direct() now) by the Handshaker. The only other 
>>> threads that could access this object are the VMThread (for the 
>>> non-direct case only) and the Handshakee (in both cases). It's 
>>> straightforward to see that it's safe for the VMThread to decrement 
>>> the counter, since the Handshaker will be waiting on the 
>>> VMOperationRequest_lock for the operation to finish (just as it 
>>> always has been).
>>> As for the Handshakee, we can see that the HandshakeOperation object 
>>> will be alive as long as the VMThread(for the non-direct case) or 
>>> Handshaker(for the direct case) keep seeing that the operation is not 
>>> done by calling HandshakeOperation::is_completed(). Once that returns 
>>> true, for the non-direct case the VMThread will finish the operation 
>>> and wake up the Handshaker who will return from Handshake::execute() 
>>> and the object will be destroyed, and for the direct case the 
>>> Handshaker will just return from execute_direct() and the object will 
>>> be destroyed. Since is_completed() will only return true when 
>>> _pending_threads reaches zero, that means the decrement operation had 
>>> to be safe. Just as an observation, for the HandshakeAllThreads 
>>> operation, _pending_threads could go to zero and even negative before 
>>> executing add_target_count(), but that's okay because the VMThread 
>>> doesn't call is_completed() before that and the load of 
>>> _pending_threads cannot float above add_target_count() since the 
>>> atomic add operation provides a memory fence.
>>> And finally, the Handshakee cannot try to execute an operation that 
>>> has already been processed by the VMThread/Handshaker, because they 
>>> clear it first and then signal the _processing_sem semaphore in 
>>> HandshakeState::try_process(). That worked the same before this change.
>>
>> Thanks for the detailed explanation!
>>
>>>
>>>> -- 
>>>>
>>>>  349     HandshakeOperation* op = Atomic::load_acquire(&_operation);
>>>>  350     if (op == NULL) {
>>>>  351       op = Atomic::load_acquire(&_operation_direct);
>>>>
>>>> This gives preference to the non-direct op. Is it possible for the 
>>>> direct-op to be indefinitely delayed if there is a series of 
>>>> non-direct ops?
>>> Yes, it's possible, although I think that's an unlikely scenario. 
>>> That would mean the Handshakee would be stuck in that while loop with 
>>> the VMThread constantly setting new operations in _operation. But I 
>>> can modify the loop to make it try to execute both before going into 
>>> the next iteration, something like (more on load_acquire next):
>>>
>>>      if (has_operation()) {
>>>        HandleMark hm(_thread);
>>>        CautiouslyPreserveExceptionMark pem(_thread);
>>>        HandshakeOperation * op = _operation;
>>>        if (op != NULL) {
>>>          // Disarm before executing the operation
>>>          clear_handshake(false);
>>>          op->do_handshake(_thread);
>>>        }
>>>        op = _operation_direct;
>>>        if (op != NULL) {
>>>          // Disarm before executing the operation
>>>          clear_handshake(true);
>>>          op->do_handshake(_thread);
>>>        }
>>>      }
>>>
>>> what do you think?
>>
>> That seems better to me.
>>
>>>
>>>> Also I don't see the release_stores that these load_acquire would 
>>>> pair with. I'm assuming it should be here:
>>>>
>>>>  319     _operation = op;
>>>>  323     _operation_direct = op;
>>>>
>>>> But I would also have expected all necessary memory synchronization 
>>>> to already be present via the semaphore operations and/or the rest 
>>>> of the handshake mechanism. ??
>>> Yes, the release is in SafepointMechanism::arm_local_poll_release() 
>>> after setting those fields. But it's true that those load_acquire 
>>> calls are not needed since the semaphore already has acquire 
>>> semantics. I actually wanted to do a normal load but ended up just 
>>> copying how we were loading _operation. I'll change them and retest.
>>
>> Okay.
>>
>> Thanks,
>> David
>> -------
>>
>>>
>>>> 396 if ((!is_direct && _operation != NULL) || (is_direct && 
>>>> _operation_direct != NULL)){
>>>> 406   if ((!is_direct && _operation == NULL) || (is_direct && 
>>>> _operation_direct == NULL)){
>>>>
>>>> Can this not be abstracted back into a "has_operation" method that 
>>>> takes a "direct" parameter? e.g
>>>>
>>>>   bool has_operation() const { return _operation != NULL || 
>>>> _operation_direct != NULL; }
>>>> + bool has_specific_operation(bool direct) {
>>>> +   return direct ? _operation_direct != NULL : _operation != NULL;
>>>> + }
>>> Done.
>>>
>>>
>>>> ---
>>>>
>>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>>
>>>> Please add:
>>>>
>>>> import java.util.concurrent.Semaphore;
>>>>
>>>> so you can just refer to Semaphore.
>>> Done.
>>>
>>>
>>>> Style nit: _working_threads
>>>>
>>>> Java style is to not use leading underscore and to use camelCase for 
>>>> variables ie workingThreads.
>>> Right, I changed them to camelCase.
>>>
>>>
>>>> 44 static Thread _suspendresume_thread = new Thread();
>>>>
>>>> The above is dead code.
>>> Removed.
>>>
>>>
>>>> 53 if (_is_biased[me] == false) {
>>>>
>>>> Style nit: use "if (!_is_biased[me]) {"
>>> Changed.
>>>
>>>
>>>> 80             } catch(InterruptedException ie) {
>>>>   81             }
>>>>
>>>> I suggest inserting "throw new Error("Unexpected interrupt");" for 
>>>> good measure.
>>>>
>>>>  111                     _working_threads[i].suspend();
>>> Done.
>>>
>>>
>>>> Thread suspension has been deprecated-for-removal as of JDK 14 so 
>>>> could be gone in JDK 15 (but more likely 16). If the suspend/resume 
>>>> is important for this test then you will need to switch to using JVM 
>>>> TI suspend/resume; or else perhaps introduce a WhiteBox method to do 
>>>> whatever you need in the VM.
>>> It's not really needed for this test. I just wanted to mix 
>>> suspend-resume with handshakes because I know there could be some 
>>> buggy interactions between them. But since it has been deprecated, 
>>> maybe there is no point in stressing handshakes with it, and I 
>>> can remove that part.
>>>
>>>
>>>>  123 // Wait until the desired number of direct handshakes is reached
>>>>  124         while (_handshake_count.get() < DIRECT_HANDSHAKES_MARK) {
>>>>  125             Thread.sleep(10); // sleep for 10ms
>>>>  126         }
>>>>
>>>> You could just do a join() on one of the worker threads.
>>> Right, I hadn't thought of that. Changed.
>>>
>>>
>>> Thanks for looking at this David! I'll retest with the above changes 
>>> in the meantime.
>>>
>>>
>>> Patricio
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Thanks!
>>>>> Patricio
>>>>>
>>>
> 

