RFR 8230594: Allow direct handshakes without VMThread intervention

Robbin Ehn robbin.ehn at oracle.com
Tue Jan 21 10:51:19 UTC 2020


Hi Patricio,

On 1/18/20 12:06 AM, Patricio Chilano wrote:
> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v04/webrev/
> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v04/inc/webrev/

As discussed offline, using a vtable instead of branches simplifies the code,
especially once we introduce per-thread queues and more operation types. But
since that requires a lot of small changes, we skip it here.
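
A minimal sketch of what the vtable approach could look like (illustrative
names, not the actual HotSpot sources): the processing loop dispatches through
a virtual method, so new operation kinds slot in without growing an is_direct
branch tree.

class JavaThread;

class HandshakeOperation {
 public:
  virtual ~HandshakeOperation() {}
  // Each operation kind overrides this with its own processing.
  virtual void do_handshake(JavaThread* target) = 0;
};

class DirectHandshakeOperation : public HandshakeOperation {
 public:
  virtual void do_handshake(JavaThread* target) {
    // direct-handshake specific work would go here
  }
};

// The handshakee's processing loop stays kind-agnostic:
void process(HandshakeOperation* op, JavaThread* self) {
  if (op != nullptr) {
    op->do_handshake(self);  // virtual dispatch replaces if (is_direct) branches
  }
}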

Aside from that, the only thing that really bugs me is:
          for (int i = 0; i < WORKING_THREADS; i++) {
              workingThreads[i] = new Thread(test, Integer.toString(i));
->           workingThreads[i].setDaemon(true);
              workingThreads[i].start();
          }

Marking these threads as daemons is confusing when you later join them before exiting.

Also consider running the test with more options, maybe a second run like:

diff -r be98066409e2 test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
--- a/test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java	Tue Jan 21 11:21:13 2020 +0100
+++ b/test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java	Tue Jan 21 11:49:07 2020 +0100
@@ -30,2 +30,3 @@
  * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+SafepointALot -XX:BiasedLockingDecayTime=100000000 -XX:BiasedLockingBulkRebiasThreshold=1000000 -XX:BiasedLockingBulkRevokeThreshold=1000000 HandshakeDirectTest
+ * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:GuaranteedSafepointInterval=10 -XX:+HandshakeALot -XX:+SafepointALot -XX:BiasedLockingDecayTime=100000000 -XX:BiasedLockingBulkRebiasThreshold=1000000 -XX:BiasedLockingBulkRevokeThreshold=1000000 HandshakeDirectTest
  */

Thanks for fixing, Robbin

> 
> Thanks!
> 
> Patricio
>> Dan
>>
>>
>> On 1/15/20 10:56 PM, Patricio Chilano wrote:
>>> Hi David,
>>>
>>> On 1/15/20 8:29 PM, David Holmes wrote:
>>>> Hi Patricio,
>>>>
>>>> On 16/01/2020 6:06 am, Patricio Chilano wrote:
>>>>> Hi David,
>>>>>
>>>>> Here is v2:
>>>>>
>>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
>>>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/
>>>>
>>>> src/hotspot/share/runtime/handshake.cpp
>>>>
>>>> why is it that execute_direct doesn't check for uses_thread_local_poll() ?
>>> Sorry, I thought for a moment that thread-local handshakes for arm32 were 
>>> already implemented and all platforms were using it. I added that check 
>>> back, and by testing that path I had to also add back the VMThread in the 
>>> assert in walk_stack_and_revoke().
>>>
>>>> src/hotspot/share/runtime/handshake.hpp
>>>>
>>>> typo: wether -> whether
>>> Fixed.
>>>
>>> Here is v3:
>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/
>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/inc/webrev/
>>>
>>> Thanks!
>>>
>>> Patricio
>>>> Otherwise updates look fine.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Passed one round of t1-6, and running more rounds now.
>>>>>
>>>>> Thanks,
>>>>> Patricio
>>>>>
>>>>> On 1/14/20 7:57 PM, David Holmes wrote:
>>>>>> Hi Patricio,
>>>>>>
>>>>>> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> On 1/14/20 4:13 AM, David Holmes wrote:
>>>>>>>> Hi Patricio,
>>>>>>>>
>>>>>>>> Thanks for tackling this. I've taken an initial look through trying to 
>>>>>>>> understand the general approach, but I don't claim to understand all the 
>>>>>>>> subtle details, so there are some queries below to help with my overall 
>>>>>>>> understanding.
>>>>>>>>
>>>>>>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> The following patch adds the ability to execute direct handshakes 
>>>>>>>>> between JavaThreads without the VMThread intervention, and enables this 
>>>>>>>>> functionality for biased locking revocations.
>>>>>>>>> The current handshake mechanism that uses the VMThread, either to 
>>>>>>>>> handshake one JavaThread or all of them, is still the default unless 
>>>>>>>>> you specify otherwise when calling Handshake::execute().
>>>>>>>>
>>>>>>>> Can I suggest that rather than using an overloaded:
>>>>>>>>
>>>>>>>>  static bool execute(HandshakeClosure* hs_cl, JavaThread* target, bool 
>>>>>>>> is_direct_handshake = false);
>>>>>>>>
>>>>>>>> you instead add an execute_direct method for clarity?
>>>>>>> Ok, added execute_direct() and removed overloading.
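
The resulting entry points would then look roughly like this (a sketch of the
shape being discussed, not the final webrev):

class HandshakeClosure;
class JavaThread;

class Handshake {
 public:
  // One target JavaThread (or, via another overload, all of them) handled
  // through the VMThread, as before.
  static bool execute(HandshakeClosure* hs_cl, JavaThread* target);
  // One target handshaked directly, without VMThread intervention.
  static bool execute_direct(HandshakeClosure* hs_cl, JavaThread* target);
};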
>>>>>>>
>>>>>>>>> In order to avoid adding additional overhead to this path that uses the 
>>>>>>>>> VMThread (especially the one that handshakes all JavaThreads) I added a 
>>>>>>>>> new HandshakeOperation pointer in the HandshakeState class, 
>>>>>>>>> _operation_direct, to be used for the direct handshake cases only and 
>>>>>>>>> whose access is serialized between JavaThreads by using a semaphore. 
>>>>>>>>> Thus, one direct handshake will be allowed at any given time, and upon 
>>>>>>>>> completion the semaphore will be signaled to allow the next handshaker 
>>>>>>>>> if any to proceed. In this way the old _operation can still be used 
>>>>>>>>> only by the VMThread without the need for synchronization to access it. 
>>>>>>>>> The handshakee will now check if any of _operation or _operation_direct 
>>>>>>>>> is set when checking for a pending handshake and will try to execute 
>>>>>>>>> both in HandshakeState::process_self_inner(). The execution of the 
>>>>>>>>> handshake’s ThreadClosure, either direct handshake or not, is still 
>>>>>>>>> protected by a semaphore, which I renamed to _processing_sem.
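
As a standalone sketch of that two-slot scheme (illustrative names; C++20
std::binary_semaphore standing in for HotSpot's Semaphore):

#include <atomic>
#include <semaphore>

class HandshakeOperation;

class HandshakeStateSketch {
  std::atomic<HandshakeOperation*> _operation{nullptr};         // written by the VMThread only
  std::atomic<HandshakeOperation*> _operation_direct{nullptr};  // written by direct handshakers
  std::binary_semaphore _direct_sem{1};      // serializes direct handshakers
  std::binary_semaphore _processing_sem{1};  // protects execution of the closure

 public:
  // Direct handshaker: take the turn, then publish the operation.
  void set_operation_direct(HandshakeOperation* op) {
    _direct_sem.acquire();  // at most one direct handshake at a time
    _operation_direct.store(op, std::memory_order_release);
  }

  // After the operation completes: clear the slot, let the next handshaker in.
  void clear_operation_direct() {
    _operation_direct.store(nullptr, std::memory_order_release);
    _direct_sem.release();
  }

  // Handshakee: a pending handshake means either slot is set.
  bool has_operation() const {
    return _operation.load(std::memory_order_acquire) != nullptr ||
           _operation_direct.load(std::memory_order_acquire) != nullptr;
  }
};
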
>>>>>>>>> I converted the semaphore _done in HandshakeOperation to be just an 
>>>>>>>>> atomic counter because of bug 
>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I actually 
>>>>>>>>> hit once!). Since the semaphore could not be static anymore due to 
>>>>>>>>> possibly having more than one HandshakeOperation at a time, the 
>>>>>>>>> handshakee could try to access the nwaiters field of an already 
>>>>>>>>> destroyed semaphore when signaling it. In any case nobody was waiting 
>>>>>>>>> on that semaphore (we were not using kernel functionality), so just 
>>>>>>>>> using an atomic counter seems more appropriate.
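
A standalone sketch of that replacement (illustrative names): the requester
polls an atomic count instead of waiting on a semaphore, so no thread ever
signals into a possibly-destroyed object.

#include <atomic>
#include <cstdint>

class HandshakeCompletionSketch {
  std::atomic<int64_t> _pending_threads;  // replaces the _done semaphore
 public:
  explicit HandshakeCompletionSketch(int64_t targets) : _pending_threads(targets) {}

  // Handshakee side: the fetch_sub is its last access to this object, with no
  // post-signal bookkeeping (unlike sem_post reading nwaiters afterwards).
  void thread_done() { _pending_threads.fetch_sub(1); }

  // Requester side: polls until every target has finished; once this returns
  // true the object can safely be destroyed.
  bool is_completed() const { return _pending_threads.load() == 0; }
};
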
>>>>>>>>> In order to avoid issues due to disarming a JavaThread that should 
>>>>>>>>> still be armed for a handshake or safepoint, each JavaThread will now 
>>>>>>>>> always disarm its own polling page.
>>>>>>>>> I also added a new test, HandshakeDirectTest.java, which tries to 
>>>>>>>>> stress the use of direct handshakes with revocations.
>>>>>>>>> In terms of performance, I measured no difference in the execution time 
>>>>>>>>> of one individual handshake. The difference shows up, as expected, when 
>>>>>>>>> several handshakes are executed at a time. So for example on Linux 
>>>>>>>>> running on an Intel Xeon 8167M cpu, test HandshakeDirectTest.java 
>>>>>>>>> (which executes 50000 handshakes between 32 threads) executes in around 
>>>>>>>>> 340ms using direct handshakes and in around 5.6 seconds without it. For 
>>>>>>>>> a modified version of that test that only executes 128 handshakes 
>>>>>>>>> between the 32 threads and avoids any suspend-resume, the test takes 
>>>>>>>>> around 12ms with direct handshakes and 19ms without it.
>>>>>>>>> Tested with mach5, tiers1-6.
>>>>>>>>>
>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>>
>>>>>>>>  36 // A handshake closure is a callback that is executed for each 
>>>>>>>> JavaThread
>>>>>>>>   37 // while that thread is in a safepoint safe state. The callback is 
>>>>>>>> executed
>>>>>>>>   38 // either by the thread itself or by the VM thread while keeping 
>>>>>>>> the thread
>>>>>>>>   39 // in a blocked state. A handshake can be performed with a single
>>>>>>>>   40 // JavaThread as well.
>>>>>>>>
>>>>>>>> Does this comment block need updating for the direct case?
>>>>>>>>
>>>>>>>>  61 // the operation is only done by either VM thread on behalf of the 
>>>>>>>> JavaThread
>>>>>>>>  62 // or the JavaThread itself.
>>>>>>>>
>>>>>>>> Again does this need an update?
>>>>>>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>>>>>>
>>>>>>>
>>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>>
>>>>>>>>  41 class HandshakeOperation: public StackObj {
>>>>>>>>
>>>>>>>> Would it be clearer to have HandShakeOperation subclassed by 
>>>>>>>> DirectHandShakeOperation, rather than it being a property of the current 
>>>>>>>> op instance?
>>>>>>> But I would still need to access the is_direct field for non-direct 
>>>>>>> handshakes in HandshakeState::try_process() anyway.
>>>>>>
>>>>>> Okay.
>>>>>>
>>>>>>>
>>>>>>>>  41 class HandshakeOperation: public StackObj {
>>>>>>>>  42   HandshakeClosure* _handshake_cl;
>>>>>>>>  43   int64_t _pending_threads;
>>>>>>>>  ...
>>>>>>>>  53   void add_target_count(int count) { Atomic::add(&_pending_threads, 
>>>>>>>> count); }
>>>>>>>>
>>>>>>>> Can you clarify the lifecycle of this StackObj please. You obviously 
>>>>>>>> expose it to other threads - hence the atomic update - but that means it 
>>>>>>>> needs to be guaranteed to be live when that update occurs.
>>>>>>> The HandshakeOperation object is created in Handshake::execute() (or 
>>>>>>> Handshake::execute_direct() now) by the Handshaker. The only other 
>>>>>>> threads that could access this object are the VMThread (for the 
>>>>>>> non-direct case only) and the Handshakee (in both cases). Seeing that 
>>>>>>> it's safe for the VMThread to decrement the counter is straightforward, 
>>>>>>> since the Handshaker will be waiting on the VMOperationRequest_lock for 
>>>>>>> the operation to finish (just as it always has been).
>>>>>>> As for the Handshakee, we can see that the HandshakeOperation object will 
>>>>>>> be alive as long as the VMThread(for the non-direct case) or 
>>>>>>> Handshaker(for the direct case) keep seeing that the operation is not 
>>>>>>> done by calling HandshakeOperation::is_completed(). Once that returns 
>>>>>>> true, for the non-direct case the VMThread will finish the operation and 
>>>>>>> wake up the Handshaker who will return from Handshake::execute() and the 
>>>>>>> object will be destroyed, and for the direct case the Handshaker will 
>>>>>>> just return from execute_direct() and the object will be destroyed. Since 
>>>>>>> is_completed() will only return true when _pending_threads reaches zero, 
>>>>>>> that means the decrement operation had to be safe. Just as an 
>>>>>>> observation, for the HandshakeAllThreads operation, _pending_threads 
>>>>>>> could go to zero and even negative before executing add_target_count(), 
>>>>>>> but that's okay because the VMThread doesn't call is_completed() before 
>>>>>>> that and the load of _pending_threads cannot float above 
>>>>>>> add_target_count() since the atomic add operation provides a memory fence.
>>>>>>> And finally, the Handshakee cannot try to execute an operation that has 
>>>>>>> already been processed by the VMThread/Handshaker, because they clear it 
>>>>>>> first and then signal the _processing_sem semaphore in 
>>>>>>> HandshakeState::try_process(). That worked the same before this change.
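
A small standalone sketch of that ordering argument (illustrative names;
std::atomic's default seq_cst ordering standing in for the fence that
HotSpot's Atomic::add provides):

#include <atomic>
#include <cstdint>

struct AllThreadsOpSketch {
  std::atomic<int64_t> _pending_threads{0};

  // VMThread: runs after arming all targets, so some targets may already have
  // decremented and driven the count negative; the add restores the true
  // remainder. fetch_add defaults to seq_cst, acting as a full fence.
  void add_target_count(int64_t count) { _pending_threads.fetch_add(count); }

  // Target threads: may run before add_target_count().
  void thread_done() { _pending_threads.fetch_sub(1); }

  // Only consulted by the VMThread after add_target_count(), so the load
  // cannot float above the add and a zero here really means all done.
  bool is_completed() const { return _pending_threads.load() == 0; }
};
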
>>>>>>
>>>>>> Thanks for the detailed explanation!
>>>>>>
>>>>>>>
>>>>>>>> -- 
>>>>>>>>
>>>>>>>>  349     HandshakeOperation* op = Atomic::load_acquire(&_operation);
>>>>>>>>  350     if (op == NULL) {
>>>>>>>>  351       op = Atomic::load_acquire(&_operation_direct);
>>>>>>>>
>>>>>>>> This gives preference to the non-direct op. Is it possible for the 
>>>>>>>> direct-op to be indefinitely delayed if there is a series of non-direct 
>>>>>>>> ops?
>>>>>>> Yes, it's possible, although I think that's an unlikely scenario. That 
>>>>>>> would mean the Handshakee would be stuck in that while loop with the 
>>>>>>> VMThread constantly setting new operations in _operation. But I can 
>>>>>>> modify the loop and make it try to execute both before going into the 
>>>>>>> next iteration, something like (more on load_acquire next):
>>>>>>>
>>>>>>>      if (has_operation()) {
>>>>>>>        HandleMark hm(_thread);
>>>>>>>        CautiouslyPreserveExceptionMark pem(_thread);
>>>>>>>        HandshakeOperation * op = _operation;
>>>>>>>        if (op != NULL) {
>>>>>>>          // Disarm before execute the operation
>>>>>>>          clear_handshake(false);
>>>>>>>          op->do_handshake(_thread);
>>>>>>>        }
>>>>>>>        op = _operation_direct;
>>>>>>>        if (op != NULL) {
>>>>>>>          // Disarm before execute the operation
>>>>>>>          clear_handshake(true);
>>>>>>>          op->do_handshake(_thread);
>>>>>>>        }
>>>>>>>      }
>>>>>>>
>>>>>>> what do you think?
>>>>>>
>>>>>> That seems better to me.
>>>>>>
>>>>>>>
>>>>>>>> Also I don't see the release_stores that these load_acquire would pair 
>>>>>>>> with. I'm assuming it should be here:
>>>>>>>>
>>>>>>>>  319     _operation = op;
>>>>>>>>  323     _operation_direct = op;
>>>>>>>>
>>>>>>>> But I would also have expected all necessary memory synchronization to 
>>>>>>>> already be present via the semaphore operations and/or the rest of the 
>>>>>>>> handshake mechanism. ??
>>>>>>> Yes, the release is in SafepointMechanism::arm_local_poll_release() after 
>>>>>>> setting those fields. But it's true that those load_acquire are not 
>>>>>>> needed since the semaphore already has acquire semantics. I actually 
>>>>>>> wanted to do a normal load but ended up just copying how we were loading 
>>>>>>> _operation. I'll change them and retest.
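
For reference, a minimal sketch of that pairing (illustrative names; a plain
atomic int standing in for the thread-local poll page):

#include <atomic>

struct OpSketch { /* handshake payload */ };

std::atomic<OpSketch*> _operation{nullptr};
std::atomic<int> _poll_word{0};  // stand-in for the thread-local poll page

// Requester: plain store of the operation, then a release store arming the
// poll. The release publishes the operation (and everything before it) to any
// thread that observes the armed poll with acquire semantics.
void arm_local_poll_release(OpSketch* op) {
  _operation.store(op, std::memory_order_relaxed);
  _poll_word.store(1, std::memory_order_release);
}

// Handshakee: the acquire on the poll word (or on the semaphore, which also
// has acquire semantics) already orders the read, so a plain load of
// _operation suffices; no extra load_acquire is needed.
OpSketch* check_poll() {
  if (_poll_word.load(std::memory_order_acquire) != 0) {
    return _operation.load(std::memory_order_relaxed);
  }
  return nullptr;
}
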
>>>>>>
>>>>>> Okay.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -------
>>>>>>
>>>>>>>
>>>>>>>> 396 if ((!is_direct && _operation != NULL) || (is_direct && 
>>>>>>>> _operation_direct != NULL)){
>>>>>>>> 406   if ((!is_direct && _operation == NULL) || (is_direct && 
>>>>>>>> _operation_direct == NULL)){
>>>>>>>>
>>>>>>>> Can this not be abstracted back into a "has_operation" method that takes 
>>>>>>>> a "direct" parameter? e.g
>>>>>>>>
>>>>>>>>   bool has_operation() const { return _operation != NULL || 
>>>>>>>> _operation_direct != NULL; }
>>>>>>>> + bool has_specific_operation(bool direct) {
>>>>>>>> +   return direct ? _operation_direct != NULL : _operation != NULL;
>>>>>>>> + }
>>>>>>> Done.
>>>>>>>
>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>>>>>>
>>>>>>>> Please add:
>>>>>>>>
>>>>>>>> import java.util.concurrent.Semaphore;
>>>>>>>>
>>>>>>>> so you can just refer to Semaphore.
>>>>>>> Done.
>>>>>>>
>>>>>>>
>>>>>>>> Style nit: _working_threads
>>>>>>>>
>>>>>>>> Java style is to not use leading underscore and to use camelCase for 
>>>>>>>> variables ie workingThreads.
>>>>>>> Right, I changed them to camelCase.
>>>>>>>
>>>>>>>
>>>>>>>> 44 static Thread _suspendresume_thread = new Thread();
>>>>>>>>
>>>>>>>> The above is dead code.
>>>>>>> Removed.
>>>>>>>
>>>>>>>
>>>>>>>> 53 if (_is_biased[me] == false) {
>>>>>>>>
>>>>>>>> Style nit: use "if (!_is_biased[me]) {"
>>>>>>> Changed.
>>>>>>>
>>>>>>>
>>>>>>>> 80             } catch(InterruptedException ie) {
>>>>>>>>   81             }
>>>>>>>>
>>>>>>>> I suggest inserting "throw new Error("Unexpected interrupt");" for good 
>>>>>>>> measure.
>>>>>>>>
>>>>>>>>  111 _working_threads[i].suspend();
>>>>>>> Done.
>>>>>>>
>>>>>>>
>>>>>>>> Thread suspension has been deprecated-for-removal as of JDK 14 so could 
>>>>>>>> be gone in JDK 15 (but more likely 16). If the suspend/resume is 
>>>>>>>> important for this test then you will need to switch to using JVM TI 
>>>>>>>> suspend/resume; or else perhaps introduce a WhiteBox method to do 
>>>>>>>> whatever you need in the VM.
>>>>>>> It's not really needed for this test. I just wanted to mix 
>>>>>>> suspend-resume with handshakes because I know there could be some buggy 
>>>>>>> interactions between them. But since it has been deprecated, maybe 
>>>>>>> there is no point in stressing handshakes with them and I can remove that 
>>>>>>> part.
>>>>>>>
>>>>>>>
>>>>>>>>  123 // Wait until the desired number of direct handshakes is reached
>>>>>>>>  124         while (_handshake_count.get() < DIRECT_HANDSHAKES_MARK) {
>>>>>>>>  125             Thread.sleep(10); // sleep for 10ms
>>>>>>>>  126         }
>>>>>>>>
>>>>>>>> You could just do a join() on one of the worker threads.
>>>>>>> Right, didn't think about that. Changed.
>>>>>>>
>>>>>>>
>>>>>>> Thanks for looking at this David! I'll retest with the above changes in 
>>>>>>> the meantime.
>>>>>>>
>>>>>>>
>>>>>>> Patricio
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Patricio
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
> 

