RFR 8230594: Allow direct handshakes without VMThread intervention

Tue Jan 21 16:53:51 UTC 2020

Hi Robbin,

On 1/21/20 7:51 AM, Robbin Ehn wrote:
> Hi Patricio,
>
> On 1/18/20 12:06 AM, Patricio Chilano wrote:
>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v04/webrev/
>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v04/inc/webrev/
>
> As discussed offline, using vtable instead of branches simplifies the 
> code.
> Especially when we introduce per thread queues and more operation 
> types. But
> since that requires a lot of small changes, we skip that here.
Ok, we can add virtual functions later.

> A side from that the only thing that really bugs me is:
>          for (int i = 0; i < WORKING_THREADS; i++) {
>              workingThreads[i] = new Thread(test, Integer.toString(i));
> ->           workingThreads[i].setDaemon(true);
>              workingThreads[i].start();
>          }
>
> It's confusing when you later join these threads before exiting.
Sorry, that setDaemon(true) should have been removed after I did join() 
with all workers. Removed.

> Also consider running the test with more options, maybe a second run 
> like:
>
> diff -r be98066409e2 
> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
> --- a/test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java 
> Tue Jan 21 11:21:13 2020 +0100
> +++ b/test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java 
> Tue Jan 21 11:49:07 2020 +0100
> @@ -30,2 +30,3 @@
>   * @run main/othervm -XX:+UnlockDiagnosticVMOptions 
> -XX:+SafepointALot -XX:BiasedLockingDecayTime=100000000 
> -XX:BiasedLockingBulkRebiasThreshold=1000000 
> -XX:BiasedLockingBulkRevokeThreshold=1000000 HandshakeDirectTest
> + * @run main/othervm -XX:+UnlockDiagnosticVMOptions 
> -XX:GuaranteedSafepointInterval=10 -XX:+HandshakeALot 
> -XX:+SafepointALot -XX:BiasedLockingDecayTime=100000000 
> -XX:BiasedLockingBulkRebiasThreshold=1000000 
> -XX:BiasedLockingBulkRevokeThreshold=1000000 HandshakeDirectTest
>   */
Done.

Here is v5, which contains also style changes mentioned by Dan in his 
last review:

Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v05/webrev/
Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v05/inc/webrev/

Thanks!

Patricio
> Thanks for fixing, Robbin
>
>>
>> Thanks!
>>
>> Patricio
>>> Dan
>>>
>>>
>>> On 1/15/20 10:56 PM, Patricio Chilano wrote:
>>>> Hi David,
>>>>
>>>> On 1/15/20 8:29 PM, David Holmes wrote:
>>>>> Hi Patricio,
>>>>>
>>>>> On 16/01/2020 6:06 am, Patricio Chilano wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Here is v2:
>>>>>>
>>>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
>>>>>> Inc: 
>>>>>> http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/
>>>>>
>>>>> src/hotspot/share/runtime/handshake.cpp
>>>>>
>>>>> why is it that execute_direct doesn't check for 
>>>>> uses_thread_local_poll() ?
>>>> Sorry, I thought for a moment that thread-local handshakes for 
>>>> arm32 was already implemented and all platforms where using it. I 
>>>> added that check back, and by testing that path I had to also add 
>>>> back the VMThread in the assert in walk_stack_and_revoke().
>>>>
>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>
>>>>> typo: wether -> whether
>>>> Fixed.
>>>>
>>>> Here is v3:
>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/
>>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/inc/webrev/
>>>>
>>>> Thanks!
>>>>
>>>> Patricio
>>>>> Otherwise updates look fine.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Passed one round of t1-6, and running more rounds now.
>>>>>>
>>>>>> Thanks,
>>>>>> Patricio
>>>>>>
>>>>>> On 1/14/20 7:57 PM, David Holmes wrote:
>>>>>>> Hi Patricio,
>>>>>>>
>>>>>>> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> On 1/14/20 4:13 AM, David Holmes wrote:
>>>>>>>>> Hi Patricio,
>>>>>>>>>
>>>>>>>>> Thanks for tackling this. I've taken an initial look through 
>>>>>>>>> trying to understand the general approach, but I don't claim 
>>>>>>>>> to understand all the subtle details, so there are some 
>>>>>>>>> queries below to help with my overall understanding.
>>>>>>>>>
>>>>>>>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> The following patch adds the ability to execute direct 
>>>>>>>>>> handshakes between JavaThreads without the VMThread 
>>>>>>>>>> intervention, and enables this functionality for biased 
>>>>>>>>>> locking revocations.
>>>>>>>>>> The current handshake mechanism that uses the VMThread, 
>>>>>>>>>> either to handshake one JavaThread or all of them, is still 
>>>>>>>>>> the default unless you specify otherwise when calling 
>>>>>>>>>> Handshake::execute().
>>>>>>>>>
>>>>>>>>> Can I suggest that rather than using an overloaded:
>>>>>>>>>
>>>>>>>>>  static bool execute(HandshakeClosure* hs_cl, JavaThread* 
>>>>>>>>> target, bool is_direct_handshake = false);
>>>>>>>>>
>>>>>>>>> you instead add an execute_direct method for clarity?
>>>>>>>> Ok, added execute_direct() and removed overloading.
>>>>>>>>
>>>>>>>>>> In order to avoid adding additional overhead to this path 
>>>>>>>>>> that uses the VMThread (especially the one that handshakes 
>>>>>>>>>> all JavaThreads) I added a new HandshakeOperation pointer in 
>>>>>>>>>> the HandshakeState class, _operation_direct, to be used for 
>>>>>>>>>> the direct handshake cases only and whose access is 
>>>>>>>>>> serialized between JavaThreads by using a semaphore. Thus, 
>>>>>>>>>> one direct handshake will be allowed at any given time, and 
>>>>>>>>>> upon completion the semaphore will be signaled to allow the 
>>>>>>>>>> next handshaker if any to proceed. In this way the old 
>>>>>>>>>> _operation can still be used only by the VMThread without the 
>>>>>>>>>> need for synchronization to access it. The handshakee will 
>>>>>>>>>> now check if any of _operation or _operation_direct is set 
>>>>>>>>>> when checking for a pending handshake and will try to execute 
>>>>>>>>>> both in HandshakeState::process_self_inner(). The execution 
>>>>>>>>>> of the handshake’s ThreadClosure, either direct handshake or 
>>>>>>>>>> not, is still protected by a semaphore, which I renamed to 
>>>>>>>>>> _processing_sem.
>>>>>>>>>> I converted the semaphore _done in HandshakeOperation to be 
>>>>>>>>>> just an atomic counter because of bug 
>>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which 
>>>>>>>>>> I actually hit once!). Since the semaphore could not be 
>>>>>>>>>> static anymore due to possibly having more than one 
>>>>>>>>>> HandshakeOperation at a time, the handshakee could try to 
>>>>>>>>>> access the nwaiters field of an already destroyed semaphore 
>>>>>>>>>> when signaling it. In any case nobody was waiting on that 
>>>>>>>>>> semaphore (we were not using kernel functionality), so just 
>>>>>>>>>> using an atomic counter seems more appropriate.
>>>>>>>>>> In order to avoid issues due to disarming a JavaThread that 
>>>>>>>>>> should still be armed for a handshake or safepoint, each 
>>>>>>>>>> JavaThread will now always disarm its own polling page.
>>>>>>>>>> I also added a new test, HandshakeDirectTest.java, which 
>>>>>>>>>> tries to stress the use of direct handshakes with revocations.
>>>>>>>>>> In terms of performance, I measured no difference in the 
>>>>>>>>>> execution time of one individual handshake. The difference 
>>>>>>>>>> can be seen when several handshakes at a time are executed as 
>>>>>>>>>> expected. So for example on Linux running on an Intel Xeon 
>>>>>>>>>> 8167M cpu, test HandshakeDirectTest.java (which executes 
>>>>>>>>>> 50000 handshakes between 32 threads) executes in around 340ms 
>>>>>>>>>> using direct handshakes and in around 5.6 seconds without it. 
>>>>>>>>>> For a modified version of that test that only executes 128 
>>>>>>>>>> handshakes between the 32 threads and avoids any 
>>>>>>>>>> suspend-resume, the test takes around 12ms with direct 
>>>>>>>>>> handshakes and 19ms without it.
>>>>>>>>>> Tested with mach5, tiers1-6.
>>>>>>>>>>
>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>>>>>>>> Webrev: 
>>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>>>>>>>
>>>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>>>
>>>>>>>>>  36 // A handshake closure is a callback that is executed for 
>>>>>>>>> each JavaThread
>>>>>>>>>   37 // while that thread is in a safepoint safe state. The 
>>>>>>>>> callback is executed
>>>>>>>>>   38 // either by the thread itself or by the VM thread while 
>>>>>>>>> keeping the thread
>>>>>>>>>   39 // in a blocked state. A handshake can be performed with 
>>>>>>>>> a single
>>>>>>>>>   40 // JavaThread as well.
>>>>>>>>>
>>>>>>>>> Does this comment block need updating for the direct case?
>>>>>>>>>
>>>>>>>>>  61 // the operation is only done by either VM thread on 
>>>>>>>>> behalf of the JavaThread
>>>>>>>>>  62 // or the JavaThread itself.
>>>>>>>>>
>>>>>>>>> Again does this need an update?
>>>>>>>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>>>>>>>
>>>>>>>>
>>>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>>>
>>>>>>>>>  41 class HandshakeOperation: public StackObj {
>>>>>>>>>
>>>>>>>>> Would it be clearer to have HandShakeOperation subclassed by 
>>>>>>>>> DirectHandShakeOperation, rather than it being a property of 
>>>>>>>>> the current op instance?
>>>>>>>> But I would still need to access the is_direct field for 
>>>>>>>> non-direct handshakes in HandshakeState::try_process() anyways.
>>>>>>>
>>>>>>> Okay.
>>>>>>>
>>>>>>>>
>>>>>>>>>  41 class HandshakeOperation: public StackObj {
>>>>>>>>>  42   HandshakeClosure* _handshake_cl;
>>>>>>>>>  43   int64_t _pending_threads;
>>>>>>>>>  ...
>>>>>>>>>  53   void add_target_count(int count) { 
>>>>>>>>> Atomic::add(&_pending_threads, count); }
>>>>>>>>>
>>>>>>>>> Can you clarify the lifecycle of this StackObj please. You 
>>>>>>>>> obviously expose it to other threads - hence the atomic update 
>>>>>>>>> - but that means it needs to be guaranteed to be live when 
>>>>>>>>> that update occurs.
>>>>>>>> The HandshakeOperation object is created in 
>>>>>>>> Handshake::execute() (or Handshake::execute_direct() now) by 
>>>>>>>> the Handshaker. The only other threads that could access this 
>>>>>>>> object are the VMThread(for non-direct case only) and the 
>>>>>>>> Handshakee(in both cases). Seeing that it's safe for the 
>>>>>>>> VMThread to decrement the counter is straightforward since the 
>>>>>>>> Handshaker will be waiting on the VMOperationRequest_lock 
>>>>>>>> waiting for the operation to finish (this is just as always has 
>>>>>>>> been).
>>>>>>>> As for the Handshakee, we can see that the HandshakeOperation 
>>>>>>>> object will be alive as long as the VMThread(for the non-direct 
>>>>>>>> case) or Handshaker(for the direct case) keep seeing that the 
>>>>>>>> operation is not done by calling 
>>>>>>>> HandshakeOperation::is_completed(). Once that returns true, for 
>>>>>>>> the non-direct case the VMThread will finish the operation and 
>>>>>>>> wake up the Handshaker who will return from 
>>>>>>>> Handshake::execute() and the object will be destroyed, and for 
>>>>>>>> the direct case the Handshaker will just return from 
>>>>>>>> execute_direct() and the object will be destroyed. Since 
>>>>>>>> is_completed() will only return true when _pending_threads 
>>>>>>>> reaches zero, that means the decrement operation had to be 
>>>>>>>> safe. Just as an observation, for the HandshakeAllThreads 
>>>>>>>> operation, _pending_threads could go to zero and even negative 
>>>>>>>> before executing add_target_count(), but that's okay because 
>>>>>>>> the VMThread doesn't call is_completed() before that and the 
>>>>>>>> load of _pending_threads cannot float above add_target_count() 
>>>>>>>> since the atomic add operation provides a memory fence.
>>>>>>>> And finally the Handshakee cannot try to execute an operation 
>>>>>>>> that has already being processed by the VMThread/Handshaker, 
>>>>>>>> because they clear it first and then signal the _processing_sem 
>>>>>>>> semaphore in HandshakeState::try_process(). That worked the 
>>>>>>>> same before this change.
>>>>>>>
>>>>>>> Thanks for the detailed explanation!
>>>>>>>
>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>>
>>>>>>>>>  349     HandshakeOperation* op = 
>>>>>>>>> Atomic::load_acquire(&_operation);
>>>>>>>>>  350     if (op == NULL) {
>>>>>>>>>  351       op = Atomic::load_acquire(&_operation_direct);
>>>>>>>>>
>>>>>>>>> This gives preference to the non-direct op. Is it possible for 
>>>>>>>>> the direct-op to be indefinitely delayed if there is a series 
>>>>>>>>> of non-direct ops?
>>>>>>>> Yes, it's possible, although I think that's an unlikely 
>>>>>>>> scenario. That would mean the Handshakee would be stuck in that 
>>>>>>>> while loop with the VMThread constantly setting new operations 
>>>>>>>> in _operation. But I can modify the loop and make it try to 
>>>>>>>> execute both before going into the next interation, something 
>>>>>>>> like (more on load_acquire next):
>>>>>>>>
>>>>>>>>      if (has_operation()) {
>>>>>>>>        HandleMark hm(_thread);
>>>>>>>>        CautiouslyPreserveExceptionMark pem(_thread);
>>>>>>>>        HandshakeOperation * op = _operation;
>>>>>>>>        if (op != NULL) {
>>>>>>>>          // Disarm before execute the operation
>>>>>>>>          clear_handshake(false);
>>>>>>>>          op->do_handshake(_thread);
>>>>>>>>        }
>>>>>>>>        op = _operation_direct;
>>>>>>>>        if (op != NULL) {
>>>>>>>>          // Disarm before execute the operation
>>>>>>>>          clear_handshake(true);
>>>>>>>>          op->do_handshake(_thread);
>>>>>>>>        }
>>>>>>>>      }
>>>>>>>>
>>>>>>>> what do you think?
>>>>>>>
>>>>>>> That seems better to me.
>>>>>>>
>>>>>>>>
>>>>>>>>> Also I don't see the release_stores that these load_acquire 
>>>>>>>>> would pair with. I'm assuming it should be here:
>>>>>>>>>
>>>>>>>>>  319     _operation = op;
>>>>>>>>>  323     _operation_direct = op;
>>>>>>>>>
>>>>>>>>> But I would also have expected all necessary memory 
>>>>>>>>> synchronization to already be present via the semaphore 
>>>>>>>>> operations and/or the rest of the handshake mechanism. ??
>>>>>>>> Yes, the release is in 
>>>>>>>> SafepointMechanism::arm_local_poll_release() after setting 
>>>>>>>> those fields. But it's true that those load_acquire are not 
>>>>>>>> needed since the semaphore already has acquire semantics. I 
>>>>>>>> actually wanted to do a normal load but ended up just copying 
>>>>>>>> how we were loading _operation. I''ll change them and retest.
>>>>>>>
>>>>>>> Okay.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -------
>>>>>>>
>>>>>>>>
>>>>>>>>> 396 if ((!is_direct && _operation != NULL) || (is_direct && 
>>>>>>>>> _operation_direct != NULL)){
>>>>>>>>> 406   if ((!is_direct && _operation == NULL) || (is_direct && 
>>>>>>>>> _operation_direct == NULL)){
>>>>>>>>>
>>>>>>>>> Can this not be abstracted back into a "has_operation" method 
>>>>>>>>> that takes a "direct" parameter? e.g
>>>>>>>>>
>>>>>>>>>   bool has_operation() const { return _operation != NULL || 
>>>>>>>>> _operation_direct != NULL; }
>>>>>>>>> + bool has_specific_operation(bool direct) {
>>>>>>>>> +   return direct ? _operation_direct != NULL : _operation != 
>>>>>>>>> NULL;
>>>>>>>>> + }
>>>>>>>> Done.
>>>>>>>>
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>>>>>>>
>>>>>>>>> Please add:
>>>>>>>>>
>>>>>>>>> import java.util.concurrent.Semaphore;
>>>>>>>>>
>>>>>>>>> so you can just refer to Sempahore.
>>>>>>>> Done.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Style nit: _working_threads
>>>>>>>>>
>>>>>>>>> Java style is to not use leading underscore and to use 
>>>>>>>>> camelCase for variables ie workingThreads.
>>>>>>>> Right, I changed them to camelCase.
>>>>>>>>
>>>>>>>>
>>>>>>>>> 44 static Thread _suspendresume_thread = new Thread();
>>>>>>>>>
>>>>>>>>> The above is dead code.
>>>>>>>> Removed.
>>>>>>>>
>>>>>>>>
>>>>>>>>> 53 if (_is_biased[me] == false) {
>>>>>>>>>
>>>>>>>>> Style nit: use "if (!_is_biased[me]) {"
>>>>>>>> Changed.
>>>>>>>>
>>>>>>>>
>>>>>>>>> 80             } catch(InterruptedException ie) {
>>>>>>>>>   81             }
>>>>>>>>>
>>>>>>>>> I suggest inserting "throw new Error("Unexpected interrupt");" 
>>>>>>>>> for good measure.
>>>>>>>>>
>>>>>>>>>  111 _working_threads[i].suspend();
>>>>>>>> Done.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thread suspension has been deprecated-for-removal as of JDK 14 
>>>>>>>>> so could be gone in JDK 15 (but more likely 16). If the 
>>>>>>>>> suspend/resume is important for this test then you will need 
>>>>>>>>> to switch to using JVM TI suspend/resume; or else perhaps 
>>>>>>>>> introduce a WhiteBox method to do whatever you need in the VM.
>>>>>>>> It's not really needed for this test. I just wanted to mixed 
>>>>>>>> suspend-resume with handshakes because I know there could be 
>>>>>>>> some buggy interactions between them. But since it has been 
>>>>>>>> deprecated then maybe there is no point in stressing handshakes 
>>>>>>>> with them and I can remove that part.
>>>>>>>>
>>>>>>>>
>>>>>>>>>  123 // Wait until the desired number of direct handshakes is 
>>>>>>>>> reached
>>>>>>>>>  124         while (_handshake_count.get() < 
>>>>>>>>> DIRECT_HANDSHAKES_MARK) {
>>>>>>>>>  125             Thread.sleep(10); // sleep for 10ms
>>>>>>>>>  126         }
>>>>>>>>>
>>>>>>>>> You could just do a join() on one of the worker threads.
>>>>>>>> Right, didn't thought about that. Changed.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for looking at this David! I'll retest with the above 
>>>>>>>> changes in the meantime.
>>>>>>>>
>>>>>>>>
>>>>>>>> Patricio
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Patricio
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>