RFR 8230594: Allow direct handshakes without VMThread intervention

Daniel D. Daugherty daniel.daugherty at oracle.com
Tue Jan 21 17:00:37 UTC 2020


On 1/21/20 11:53 AM, Patricio Chilano wrote:
> Hi Robbin,
>
> On 1/21/20 7:51 AM, Robbin Ehn wrote:
>> Hi Patricio,
>>
>> On 1/18/20 12:06 AM, Patricio Chilano wrote:
>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v04/webrev/
>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v04/inc/webrev/
>>
>> As discussed offline, using vtable instead of branches simplifies the 
>> code.
>> Especially when we introduce per thread queues and more operation 
>> types. But
>> since that requires a lot of small changes, we skip that here.
> Ok, we can add virtual functions later.
>
>> A side from that the only thing that really bugs me is:
>>          for (int i = 0; i < WORKING_THREADS; i++) {
>>              workingThreads[i] = new Thread(test, Integer.toString(i));
>> ->           workingThreads[i].setDaemon(true);
>>              workingThreads[i].start();
>>          }
>>
>> It's confusing when you later join these threads before exiting.
> Sorry, that setDaemon(true) should have been removed after I did 
> join() with all workers. Removed.
>
>> Also consider running the test with more options, maybe a second run 
>> like:
>>
>> diff -r be98066409e2 
>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>> --- a/test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java 
>> Tue Jan 21 11:21:13 2020 +0100
>> +++ b/test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java 
>> Tue Jan 21 11:49:07 2020 +0100
>> @@ -30,2 +30,3 @@
>>   * @run main/othervm -XX:+UnlockDiagnosticVMOptions 
>> -XX:+SafepointALot -XX:BiasedLockingDecayTime=100000000 
>> -XX:BiasedLockingBulkRebiasThreshold=1000000 
>> -XX:BiasedLockingBulkRevokeThreshold=1000000 HandshakeDirectTest
>> + * @run main/othervm -XX:+UnlockDiagnosticVMOptions 
>> -XX:GuaranteedSafepointInterval=10 -XX:+HandshakeALot 
>> -XX:+SafepointALot -XX:BiasedLockingDecayTime=100000000 
>> -XX:BiasedLockingBulkRebiasThreshold=1000000 
>> -XX:BiasedLockingBulkRevokeThreshold=1000000 HandshakeDirectTest
>>   */
> Done.
>
> Here is v5, which contains also style changes mentioned by Dan in his 
> last review:
>
> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v05/webrev/
> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v05/inc/webrev/

src/hotspot/share/runtime/handshake.cpp
     No comments.

test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
     No comments.

Thumbs up.

Dan


>
> Thanks!
>
> Patricio
>> Thanks for fixing, Robbin
>>
>>>
>>> Thanks!
>>>
>>> Patricio
>>>> Dan
>>>>
>>>>
>>>> On 1/15/20 10:56 PM, Patricio Chilano wrote:
>>>>> Hi David,
>>>>>
>>>>> On 1/15/20 8:29 PM, David Holmes wrote:
>>>>>> Hi Patricio,
>>>>>>
>>>>>> On 16/01/2020 6:06 am, Patricio Chilano wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Here is v2:
>>>>>>>
>>>>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
>>>>>>> Inc: 
>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/
>>>>>>
>>>>>> src/hotspot/share/runtime/handshake.cpp
>>>>>>
>>>>>> why is it that execute_direct doesn't check for 
>>>>>> uses_thread_local_poll() ?
>>>>> Sorry, I thought for a moment that thread-local handshakes for 
>>>>> arm32 was already implemented and all platforms where using it. I 
>>>>> added that check back, and by testing that path I had to also add 
>>>>> back the VMThread in the assert in walk_stack_and_revoke().
>>>>>
>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>
>>>>>> typo: wether -> whether
>>>>> Fixed.
>>>>>
>>>>> Here is v3:
>>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/webrev/
>>>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v03/inc/webrev/
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Patricio
>>>>>> Otherwise updates look fine.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Passed one round of t1-6, and running more rounds now.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Patricio
>>>>>>>
>>>>>>> On 1/14/20 7:57 PM, David Holmes wrote:
>>>>>>>> Hi Patricio,
>>>>>>>>
>>>>>>>> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> On 1/14/20 4:13 AM, David Holmes wrote:
>>>>>>>>>> Hi Patricio,
>>>>>>>>>>
>>>>>>>>>> Thanks for tackling this. I've taken an initial look through 
>>>>>>>>>> trying to understand the general approach, but I don't claim 
>>>>>>>>>> to understand all the subtle details, so there are some 
>>>>>>>>>> queries below to help with my overall understanding.
>>>>>>>>>>
>>>>>>>>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> The following patch adds the ability to execute direct 
>>>>>>>>>>> handshakes between JavaThreads without the VMThread 
>>>>>>>>>>> intervention, and enables this functionality for biased 
>>>>>>>>>>> locking revocations.
>>>>>>>>>>> The current handshake mechanism that uses the VMThread, 
>>>>>>>>>>> either to handshake one JavaThread or all of them, is still 
>>>>>>>>>>> the default unless you specify otherwise when calling 
>>>>>>>>>>> Handshake::execute().
>>>>>>>>>>
>>>>>>>>>> Can I suggest that rather than using an overloaded:
>>>>>>>>>>
>>>>>>>>>>  static bool execute(HandshakeClosure* hs_cl, JavaThread* 
>>>>>>>>>> target, bool is_direct_handshake = false);
>>>>>>>>>>
>>>>>>>>>> you instead add an execute_direct method for clarity?
>>>>>>>>> Ok, added execute_direct() and removed overloading.
>>>>>>>>>
>>>>>>>>>>> In order to avoid adding additional overhead to this path 
>>>>>>>>>>> that uses the VMThread (especially the one that handshakes 
>>>>>>>>>>> all JavaThreads) I added a new HandshakeOperation pointer in 
>>>>>>>>>>> the HandshakeState class, _operation_direct, to be used for 
>>>>>>>>>>> the direct handshake cases only and whose access is 
>>>>>>>>>>> serialized between JavaThreads by using a semaphore. Thus, 
>>>>>>>>>>> one direct handshake will be allowed at any given time, and 
>>>>>>>>>>> upon completion the semaphore will be signaled to allow the 
>>>>>>>>>>> next handshaker if any to proceed. In this way the old 
>>>>>>>>>>> _operation can still be used only by the VMThread without 
>>>>>>>>>>> the need for synchronization to access it. The handshakee 
>>>>>>>>>>> will now check if any of _operation or _operation_direct is 
>>>>>>>>>>> set when checking for a pending handshake and will try to 
>>>>>>>>>>> execute both in HandshakeState::process_self_inner(). The 
>>>>>>>>>>> execution of the handshake’s ThreadClosure, either direct 
>>>>>>>>>>> handshake or not, is still protected by a semaphore, which I 
>>>>>>>>>>> renamed to _processing_sem.
>>>>>>>>>>> I converted the semaphore _done in HandshakeOperation to be 
>>>>>>>>>>> just an atomic counter because of bug 
>>>>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which 
>>>>>>>>>>> I actually hit once!). Since the semaphore could not be 
>>>>>>>>>>> static anymore due to possibly having more than one 
>>>>>>>>>>> HandshakeOperation at a time, the handshakee could try to 
>>>>>>>>>>> access the nwaiters field of an already destroyed semaphore 
>>>>>>>>>>> when signaling it. In any case nobody was waiting on that 
>>>>>>>>>>> semaphore (we were not using kernel functionality), so just 
>>>>>>>>>>> using an atomic counter seems more appropriate.
>>>>>>>>>>> In order to avoid issues due to disarming a JavaThread that 
>>>>>>>>>>> should still be armed for a handshake or safepoint, each 
>>>>>>>>>>> JavaThread will now always disarm its own polling page.
>>>>>>>>>>> I also added a new test, HandshakeDirectTest.java, which 
>>>>>>>>>>> tries to stress the use of direct handshakes with revocations.
>>>>>>>>>>> In terms of performance, I measured no difference in the 
>>>>>>>>>>> execution time of one individual handshake. The difference 
>>>>>>>>>>> can be seen when several handshakes at a time are executed 
>>>>>>>>>>> as expected. So for example on Linux running on an Intel 
>>>>>>>>>>> Xeon 8167M cpu, test HandshakeDirectTest.java (which 
>>>>>>>>>>> executes 50000 handshakes between 32 threads) executes in 
>>>>>>>>>>> around 340ms using direct handshakes and in around 5.6 
>>>>>>>>>>> seconds without it. For a modified version of that test that 
>>>>>>>>>>> only executes 128 handshakes between the 32 threads and 
>>>>>>>>>>> avoids any suspend-resume, the test takes around 12ms with 
>>>>>>>>>>> direct handshakes and 19ms without it.
>>>>>>>>>>> Tested with mach5, tiers1-6.
>>>>>>>>>>>
>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>>>>>>>>> Webrev: 
>>>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>>>>>>>>
>>>>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>>>>
>>>>>>>>>>  36 // A handshake closure is a callback that is executed for 
>>>>>>>>>> each JavaThread
>>>>>>>>>>   37 // while that thread is in a safepoint safe state. The 
>>>>>>>>>> callback is executed
>>>>>>>>>>   38 // either by the thread itself or by the VM thread while 
>>>>>>>>>> keeping the thread
>>>>>>>>>>   39 // in a blocked state. A handshake can be performed with 
>>>>>>>>>> a single
>>>>>>>>>>   40 // JavaThread as well.
>>>>>>>>>>
>>>>>>>>>> Does this comment block need updating for the direct case?
>>>>>>>>>>
>>>>>>>>>>  61 // the operation is only done by either VM thread on 
>>>>>>>>>> behalf of the JavaThread
>>>>>>>>>>  62 // or the JavaThread itself.
>>>>>>>>>>
>>>>>>>>>> Again does this need an update?
>>>>>>>>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> src/hotspot/share/runtime/handshake.hpp
>>>>>>>>>>
>>>>>>>>>>  41 class HandshakeOperation: public StackObj {
>>>>>>>>>>
>>>>>>>>>> Would it be clearer to have HandShakeOperation subclassed by 
>>>>>>>>>> DirectHandShakeOperation, rather than it being a property of 
>>>>>>>>>> the current op instance?
>>>>>>>>> But I would still need to access the is_direct field for 
>>>>>>>>> non-direct handshakes in HandshakeState::try_process() anyways.
>>>>>>>>
>>>>>>>> Okay.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>  41 class HandshakeOperation: public StackObj {
>>>>>>>>>>  42   HandshakeClosure* _handshake_cl;
>>>>>>>>>>  43   int64_t _pending_threads;
>>>>>>>>>>  ...
>>>>>>>>>>  53   void add_target_count(int count) { 
>>>>>>>>>> Atomic::add(&_pending_threads, count); }
>>>>>>>>>>
>>>>>>>>>> Can you clarify the lifecycle of this StackObj please. You 
>>>>>>>>>> obviously expose it to other threads - hence the atomic 
>>>>>>>>>> update - but that means it needs to be guaranteed to be live 
>>>>>>>>>> when that update occurs.
>>>>>>>>> The HandshakeOperation object is created in 
>>>>>>>>> Handshake::execute() (or Handshake::execute_direct() now) by 
>>>>>>>>> the Handshaker. The only other threads that could access this 
>>>>>>>>> object are the VMThread(for non-direct case only) and the 
>>>>>>>>> Handshakee(in both cases). Seeing that it's safe for the 
>>>>>>>>> VMThread to decrement the counter is straightforward since the 
>>>>>>>>> Handshaker will be waiting on the VMOperationRequest_lock 
>>>>>>>>> waiting for the operation to finish (this is just as always 
>>>>>>>>> has been).
>>>>>>>>> As for the Handshakee, we can see that the HandshakeOperation 
>>>>>>>>> object will be alive as long as the VMThread(for the 
>>>>>>>>> non-direct case) or Handshaker(for the direct case) keep 
>>>>>>>>> seeing that the operation is not done by calling 
>>>>>>>>> HandshakeOperation::is_completed(). Once that returns true, 
>>>>>>>>> for the non-direct case the VMThread will finish the operation 
>>>>>>>>> and wake up the Handshaker who will return from 
>>>>>>>>> Handshake::execute() and the object will be destroyed, and for 
>>>>>>>>> the direct case the Handshaker will just return from 
>>>>>>>>> execute_direct() and the object will be destroyed. Since 
>>>>>>>>> is_completed() will only return true when _pending_threads 
>>>>>>>>> reaches zero, that means the decrement operation had to be 
>>>>>>>>> safe. Just as an observation, for the HandshakeAllThreads 
>>>>>>>>> operation, _pending_threads could go to zero and even negative 
>>>>>>>>> before executing add_target_count(), but that's okay because 
>>>>>>>>> the VMThread doesn't call is_completed() before that and the 
>>>>>>>>> load of _pending_threads cannot float above add_target_count() 
>>>>>>>>> since the atomic add operation provides a memory fence.
>>>>>>>>> And finally the Handshakee cannot try to execute an operation 
>>>>>>>>> that has already being processed by the VMThread/Handshaker, 
>>>>>>>>> because they clear it first and then signal the 
>>>>>>>>> _processing_sem semaphore in HandshakeState::try_process(). 
>>>>>>>>> That worked the same before this change.
>>>>>>>>
>>>>>>>> Thanks for the detailed explanation!
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>>
>>>>>>>>>>  349     HandshakeOperation* op = 
>>>>>>>>>> Atomic::load_acquire(&_operation);
>>>>>>>>>>  350     if (op == NULL) {
>>>>>>>>>>  351       op = Atomic::load_acquire(&_operation_direct);
>>>>>>>>>>
>>>>>>>>>> This gives preference to the non-direct op. Is it possible 
>>>>>>>>>> for the direct-op to be indefinitely delayed if there is a 
>>>>>>>>>> series of non-direct ops?
>>>>>>>>> Yes, it's possible, although I think that's an unlikely 
>>>>>>>>> scenario. That would mean the Handshakee would be stuck in 
>>>>>>>>> that while loop with the VMThread constantly setting new 
>>>>>>>>> operations in _operation. But I can modify the loop and make 
>>>>>>>>> it try to execute both before going into the next interation, 
>>>>>>>>> something like (more on load_acquire next):
>>>>>>>>>
>>>>>>>>>      if (has_operation()) {
>>>>>>>>>        HandleMark hm(_thread);
>>>>>>>>>        CautiouslyPreserveExceptionMark pem(_thread);
>>>>>>>>>        HandshakeOperation * op = _operation;
>>>>>>>>>        if (op != NULL) {
>>>>>>>>>          // Disarm before execute the operation
>>>>>>>>>          clear_handshake(false);
>>>>>>>>>          op->do_handshake(_thread);
>>>>>>>>>        }
>>>>>>>>>        op = _operation_direct;
>>>>>>>>>        if (op != NULL) {
>>>>>>>>>          // Disarm before execute the operation
>>>>>>>>>          clear_handshake(true);
>>>>>>>>>          op->do_handshake(_thread);
>>>>>>>>>        }
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> what do you think?
>>>>>>>>
>>>>>>>> That seems better to me.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Also I don't see the release_stores that these load_acquire 
>>>>>>>>>> would pair with. I'm assuming it should be here:
>>>>>>>>>>
>>>>>>>>>>  319     _operation = op;
>>>>>>>>>>  323     _operation_direct = op;
>>>>>>>>>>
>>>>>>>>>> But I would also have expected all necessary memory 
>>>>>>>>>> synchronization to already be present via the semaphore 
>>>>>>>>>> operations and/or the rest of the handshake mechanism. ??
>>>>>>>>> Yes, the release is in 
>>>>>>>>> SafepointMechanism::arm_local_poll_release() after setting 
>>>>>>>>> those fields. But it's true that those load_acquire are not 
>>>>>>>>> needed since the semaphore already has acquire semantics. I 
>>>>>>>>> actually wanted to do a normal load but ended up just copying 
>>>>>>>>> how we were loading _operation. I''ll change them and retest.
>>>>>>>>
>>>>>>>> Okay.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -------
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 396 if ((!is_direct && _operation != NULL) || (is_direct && 
>>>>>>>>>> _operation_direct != NULL)){
>>>>>>>>>> 406   if ((!is_direct && _operation == NULL) || (is_direct && 
>>>>>>>>>> _operation_direct == NULL)){
>>>>>>>>>>
>>>>>>>>>> Can this not be abstracted back into a "has_operation" method 
>>>>>>>>>> that takes a "direct" parameter? e.g
>>>>>>>>>>
>>>>>>>>>>   bool has_operation() const { return _operation != NULL || 
>>>>>>>>>> _operation_direct != NULL; }
>>>>>>>>>> + bool has_specific_operation(bool direct) {
>>>>>>>>>> +   return direct ? _operation_direct != NULL : _operation != 
>>>>>>>>>> NULL;
>>>>>>>>>> + }
>>>>>>>>> Done.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>>>>>>>>
>>>>>>>>>> Please add:
>>>>>>>>>>
>>>>>>>>>> import java.util.concurrent.Semaphore;
>>>>>>>>>>
>>>>>>>>>> so you can just refer to Sempahore.
>>>>>>>>> Done.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Style nit: _working_threads
>>>>>>>>>>
>>>>>>>>>> Java style is to not use leading underscore and to use 
>>>>>>>>>> camelCase for variables ie workingThreads.
>>>>>>>>> Right, I changed them to camelCase.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 44 static Thread _suspendresume_thread = new Thread();
>>>>>>>>>>
>>>>>>>>>> The above is dead code.
>>>>>>>>> Removed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 53 if (_is_biased[me] == false) {
>>>>>>>>>>
>>>>>>>>>> Style nit: use "if (!_is_biased[me]) {"
>>>>>>>>> Changed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 80             } catch(InterruptedException ie) {
>>>>>>>>>>   81             }
>>>>>>>>>>
>>>>>>>>>> I suggest inserting "throw new Error("Unexpected 
>>>>>>>>>> interrupt");" for good measure.
>>>>>>>>>>
>>>>>>>>>>  111 _working_threads[i].suspend();
>>>>>>>>> Done.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thread suspension has been deprecated-for-removal as of JDK 
>>>>>>>>>> 14 so could be gone in JDK 15 (but more likely 16). If the 
>>>>>>>>>> suspend/resume is important for this test then you will need 
>>>>>>>>>> to switch to using JVM TI suspend/resume; or else perhaps 
>>>>>>>>>> introduce a WhiteBox method to do whatever you need in the VM.
>>>>>>>>> It's not really needed for this test. I just wanted to mixed 
>>>>>>>>> suspend-resume with handshakes because I know there could be 
>>>>>>>>> some buggy interactions between them. But since it has been 
>>>>>>>>> deprecated then maybe there is no point in stressing 
>>>>>>>>> handshakes with them and I can remove that part.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>  123 // Wait until the desired number of direct handshakes is 
>>>>>>>>>> reached
>>>>>>>>>>  124         while (_handshake_count.get() < 
>>>>>>>>>> DIRECT_HANDSHAKES_MARK) {
>>>>>>>>>>  125             Thread.sleep(10); // sleep for 10ms
>>>>>>>>>>  126         }
>>>>>>>>>>
>>>>>>>>>> You could just do a join() on one of the worker threads.
>>>>>>>>> Right, didn't thought about that. Changed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for looking at this David! I'll retest with the above 
>>>>>>>>> changes in the meantime.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Patricio
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Patricio
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>



More information about the hotspot-runtime-dev mailing list