RFR 8230594: Allow direct handshakes without VMThread intervention

Tue Jan 14 22:57:45 UTC 2020

Hi Patricio,

On 15/01/2020 5:17 am, Patricio Chilano wrote:
> Hi David,
> 
> On 1/14/20 4:13 AM, David Holmes wrote:
>> Hi Patricio,
>>
>> Thanks for tackling this. I've taken an initial look through trying to 
>> understand the general approach, but I don't claim to understand all 
>> the subtle details, so there are some queries below to help with my 
>> overall understanding.
>>
>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>> Hi all,
>>>
>>> The following patch adds the ability to execute direct handshakes 
>>> between JavaThreads without the VMThread intervention, and enables 
>>> this functionality for biased locking revocations.
>>> The current handshake mechanism that uses the VMThread, either to 
>>> handshake one JavaThread or all of them, is still the default unless 
>>> you specify otherwise when calling Handshake::execute().
>>
>> Can I suggest that rather than using an overloaded:
>>
>>  static bool execute(HandshakeClosure* hs_cl, JavaThread* target, bool 
>> is_direct_handshake = false);
>>
>> you instead add an execute_direct method for clarity?
> Ok, added execute_direct() and removed overloading.
> 
>>> In order to avoid adding additional overhead to this path that uses 
>>> the VMThread (especially the one that handshakes all JavaThreads) I 
>>> added a new HandshakeOperation pointer in the HandshakeState class, 
>>> _operation_direct, to be used for the direct handshake cases only and 
>>> whose access is serialized between JavaThreads by using a semaphore. 
>>> Thus, one direct handshake will be allowed at any given time, and 
>>> upon completion the semaphore will be signaled to allow the next 
>>> handshaker if any to proceed. In this way the old _operation can 
>>> still be used only by the VMThread without the need for 
>>> synchronization to access it. The handshakee will now check if any of 
>>> _operation or _operation_direct is set when checking for a pending 
>>> handshake and will try to execute both in 
>>> HandshakeState::process_self_inner(). The execution of the 
>>> handshake’s ThreadClosure, either direct handshake or not, is still 
>>> protected by a semaphore, which I renamed to _processing_sem.
>>> I converted the semaphore _done in HandshakeOperation to be just an 
>>> atomic counter because of bug 
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I 
>>> actually hit once!). Since the semaphore could not be static anymore 
>>> due to possibly having more than one HandshakeOperation at a time, 
>>> the handshakee could try to access the nwaiters field of an already 
>>> destroyed semaphore when signaling it. In any case nobody was waiting 
>>> on that semaphore (we were not using kernel functionality), so just 
>>> using an atomic counter seems more appropriate.
>>> In order to avoid issues due to disarming a JavaThread that should 
>>> still be armed for a handshake or safepoint, each JavaThread will now 
>>> always disarm its own polling page.
>>> I also added a new test, HandshakeDirectTest.java, which tries to 
>>> stress the use of direct handshakes with revocations.
>>> In terms of performance, I measured no difference in the execution 
>>> time of one individual handshake. The difference can be seen when 
>>> several handshakes at a time are executed as expected. So for example 
>>> on Linux running on an Intel Xeon 8167M cpu, test 
>>> HandshakeDirectTest.java (which executes 50000 handshakes between 32 
>>> threads) executes in around 340ms using direct handshakes and in 
>>> around 5.6 seconds without it. For a modified version of that test 
>>> that only executes 128 handshakes between the 32 threads and avoids 
>>> any suspend-resume, the test takes around 12ms with direct handshakes 
>>> and 19ms without it.
>>> Tested with mach5, tiers1-6.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>
>> src/hotspot/share/runtime/handshake.hpp
>>
>>  36 // A handshake closure is a callback that is executed for each 
>> JavaThread
>>   37 // while that thread is in a safepoint safe state. The callback 
>> is executed
>>   38 // either by the thread itself or by the VM thread while keeping 
>> the thread
>>   39 // in a blocked state. A handshake can be performed with a single
>>   40 // JavaThread as well.
>>
>> Does this comment block need updating for the direct case?
>>
>>  61 // the operation is only done by either VM thread on behalf of the 
>> JavaThread
>>  62 // or the JavaThread itself.
>>
>> Again does this need an update?
> Yes, I forgot to update those comments in handshake.hpp. Fixed.
> 
> 
>> src/hotspot/share/runtime/handshake.hpp
>>
>>  41 class HandshakeOperation: public StackObj {
>>
>> Would it be clearer to have HandShakeOperation subclassed by 
>> DirectHandShakeOperation, rather than it being a property of the 
>> current op instance?
> But I would still need to access the is_direct field for non-direct 
> handshakes in HandshakeState::try_process() anyways.

Okay.

> 
>>  41 class HandshakeOperation: public StackObj {
>>  42   HandshakeClosure* _handshake_cl;
>>  43   int64_t _pending_threads;
>>  ...
>>  53   void add_target_count(int count) { 
>> Atomic::add(&_pending_threads, count); }
>>
>> Can you clarify the lifecycle of this StackObj please. You obviously 
>> expose it to other threads - hence the atomic update - but that means 
>> it needs to be guaranteed to be live when that update occurs.
> The HandshakeOperation object is created in Handshake::execute() (or 
> Handshake::execute_direct() now) by the Handshaker. The only other 
> threads that could access this object are the VMThread(for non-direct 
> case only) and the Handshakee(in both cases). Seeing that it's safe for 
> the VMThread to decrement the counter is straightforward since the 
> Handshaker will be waiting on the VMOperationRequest_lock waiting for 
> the operation to finish (this is just as always has been).
> As for the Handshakee, we can see that the HandshakeOperation object 
> will be alive as long as the VMThread(for the non-direct case) or 
> Handshaker(for the direct case) keep seeing that the operation is not 
> done by calling HandshakeOperation::is_completed(). Once that returns 
> true, for the non-direct case the VMThread will finish the operation and 
> wake up the Handshaker who will return from Handshake::execute() and the 
> object will be destroyed, and for the direct case the Handshaker will 
> just return from execute_direct() and the object will be destroyed. 
> Since is_completed() will only return true when _pending_threads reaches 
> zero, that means the decrement operation had to be safe. Just as an 
> observation, for the HandshakeAllThreads operation, _pending_threads 
> could go to zero and even negative before executing add_target_count(), 
> but that's okay because the VMThread doesn't call is_completed() before 
> that and the load of _pending_threads cannot float above 
> add_target_count() since the atomic add operation provides a memory fence.
> And finally the Handshakee cannot try to execute an operation that has 
> already being processed by the VMThread/Handshaker, because they clear 
> it first and then signal the _processing_sem semaphore in 
> HandshakeState::try_process(). That worked the same before this change.

Thanks for the detailed explanation!

> 
>> -- 
>>
>>  349     HandshakeOperation* op = Atomic::load_acquire(&_operation);
>>  350     if (op == NULL) {
>>  351       op = Atomic::load_acquire(&_operation_direct);
>>
>> This gives preference to the non-direct op. Is it possible for the 
>> direct-op to be indefinitely delayed if there is a series of 
>> non-direct ops?
> Yes, it's possible, although I think that's an unlikely scenario. That 
> would mean the Handshakee would be stuck in that while loop with the 
> VMThread constantly setting new operations in _operation. But I can 
> modify the loop and make it try to execute both before going into the 
> next interation, something like (more on load_acquire next):
> 
>      if (has_operation()) {
>        HandleMark hm(_thread);
>        CautiouslyPreserveExceptionMark pem(_thread);
>        HandshakeOperation * op = _operation;
>        if (op != NULL) {
>          // Disarm before execute the operation
>          clear_handshake(false);
>          op->do_handshake(_thread);
>        }
>        op = _operation_direct;
>        if (op != NULL) {
>          // Disarm before execute the operation
>          clear_handshake(true);
>          op->do_handshake(_thread);
>        }
>      }
> 
> what do you think?

That seems better to me.

> 
>> Also I don't see the release_stores that these load_acquire would pair 
>> with. I'm assuming it should be here:
>>
>>  319     _operation = op;
>>  323     _operation_direct = op;
>>
>> But I would also have expected all necessary memory synchronization to 
>> already be present via the semaphore operations and/or the rest of the 
>> handshake mechanism. ??
> Yes, the release is in SafepointMechanism::arm_local_poll_release() 
> after setting those fields. But it's true that those load_acquire are 
> not needed since the semaphore already has acquire semantics. I actually 
> wanted to do a normal load but ended up just copying how we were loading 
> _operation. I''ll change them and retest.

Okay.

Thanks,
David
-------

> 
>> 396 if ((!is_direct && _operation != NULL) || (is_direct && 
>> _operation_direct != NULL)){
>> 406   if ((!is_direct && _operation == NULL) || (is_direct && 
>> _operation_direct == NULL)){
>>
>> Can this not be abstracted back into a "has_operation" method that 
>> takes a "direct" parameter? e.g
>>
>>   bool has_operation() const { return _operation != NULL || 
>> _operation_direct != NULL; }
>> + bool has_specific_operation(bool direct) {
>> +   return direct ? _operation_direct != NULL : _operation != NULL;
>> + }
> Done.
> 
> 
>> ---
>>
>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>
>> Please add:
>>
>> import java.util.concurrent.Semaphore;
>>
>> so you can just refer to Sempahore.
> Done.
> 
> 
>> Style nit: _working_threads
>>
>> Java style is to not use leading underscore and to use camelCase for 
>> variables ie workingThreads.
> Right, I changed them to camelCase.
> 
> 
>> 44 static Thread _suspendresume_thread = new Thread();
>>
>> The above is dead code.
> Removed.
> 
> 
>> 53 if (_is_biased[me] == false) {
>>
>> Style nit: use "if (!_is_biased[me]) {"
> Changed.
> 
> 
>> 80             } catch(InterruptedException ie) {
>>   81             }
>>
>> I suggest inserting "throw new Error("Unexpected interrupt");" for 
>> good measure.
>>
>>  111                     _working_threads[i].suspend();
> Done.
> 
> 
>> Thread suspension has been deprecated-for-removal as of JDK 14 so 
>> could be gone in JDK 15 (but more likely 16). If the suspend/resume is 
>> important for this test then you will need to switch to using JVM TI 
>> suspend/resume; or else perhaps introduce a WhiteBox method to do 
>> whatever you need in the VM.
> It's not really needed for this test. I just wanted to mixed 
> suspend-resume with handshakes because I know there could be some buggy 
> interactions between them. But since it has been deprecated then maybe 
> there is no point in stressing handshakes with them and I can remove 
> that part.
> 
> 
>>  123 // Wait until the desired number of direct handshakes is reached
>>  124         while (_handshake_count.get() < DIRECT_HANDSHAKES_MARK) {
>>  125             Thread.sleep(10); // sleep for 10ms
>>  126         }
>>
>> You could just do a join() on one of the worker threads.
> Right, didn't thought about that. Changed.
> 
> 
> Thanks for looking at this David! I'll retest with the above changes in 
> the meantime.
> 
> 
> Patricio
>> Thanks,
>> David
>> -----
>>
>>> Thanks!
>>> Patricio
>>>
>