RFR 8230594: Allow direct handshakes without VMThread intervention
Patricio Chilano
patricio.chilano.mateo at oracle.com
Wed Jan 15 20:06:06 UTC 2020
Hi David,
Here is v2:
Full: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/webrev/
Inc: http://cr.openjdk.java.net/~pchilanomate/8230594/v02/inc/webrev/
Passed one round of t1-6, and running more rounds now.
Thanks,
Patricio
On 1/14/20 7:57 PM, David Holmes wrote:
> Hi Patricio,
>
> On 15/01/2020 5:17 am, Patricio Chilano wrote:
>> Hi David,
>>
>> On 1/14/20 4:13 AM, David Holmes wrote:
>>> Hi Patricio,
>>>
>>> Thanks for tackling this. I've taken an initial look through trying
>>> to understand the general approach, but I don't claim to understand
>>> all the subtle details, so there are some queries below to help with
>>> my overall understanding.
>>>
>>> On 14/01/2020 2:25 am, Patricio Chilano wrote:
>>>> Hi all,
>>>>
>>>> The following patch adds the ability to execute direct handshakes
>>>> between JavaThreads without the VMThread intervention, and enables
>>>> this functionality for biased locking revocations.
>>>> The current handshake mechanism that uses the VMThread, either to
>>>> handshake one JavaThread or all of them, is still the default
>>>> unless you specify otherwise when calling Handshake::execute().
>>>
>>> Can I suggest that rather than using an overloaded:
>>>
>>> static bool execute(HandshakeClosure* hs_cl, JavaThread* target,
>>> bool is_direct_handshake = false);
>>>
>>> you instead add an execute_direct method for clarity?
>> Ok, added execute_direct() and removed overloading.
>>
>>>> In order to avoid adding additional overhead to this path that uses
>>>> the VMThread (especially the one that handshakes all JavaThreads) I
>>>> added a new HandshakeOperation pointer in the HandshakeState class,
>>>> _operation_direct, to be used for the direct handshake cases only
>>>> and whose access is serialized between JavaThreads by using a
>>>> semaphore. Thus, one direct handshake will be allowed at any given
>>>> time, and upon completion the semaphore will be signaled to allow
>>>> the next handshaker if any to proceed. In this way the old
>>>> _operation can still be used only by the VMThread without the need
>>>> for synchronization to access it. The handshakee will now check if
>>>> any of _operation or _operation_direct is set when checking for a
>>>> pending handshake and will try to execute both in
>>>> HandshakeState::process_self_inner(). The execution of the
>>>> handshake’s ThreadClosure, either direct handshake or not, is still
>>>> protected by a semaphore, which I renamed to _processing_sem.
>>>> I converted the semaphore _done in HandshakeOperation to be just an
>>>> atomic counter because of bug
>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674 (which I
>>>> actually hit once!). Since the semaphore could not be static
>>>> anymore due to possibly having more than one HandshakeOperation at
>>>> a time, the handshakee could try to access the nwaiters field of an
>>>> already destroyed semaphore when signaling it. In any case nobody
>>>> was waiting on that semaphore (we were not using kernel
>>>> functionality), so just using an atomic counter seems more
>>>> appropriate.
>>>> In order to avoid issues due to disarming a JavaThread that should
>>>> still be armed for a handshake or safepoint, each JavaThread will
>>>> now always disarm its own polling page.
>>>> I also added a new test, HandshakeDirectTest.java, which tries to
>>>> stress the use of direct handshakes with revocations.
>>>> In terms of performance, I measured no difference in the execution
>>>> time of one individual handshake. The difference can be seen when
>>>> several handshakes at a time are executed as expected. So for
>>>> example on Linux running on an Intel Xeon 8167M cpu, test
>>>> HandshakeDirectTest.java (which executes 50000 handshakes between
>>>> 32 threads) executes in around 340ms using direct handshakes and in
>>>> around 5.6 seconds without it. For a modified version of that test
>>>> that only executes 128 handshakes between the 32 threads and avoids
>>>> any suspend-resume, the test takes around 12ms with direct
>>>> handshakes and 19ms without it.
>>>> Tested with mach5, tiers1-6.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230594
>>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8230594/v01/webrev/
>>>
>>> src/hotspot/share/runtime/handshake.hpp
>>>
>>> 36 // A handshake closure is a callback that is executed for each
>>> JavaThread
>>> 37 // while that thread is in a safepoint safe state. The callback
>>> is executed
>>> 38 // either by the thread itself or by the VM thread while
>>> keeping the thread
>>> 39 // in a blocked state. A handshake can be performed with a single
>>> 40 // JavaThread as well.
>>>
>>> Does this comment block need updating for the direct case?
>>>
>>> 61 // the operation is only done by either VM thread on behalf of
>>> the JavaThread
>>> 62 // or the JavaThread itself.
>>>
>>> Again does this need an update?
>> Yes, I forgot to update those comments in handshake.hpp. Fixed.
>>
>>
>>> src/hotspot/share/runtime/handshake.hpp
>>>
>>> 41 class HandshakeOperation: public StackObj {
>>>
>>> Would it be clearer to have HandShakeOperation subclassed by
>>> DirectHandShakeOperation, rather than it being a property of the
>>> current op instance?
>> But I would still need to access the is_direct field for non-direct
>> handshakes in HandshakeState::try_process() anyways.
>
> Okay.
>
>>
>>> 41 class HandshakeOperation: public StackObj {
>>> 42 HandshakeClosure* _handshake_cl;
>>> 43 int64_t _pending_threads;
>>> ...
>>> 53 void add_target_count(int count) {
>>> Atomic::add(&_pending_threads, count); }
>>>
>>> Can you clarify the lifecycle of this StackObj please. You obviously
>>> expose it to other threads - hence the atomic update - but that
>>> means it needs to be guaranteed to be live when that update occurs.
>> The HandshakeOperation object is created in Handshake::execute() (or
>> Handshake::execute_direct() now) by the Handshaker. The only other
>> threads that could access this object are the VMThread(for non-direct
>> case only) and the Handshakee(in both cases). Seeing that it's safe
>> for the VMThread to decrement the counter is straightforward since
>> the Handshaker will be waiting on the VMOperationRequest_lock waiting
>> for the operation to finish (this is just as always has been).
>> As for the Handshakee, we can see that the HandshakeOperation object
>> will be alive as long as the VMThread(for the non-direct case) or
>> Handshaker(for the direct case) keep seeing that the operation is not
>> done by calling HandshakeOperation::is_completed(). Once that returns
>> true, for the non-direct case the VMThread will finish the operation
>> and wake up the Handshaker who will return from Handshake::execute()
>> and the object will be destroyed, and for the direct case the
>> Handshaker will just return from execute_direct() and the object will
>> be destroyed. Since is_completed() will only return true when
>> _pending_threads reaches zero, that means the decrement operation had
>> to be safe. Just as an observation, for the HandshakeAllThreads
>> operation, _pending_threads could go to zero and even negative before
>> executing add_target_count(), but that's okay because the VMThread
>> doesn't call is_completed() before that and the load of
>> _pending_threads cannot float above add_target_count() since the
>> atomic add operation provides a memory fence.
>> And finally the Handshakee cannot try to execute an operation that
>> has already being processed by the VMThread/Handshaker, because they
>> clear it first and then signal the _processing_sem semaphore in
>> HandshakeState::try_process(). That worked the same before this change.
>
> Thanks for the detailed explanation!
>
>>
>>> --
>>>
>>> 349 HandshakeOperation* op = Atomic::load_acquire(&_operation);
>>> 350 if (op == NULL) {
>>> 351 op = Atomic::load_acquire(&_operation_direct);
>>>
>>> This gives preference to the non-direct op. Is it possible for the
>>> direct-op to be indefinitely delayed if there is a series of
>>> non-direct ops?
>> Yes, it's possible, although I think that's an unlikely scenario.
>> That would mean the Handshakee would be stuck in that while loop with
>> the VMThread constantly setting new operations in _operation. But I
>> can modify the loop and make it try to execute both before going into
>> the next interation, something like (more on load_acquire next):
>>
>> if (has_operation()) {
>> HandleMark hm(_thread);
>> CautiouslyPreserveExceptionMark pem(_thread);
>> HandshakeOperation * op = _operation;
>> if (op != NULL) {
>> // Disarm before execute the operation
>> clear_handshake(false);
>> op->do_handshake(_thread);
>> }
>> op = _operation_direct;
>> if (op != NULL) {
>> // Disarm before execute the operation
>> clear_handshake(true);
>> op->do_handshake(_thread);
>> }
>> }
>>
>> what do you think?
>
> That seems better to me.
>
>>
>>> Also I don't see the release_stores that these load_acquire would
>>> pair with. I'm assuming it should be here:
>>>
>>> 319 _operation = op;
>>> 323 _operation_direct = op;
>>>
>>> But I would also have expected all necessary memory synchronization
>>> to already be present via the semaphore operations and/or the rest
>>> of the handshake mechanism. ??
>> Yes, the release is in SafepointMechanism::arm_local_poll_release()
>> after setting those fields. But it's true that those load_acquire are
>> not needed since the semaphore already has acquire semantics. I
>> actually wanted to do a normal load but ended up just copying how we
>> were loading _operation. I''ll change them and retest.
>
> Okay.
>
> Thanks,
> David
> -------
>
>>
>>> 396 if ((!is_direct && _operation != NULL) || (is_direct &&
>>> _operation_direct != NULL)){
>>> 406 if ((!is_direct && _operation == NULL) || (is_direct &&
>>> _operation_direct == NULL)){
>>>
>>> Can this not be abstracted back into a "has_operation" method that
>>> takes a "direct" parameter? e.g
>>>
>>> bool has_operation() const { return _operation != NULL ||
>>> _operation_direct != NULL; }
>>> + bool has_specific_operation(bool direct) {
>>> + return direct ? _operation_direct != NULL : _operation != NULL;
>>> + }
>> Done.
>>
>>
>>> ---
>>>
>>> test/hotspot/jtreg/runtime/handshake/HandshakeDirectTest.java
>>>
>>> Please add:
>>>
>>> import java.util.concurrent.Semaphore;
>>>
>>> so you can just refer to Sempahore.
>> Done.
>>
>>
>>> Style nit: _working_threads
>>>
>>> Java style is to not use leading underscore and to use camelCase for
>>> variables ie workingThreads.
>> Right, I changed them to camelCase.
>>
>>
>>> 44 static Thread _suspendresume_thread = new Thread();
>>>
>>> The above is dead code.
>> Removed.
>>
>>
>>> 53 if (_is_biased[me] == false) {
>>>
>>> Style nit: use "if (!_is_biased[me]) {"
>> Changed.
>>
>>
>>> 80 } catch(InterruptedException ie) {
>>> 81 }
>>>
>>> I suggest inserting "throw new Error("Unexpected interrupt");" for
>>> good measure.
>>>
>>> 111 _working_threads[i].suspend();
>> Done.
>>
>>
>>> Thread suspension has been deprecated-for-removal as of JDK 14 so
>>> could be gone in JDK 15 (but more likely 16). If the suspend/resume
>>> is important for this test then you will need to switch to using JVM
>>> TI suspend/resume; or else perhaps introduce a WhiteBox method to do
>>> whatever you need in the VM.
>> It's not really needed for this test. I just wanted to mixed
>> suspend-resume with handshakes because I know there could be some
>> buggy interactions between them. But since it has been deprecated
>> then maybe there is no point in stressing handshakes with them and I
>> can remove that part.
>>
>>
>>> 123 // Wait until the desired number of direct handshakes is reached
>>> 124 while (_handshake_count.get() < DIRECT_HANDSHAKES_MARK) {
>>> 125 Thread.sleep(10); // sleep for 10ms
>>> 126 }
>>>
>>> You could just do a join() on one of the worker threads.
>> Right, didn't thought about that. Changed.
>>
>>
>> Thanks for looking at this David! I'll retest with the above changes
>> in the meantime.
>>
>>
>> Patricio
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Thanks!
>>>> Patricio
>>>>
>>
More information about the hotspot-runtime-dev
mailing list