RFR: 8244340: Handshake processing thread lacks yielding

Daniel D. Daugherty daniel.daugherty at oracle.com
Mon May 11 13:41:45 UTC 2020


Hi Robbin,

On 5/11/20 7:07 AM, Robbin Ehn wrote:
> Hi Dan,
>
> v4 inc:
> http://cr.openjdk.java.net/~rehn/8244340/v4/inc/

src/hotspot/share/runtime/handshake.cpp

     L66: // Performing handshakes require a custom yielding strategy, 
because without it
     L67: // there is a clear performance regression vs plain spinning. 
We keep track of
     L68: // when we last saw progress by looking at why each targeted 
thread have not yet
     L69: // completed it's handshake. After spinning for a while with 
no progress we
     L70: // yield, but as long as there is progress we keep spinning. 
We thus avoid
     L71: // yielding when there is potential work to be done or 
handshake is close to
     L72: // finished.

         Please consider these edits:

           // Performing handshakes requires a custom yielding strategy 
because without it
           // there is a clear performance regression vs plain spinning. 
We keep track of
           // when we last saw progress by looking at why each targeted 
thread has not yet
           // completed its handshake. After spinning for a while with 
no progress we will
           // yield, but as long as there is progress, we keep spinning. 
Thus we avoid
           // yielding when there is potential work to be done or the 
handshake is close
           // to being finished.

> Is above what you had in mine? 

     Yes, with minor edits. Thanks for adding this comment.



> v4 full:
> http://cr.openjdk.java.net/~rehn/8244340/v4/full/
>
> Truncated:
>
> On 2020-05-08 16:14, Daniel D. Daugherty wrote:
>>
>> The above helps a lot. Thanks!
>>
>> So the loop termination condition is when all counters have been
>> accumulated in _no_operation? That tells you that everyone is done,
>> correct?
>
> The handshake operation have an internal counter which is set to the
> number of emitted handshakes. Every time a handshake is performed that
> counter is decremented. When it reaches zero we are done.
> If we would do an extra iteration over all threads then the
> ProcessesResult for all threads would be _no_operation.

Okay thanks. That confirms what I thought would be the end state.


> We could change from do-while loop to a while a, since sometimes that
> counter is zero before the first iteration.
> But it happens when we have few threads, and thus the iteration is
> short, but a tiny speedup in that case (maybe startup).

I'm good with the do-while loop. However, David and/or Ioi might have a
different opinion if you think it might help startup.


>
>>>> src/hotspot/share/runtime/handshake.cpp
>>>>      L66: class HandshakeSpinYield : public StackObj {
>>>>          It would be good to have a short paragraph to explain the 
>>>> high lights
>>>>          of your handshake spin yield algorithm.
>>>
>>> Sure!
>>
>> I look forward to seeing it. :-)
>
> Is above what you had in mine?

Yes. Very nicely done.


>
>>> That is why it's called cbo, CPUBackOFF which was the name of the 
>>> class during prototyping :)
>>
>> I somehow knew it couldn't be Congressional Budget Office... :-)
>
> :)
>
> Thanks, Robbin

Dan

>
>>
>>
>>>
>>> Changed to hsy.
>>>
>>>>
>>>> src/hotspot/share/runtime/thread.hpp
>>>>      No comments.
>>>>
>>>>
>>>> Okay, I just re-read all the other code review comments for the 
>>>> third time.
>>>> I still don't understand your HandshakeSpinYield algorithm. Maybe 
>>>> it is just
>>>> me and I'm not seeing something obvious because my head is still 
>>>> stuck in
>>>> async monitor deflation land... :-)
>>>
>>> Let me know of above help or not?
>>
>> Definitely helpful.
>>
>> Dan
>>
>>
>>>
>>> Thanks, Robbin
>>>
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>> On 5/7/20 3:03 AM, Robbin Ehn wrote:
>>>>> Hi Patricio,
>>>>>
>>>>> On 2020-05-06 18:24, Patricio Chilano wrote:
>>>>>>
>>>>>> On 5/6/20 5:53 AM, Robbin Ehn wrote:
>>>>>>> Hi Patricio,
>>>>>>>
>>>>>>> Here is v2 inc:
>>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/inc/
>>>>>>> Full:
>>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/full/
>>>>>> Thanks Robbin, looks good!
>>>>>
>>>>> Thanks!
>>>>>
>>>>> But I notice that when we do direct handshakes with a JavaThread we
>>>>> should yield in blocked state. This only effects the corner case when
>>>>> JavaThread A handshakes B and C handshakes A, C can complete the
>>>>> handshake faster if A would yield in block state.
>>>>>
>>>>> v3 inc:
>>>>> http://cr.openjdk.java.net/~rehn/8244340/v3/inc/
>>>>>
>>>>> v3 full:
>>>>> http://cr.openjdk.java.net/~rehn/8244340/v3/full/
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>>>
>>>>>>
>>>>>> Patricio
>>>>>>> On 2020-05-05 21:44, Patricio Chilano wrote:
>>>>>>>> Hi Robbin,
>>>>>>>>
>>>>>>>> Thanks for fixing this!
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>>
>>>>>>>> I tested the patch in a Windows Server 2012 virtual machine 
>>>>>>>> (not VirtualBox), with two virtual cpus assigned. Without the 
>>>>>>>> patch, running the simple HelloWorld program takes about 
>>>>>>>> 2.5seconds most of the times when setting “/affinity 1” and 
>>>>>>>> around 300ms when setting “affinity /FF”. With the patch I get 
>>>>>>>> the same timing for both cases, about 300ms. It would be great 
>>>>>>>> if Miklos can also check it against his environment to see if 
>>>>>>>> his issue got fixed.
>>>>>>>
>>>>>>> Great. They don't seem to be building the JDK them self.
>>>>>>>
>>>>>>>>
>>>>>>>> Only small comments in handshake.cpp:
>>>>>>>> - In VM_HandshakeOneThread, last log_handshake_info() should be 
>>>>>>>> (pr == HandshakeState::_success) instead of by_vm_thread. Maybe 
>>>>>>>> just define "HandshakeState::ProcessResult pr" instead of 
>>>>>>>> by_vm_thread. Same with Handshake::execute_direct() to get rid 
>>>>>>>> of by_handshaker.
>>>>>>>
>>>>>>> Fixed.
>>>>>>>
>>>>>>>> - In Handshake::execute_direct() inside the "if" statement we 
>>>>>>>> could just break after verifying _success.
>>>>>>>
>>>>>>> The if statement is removed.
>>>>>>>
>>>>>>>> - I would remove _spin_time_ns and just keep _max_spin_time_ns, 
>>>>>>>> since both are only set at construction only.
>>>>>>>
>>>>>>> Fixed.
>>>>>>>
>>>>>>>> - I think a better name for _last_wait_ns should be 
>>>>>>>> last_spin_start_ns because it is updated also when we see the 
>>>>>>>> state changed.
>>>>>>>
>>>>>>> Fixed.
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks Robbin!
>>>>>>>>
>>>>>>>> Patricio
>>>>>>>> On 5/5/20 10:08 AM, Robbin Ehn wrote:
>>>>>>>>> Hi all, please review.
>>>>>>>>>
>>>>>>>>> As the issue describes there are problems with this infinitely 
>>>>>>>>> spinning.
>>>>>>>>> To avoid any performance regression we do not yield while we 
>>>>>>>>> do or see
>>>>>>>>> progress. But when we enter a period of no progression we 
>>>>>>>>> 'quickly' give
>>>>>>>>> up.
>>>>>>>>>
>>>>>>>>> Issue:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8244340
>>>>>>>>>
>>>>>>>>> Code:
>>>>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v1/webrev/
>>>>>>>>>
>>>>>>>>> In the pathologically case we can easily see 10x improvement 
>>>>>>>>> starting the VM.
>>>>>>>>> I have found no performance regressions or performance gains 
>>>>>>>>> in our
>>>>>>>>> standard benchmark setups.
>>>>>>>>> But I see much better average completion times for handshakes.
>>>>>>>>> For example specJVM2008 serial with ZGC shows 4x faster time to
>>>>>>>>> completion for handshakes in my setup, but little or no effect 
>>>>>>>>> on score
>>>>>>>>> (ops/m).
>>>>>>>>>
>>>>>>>>> Passes t1-3.
>>>>>>>>>
>>>>>>>>> Thanks, Robbin
>>>>>>>>
>>>>>>
>>>>
>>



More information about the hotspot-runtime-dev mailing list