RFR: 8244340: Handshake processing thread lacks yielding

Fri May 8 14:14:24 UTC 2020

On 5/8/20 4:38 AM, Robbin Ehn wrote:
> Hi Dan!
>
> On 2020-05-07 22:58, Daniel D. Daugherty wrote:
>>  > v3 full:
>>  > http://cr.openjdk.java.net/~rehn/8244340/v3/full/
>>
>> First, this is way more complicated than I was expecting. I've read all
>> the other code review replies so I _think_ I understand _why_ you went
>> this way, but I don't yet feel like I understand the _how_ of what 
>> you've
>> done here.
>
> We have the previous result which starts like:
> _no_operation = 0
> _not_safe     = 0
> _state_busy   = 0
> _success      = 0
>
> After first iteration typical the new result often looks like:
> _no_operation = 8
> _not_safe     = 1
> _state_busy   = 0
> _success      = 1
>
> So 8 threads don't have any operation, since this is the first pass, we
> know they must have executed it by them self.
> VM thread executed 1 operation.
> But we have one thread that is unsafe and have not completed it's
> operation. So we have changed result here, no need to yield yet.
>
> Next iteration very likely could be:
> _no_operation = 9
> _not_safe     = 1
> _state_busy   = 0
> _success      = 0
>
> If the not safe thread don't perform it's handshake or become safe, this
> result keeps repeating and we do yield.
>
> So we just compare all 4 counters to the previously 4 counters, if they
> do match operation is considered to be stale at the moment and we start
> yielding after some time.
> But as long as the counter are changing we are making progress as they
> cannot go backwards, e.g. _no_operation cannot become _not_safe again.
>
> If we could see all states and assuming all threads executing java code:
> _no_operation = 0
> _not_safe     = 10
> _state_busy   = 0
> _success      = 0
>
> _no_operation = 0
> _not_safe     = 9
> _state_busy   = 1
> _success      = 0
>
> _no_operation = 1
> _not_safe     = 9
> _state_busy   = 0
> _success      = 0
>
> _no_operation = 1
> _not_safe     = 8
> _state_busy   = 1
> _success      = 0
>
> ...
>
> So _success is incremented for threads that become safe but do not
> execute it's handshake, instead vm thread does it for them.
> So they go from _not_safe => _success => _no_operation.

The above helps a lot. Thanks!

So the loop termination condition is when all counters have been
accumulated in _no_operation? That tells you that everyone is done,
correct?

>
>>
>> src/hotspot/share/runtime/handshake.hpp
>>      No comments.
>>
>> src/hotspot/share/runtime/handshake.cpp
>>      L66: class HandshakeSpinYield : public StackObj {
>>          It would be good to have a short paragraph to explain the 
>> high lights
>>          of your handshake spin yield algorithm.
>
> Sure!

I look forward to seeing it. :-)

>
>>
>>      L212:     HandshakeState::ProcessResult pr = 
>> HandshakeState::_no_operation;
>>          nit - extra space after the '='
>
> Fixed.
>
>>
>>      L213:     HandshakeSpinYield cbo(start_time_ns);
>>      L260:     HandshakeSpinYield cbo(start_time_ns);
>>      L351:   HandshakeSpinYield cbo(start_time_ns);
>>          In HotSpot, helper variables are usually named after the 
>> helper class
>>          so 'cbo' would be 'hsy'. Of course, I have to wonder what 
>> 'cbo' stands for?
>
> That is why it's called cbo, CPUBackOFF which was the name of the 
> class during prototyping :)

I somehow knew it couldn't be Congressional Budget Office... :-)

>
> Changed to hsy.
>
>>
>> src/hotspot/share/runtime/thread.hpp
>>      No comments.
>>
>>
>> Okay, I just re-read all the other code review comments for the third 
>> time.
>> I still don't understand your HandshakeSpinYield algorithm. Maybe it 
>> is just
>> me and I'm not seeing something obvious because my head is still 
>> stuck in
>> async monitor deflation land... :-)
>
> Let me know of above help or not?

Definitely helpful.

Dan

>
> Thanks, Robbin
>
>>
>> Dan
>>
>>
>>
>> On 5/7/20 3:03 AM, Robbin Ehn wrote:
>>> Hi Patricio,
>>>
>>> On 2020-05-06 18:24, Patricio Chilano wrote:
>>>>
>>>> On 5/6/20 5:53 AM, Robbin Ehn wrote:
>>>>> Hi Patricio,
>>>>>
>>>>> Here is v2 inc:
>>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/inc/
>>>>> Full:
>>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/full/
>>>> Thanks Robbin, looks good!
>>>
>>> Thanks!
>>>
>>> But I notice that when we do direct handshakes with a JavaThread we
>>> should yield in blocked state. This only effects the corner case when
>>> JavaThread A handshakes B and C handshakes A, C can complete the
>>> handshake faster if A would yield in block state.
>>>
>>> v3 inc:
>>> http://cr.openjdk.java.net/~rehn/8244340/v3/inc/
>>>
>>> v3 full:
>>> http://cr.openjdk.java.net/~rehn/8244340/v3/full/
>>>
>>> Thanks, Robbin
>>>
>>>>
>>>>
>>>> Patricio
>>>>> On 2020-05-05 21:44, Patricio Chilano wrote:
>>>>>> Hi Robbin,
>>>>>>
>>>>>> Thanks for fixing this!
>>>>>
>>>>> Thanks!
>>>>>
>>>>>>
>>>>>> I tested the patch in a Windows Server 2012 virtual machine (not 
>>>>>> VirtualBox), with two virtual cpus assigned. Without the patch, 
>>>>>> running the simple HelloWorld program takes about 2.5seconds most 
>>>>>> of the times when setting “/affinity 1” and around 300ms when 
>>>>>> setting “affinity /FF”. With the patch I get the same timing for 
>>>>>> both cases, about 300ms. It would be great if Miklos can also 
>>>>>> check it against his environment to see if his issue got fixed.
>>>>>
>>>>> Great. They don't seem to be building the JDK them self.
>>>>>
>>>>>>
>>>>>> Only small comments in handshake.cpp:
>>>>>> - In VM_HandshakeOneThread, last log_handshake_info() should be 
>>>>>> (pr == HandshakeState::_success) instead of by_vm_thread. Maybe 
>>>>>> just define "HandshakeState::ProcessResult pr" instead of 
>>>>>> by_vm_thread. Same with Handshake::execute_direct() to get rid of 
>>>>>> by_handshaker.
>>>>>
>>>>> Fixed.
>>>>>
>>>>>> - In Handshake::execute_direct() inside the "if" statement we 
>>>>>> could just break after verifying _success.
>>>>>
>>>>> The if statement is removed.
>>>>>
>>>>>> - I would remove _spin_time_ns and just keep _max_spin_time_ns, 
>>>>>> since both are only set at construction only.
>>>>>
>>>>> Fixed.
>>>>>
>>>>>> - I think a better name for _last_wait_ns should be 
>>>>>> last_spin_start_ns because it is updated also when we see the 
>>>>>> state changed.
>>>>>
>>>>> Fixed.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>>>
>>>>>> Thanks Robbin!
>>>>>>
>>>>>> Patricio
>>>>>> On 5/5/20 10:08 AM, Robbin Ehn wrote:
>>>>>>> Hi all, please review.
>>>>>>>
>>>>>>> As the issue describes there are problems with this infinitely 
>>>>>>> spinning.
>>>>>>> To avoid any performance regression we do not yield while we do 
>>>>>>> or see
>>>>>>> progress. But when we enter a period of no progression we 
>>>>>>> 'quickly' give
>>>>>>> up.
>>>>>>>
>>>>>>> Issue:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8244340
>>>>>>>
>>>>>>> Code:
>>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v1/webrev/
>>>>>>>
>>>>>>> In the pathologically case we can easily see 10x improvement 
>>>>>>> starting the VM.
>>>>>>> I have found no performance regressions or performance gains in our
>>>>>>> standard benchmark setups.
>>>>>>> But I see much better average completion times for handshakes.
>>>>>>> For example specJVM2008 serial with ZGC shows 4x faster time to
>>>>>>> completion for handshakes in my setup, but little or no effect 
>>>>>>> on score
>>>>>>> (ops/m).
>>>>>>>
>>>>>>> Passes t1-3.
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>
>>>>
>>