RFR: 8244340: Handshake processing thread lacks yielding

Fri May 8 08:38:28 UTC 2020

Hi Dan!

On 2020-05-07 22:58, Daniel D. Daugherty wrote:
>  > v3 full:
>  > http://cr.openjdk.java.net/~rehn/8244340/v3/full/
> 
> First, this is way more complicated than I was expecting. I've read all
> the other code review replies so I _think_ I understand _why_ you went
> this way, but I don't yet feel like I understand the _how_ of what you've
> done here.

We have the previous result which starts like:
_no_operation = 0
_not_safe     = 0
_state_busy   = 0
_success      = 0

After first iteration typical the new result often looks like:
_no_operation = 8
_not_safe     = 1
_state_busy   = 0
_success      = 1

So 8 threads don't have any operation, since this is the first pass, we
know they must have executed it by them self.
VM thread executed 1 operation.
But we have one thread that is unsafe and have not completed it's
operation. So we have changed result here, no need to yield yet.

Next iteration very likely could be:
_no_operation = 9
_not_safe     = 1
_state_busy   = 0
_success      = 0

If the not safe thread don't perform it's handshake or become safe, this
result keeps repeating and we do yield.

So we just compare all 4 counters to the previously 4 counters, if they
do match operation is considered to be stale at the moment and we start
yielding after some time.
But as long as the counter are changing we are making progress as they
cannot go backwards, e.g. _no_operation cannot become _not_safe again.

If we could see all states and assuming all threads executing java code:
_no_operation = 0
_not_safe     = 10
_state_busy   = 0
_success      = 0

_no_operation = 0
_not_safe     = 9
_state_busy   = 1
_success      = 0

_no_operation = 1
_not_safe     = 9
_state_busy   = 0
_success      = 0

_no_operation = 1
_not_safe     = 8
_state_busy   = 1
_success      = 0

...

So _success is incremented for threads that become safe but do not
execute it's handshake, instead vm thread does it for them.
So they go from _not_safe => _success => _no_operation.

> 
> src/hotspot/share/runtime/handshake.hpp
>      No comments.
> 
> src/hotspot/share/runtime/handshake.cpp
>      L66: class HandshakeSpinYield : public StackObj {
>          It would be good to have a short paragraph to explain the high 
> lights
>          of your handshake spin yield algorithm.

Sure!

> 
>      L212:     HandshakeState::ProcessResult pr = 
> HandshakeState::_no_operation;
>          nit - extra space after the '='

Fixed.

> 
>      L213:     HandshakeSpinYield cbo(start_time_ns);
>      L260:     HandshakeSpinYield cbo(start_time_ns);
>      L351:   HandshakeSpinYield cbo(start_time_ns);
>          In HotSpot, helper variables are usually named after the helper 
> class
>          so 'cbo' would be 'hsy'. Of course, I have to wonder what 'cbo' 
> stands for?

That is why it's called cbo, CPUBackOFF which was the name of the class 
during prototyping :)

Changed to hsy.

> 
> src/hotspot/share/runtime/thread.hpp
>      No comments.
> 
> 
> Okay, I just re-read all the other code review comments for the third time.
> I still don't understand your HandshakeSpinYield algorithm. Maybe it is 
> just
> me and I'm not seeing something obvious because my head is still stuck in
> async monitor deflation land... :-)

Let me know of above help or not?

Thanks, Robbin

> 
> Dan
> 
> 
> 
> On 5/7/20 3:03 AM, Robbin Ehn wrote:
>> Hi Patricio,
>>
>> On 2020-05-06 18:24, Patricio Chilano wrote:
>>>
>>> On 5/6/20 5:53 AM, Robbin Ehn wrote:
>>>> Hi Patricio,
>>>>
>>>> Here is v2 inc:
>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/inc/
>>>> Full:
>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/full/
>>> Thanks Robbin, looks good!
>>
>> Thanks!
>>
>> But I notice that when we do direct handshakes with a JavaThread we
>> should yield in blocked state. This only effects the corner case when
>> JavaThread A handshakes B and C handshakes A, C can complete the
>> handshake faster if A would yield in block state.
>>
>> v3 inc:
>> http://cr.openjdk.java.net/~rehn/8244340/v3/inc/
>>
>> v3 full:
>> http://cr.openjdk.java.net/~rehn/8244340/v3/full/
>>
>> Thanks, Robbin
>>
>>>
>>>
>>> Patricio
>>>> On 2020-05-05 21:44, Patricio Chilano wrote:
>>>>> Hi Robbin,
>>>>>
>>>>> Thanks for fixing this!
>>>>
>>>> Thanks!
>>>>
>>>>>
>>>>> I tested the patch in a Windows Server 2012 virtual machine (not 
>>>>> VirtualBox), with two virtual cpus assigned. Without the patch, 
>>>>> running the simple HelloWorld program takes about 2.5seconds most 
>>>>> of the times when setting “/affinity 1” and around 300ms when 
>>>>> setting “affinity /FF”. With the patch I get the same timing for 
>>>>> both cases, about 300ms. It would be great if Miklos can also check 
>>>>> it against his environment to see if his issue got fixed.
>>>>
>>>> Great. They don't seem to be building the JDK them self.
>>>>
>>>>>
>>>>> Only small comments in handshake.cpp:
>>>>> - In VM_HandshakeOneThread, last log_handshake_info() should be (pr 
>>>>> == HandshakeState::_success) instead of by_vm_thread. Maybe just 
>>>>> define "HandshakeState::ProcessResult pr" instead of by_vm_thread. 
>>>>> Same with Handshake::execute_direct() to get rid of by_handshaker.
>>>>
>>>> Fixed.
>>>>
>>>>> - In Handshake::execute_direct() inside the "if" statement we could 
>>>>> just break after verifying _success.
>>>>
>>>> The if statement is removed.
>>>>
>>>>> - I would remove _spin_time_ns and just keep _max_spin_time_ns, 
>>>>> since both are only set at construction only.
>>>>
>>>> Fixed.
>>>>
>>>>> - I think a better name for _last_wait_ns should be 
>>>>> last_spin_start_ns because it is updated also when we see the state 
>>>>> changed.
>>>>
>>>> Fixed.
>>>>
>>>> Thanks, Robbin
>>>>
>>>>>
>>>>> Thanks Robbin!
>>>>>
>>>>> Patricio
>>>>> On 5/5/20 10:08 AM, Robbin Ehn wrote:
>>>>>> Hi all, please review.
>>>>>>
>>>>>> As the issue describes there are problems with this infinitely 
>>>>>> spinning.
>>>>>> To avoid any performance regression we do not yield while we do or 
>>>>>> see
>>>>>> progress. But when we enter a period of no progression we 
>>>>>> 'quickly' give
>>>>>> up.
>>>>>>
>>>>>> Issue:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8244340
>>>>>>
>>>>>> Code:
>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v1/webrev/
>>>>>>
>>>>>> In the pathologically case we can easily see 10x improvement 
>>>>>> starting the VM.
>>>>>> I have found no performance regressions or performance gains in our
>>>>>> standard benchmark setups.
>>>>>> But I see much better average completion times for handshakes.
>>>>>> For example specJVM2008 serial with ZGC shows 4x faster time to
>>>>>> completion for handshakes in my setup, but little or no effect on 
>>>>>> score
>>>>>> (ops/m).
>>>>>>
>>>>>> Passes t1-3.
>>>>>>
>>>>>> Thanks, Robbin
>>>>>
>>>
>