RFR: 8244340: Handshake processing thread lacks yielding

Mon May 11 14:31:45 UTC 2020

Hi Dan,

Fixed comment, thanks!

/Robbin

On 2020-05-11 15:41, Daniel D. Daugherty wrote:
> Hi Robbin,
> 
> On 5/11/20 7:07 AM, Robbin Ehn wrote:
>> Hi Dan,
>>
>> v4 inc:
>> http://cr.openjdk.java.net/~rehn/8244340/v4/inc/
> 
> src/hotspot/share/runtime/handshake.cpp
> 
>      L66: // Performing handshakes require a custom yielding strategy, 
> because without it
>      L67: // there is a clear performance regression vs plain spinning. 
> We keep track of
>      L68: // when we last saw progress by looking at why each targeted 
> thread have not yet
>      L69: // completed it's handshake. After spinning for a while with 
> no progress we
>      L70: // yield, but as long as there is progress we keep spinning. 
> We thus avoid
>      L71: // yielding when there is potential work to be done or 
> handshake is close to
>      L72: // finished.
> 
>          Please consider these edits:
> 
>            // Performing handshakes requires a custom yielding strategy 
> because without it
>            // there is a clear performance regression vs plain spinning. 
> We keep track of
>            // when we last saw progress by looking at why each targeted 
> thread has not yet
>            // completed its handshake. After spinning for a while with 
> no progress we will
>            // yield, but as long as there is progress, we keep spinning. 
> Thus we avoid
>            // yielding when there is potential work to be done or the 
> handshake is close
>            // to being finished.
> 
>> Is above what you had in mine? 
> 
>      Yes, with minor edits. Thanks for adding this comment.
> 
> 
> 
>> v4 full:
>> http://cr.openjdk.java.net/~rehn/8244340/v4/full/
>>
>> Truncated:
>>
>> On 2020-05-08 16:14, Daniel D. Daugherty wrote:
>>>
>>> The above helps a lot. Thanks!
>>>
>>> So the loop termination condition is when all counters have been
>>> accumulated in _no_operation? That tells you that everyone is done,
>>> correct?
>>
>> The handshake operation have an internal counter which is set to the
>> number of emitted handshakes. Every time a handshake is performed that
>> counter is decremented. When it reaches zero we are done.
>> If we would do an extra iteration over all threads then the
>> ProcessesResult for all threads would be _no_operation.
> 
> Okay thanks. That confirms what I thought would be the end state.
> 
> 
>> We could change from do-while loop to a while a, since sometimes that
>> counter is zero before the first iteration.
>> But it happens when we have few threads, and thus the iteration is
>> short, but a tiny speedup in that case (maybe startup).
> 
> I'm good with the do-while loop. However, David and/or Ioi might have a
> different opinion if you think it might help startup.
> 
> 
>>
>>>>> src/hotspot/share/runtime/handshake.cpp
>>>>>      L66: class HandshakeSpinYield : public StackObj {
>>>>>          It would be good to have a short paragraph to explain the 
>>>>> high lights
>>>>>          of your handshake spin yield algorithm.
>>>>
>>>> Sure!
>>>
>>> I look forward to seeing it. :-)
>>
>> Is above what you had in mine?
> 
> Yes. Very nicely done.
> 
> 
>>
>>>> That is why it's called cbo, CPUBackOFF which was the name of the 
>>>> class during prototyping :)
>>>
>>> I somehow knew it couldn't be Congressional Budget Office... :-)
>>
>> :)
>>
>> Thanks, Robbin
> 
> Dan
> 
>>
>>>
>>>
>>>>
>>>> Changed to hsy.
>>>>
>>>>>
>>>>> src/hotspot/share/runtime/thread.hpp
>>>>>      No comments.
>>>>>
>>>>>
>>>>> Okay, I just re-read all the other code review comments for the 
>>>>> third time.
>>>>> I still don't understand your HandshakeSpinYield algorithm. Maybe 
>>>>> it is just
>>>>> me and I'm not seeing something obvious because my head is still 
>>>>> stuck in
>>>>> async monitor deflation land... :-)
>>>>
>>>> Let me know of above help or not?
>>>
>>> Definitely helpful.
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Thanks, Robbin
>>>>
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>> On 5/7/20 3:03 AM, Robbin Ehn wrote:
>>>>>> Hi Patricio,
>>>>>>
>>>>>> On 2020-05-06 18:24, Patricio Chilano wrote:
>>>>>>>
>>>>>>> On 5/6/20 5:53 AM, Robbin Ehn wrote:
>>>>>>>> Hi Patricio,
>>>>>>>>
>>>>>>>> Here is v2 inc:
>>>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/inc/
>>>>>>>> Full:
>>>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/full/
>>>>>>> Thanks Robbin, looks good!
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> But I notice that when we do direct handshakes with a JavaThread we
>>>>>> should yield in blocked state. This only effects the corner case when
>>>>>> JavaThread A handshakes B and C handshakes A, C can complete the
>>>>>> handshake faster if A would yield in block state.
>>>>>>
>>>>>> v3 inc:
>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v3/inc/
>>>>>>
>>>>>> v3 full:
>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v3/full/
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Patricio
>>>>>>>> On 2020-05-05 21:44, Patricio Chilano wrote:
>>>>>>>>> Hi Robbin,
>>>>>>>>>
>>>>>>>>> Thanks for fixing this!
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I tested the patch in a Windows Server 2012 virtual machine 
>>>>>>>>> (not VirtualBox), with two virtual cpus assigned. Without the 
>>>>>>>>> patch, running the simple HelloWorld program takes about 
>>>>>>>>> 2.5seconds most of the times when setting “/affinity 1” and 
>>>>>>>>> around 300ms when setting “affinity /FF”. With the patch I get 
>>>>>>>>> the same timing for both cases, about 300ms. It would be great 
>>>>>>>>> if Miklos can also check it against his environment to see if 
>>>>>>>>> his issue got fixed.
>>>>>>>>
>>>>>>>> Great. They don't seem to be building the JDK them self.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Only small comments in handshake.cpp:
>>>>>>>>> - In VM_HandshakeOneThread, last log_handshake_info() should be 
>>>>>>>>> (pr == HandshakeState::_success) instead of by_vm_thread. Maybe 
>>>>>>>>> just define "HandshakeState::ProcessResult pr" instead of 
>>>>>>>>> by_vm_thread. Same with Handshake::execute_direct() to get rid 
>>>>>>>>> of by_handshaker.
>>>>>>>>
>>>>>>>> Fixed.
>>>>>>>>
>>>>>>>>> - In Handshake::execute_direct() inside the "if" statement we 
>>>>>>>>> could just break after verifying _success.
>>>>>>>>
>>>>>>>> The if statement is removed.
>>>>>>>>
>>>>>>>>> - I would remove _spin_time_ns and just keep _max_spin_time_ns, 
>>>>>>>>> since both are only set at construction only.
>>>>>>>>
>>>>>>>> Fixed.
>>>>>>>>
>>>>>>>>> - I think a better name for _last_wait_ns should be 
>>>>>>>>> last_spin_start_ns because it is updated also when we see the 
>>>>>>>>> state changed.
>>>>>>>>
>>>>>>>> Fixed.
>>>>>>>>
>>>>>>>> Thanks, Robbin
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks Robbin!
>>>>>>>>>
>>>>>>>>> Patricio
>>>>>>>>> On 5/5/20 10:08 AM, Robbin Ehn wrote:
>>>>>>>>>> Hi all, please review.
>>>>>>>>>>
>>>>>>>>>> As the issue describes there are problems with this infinitely 
>>>>>>>>>> spinning.
>>>>>>>>>> To avoid any performance regression we do not yield while we 
>>>>>>>>>> do or see
>>>>>>>>>> progress. But when we enter a period of no progression we 
>>>>>>>>>> 'quickly' give
>>>>>>>>>> up.
>>>>>>>>>>
>>>>>>>>>> Issue:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8244340
>>>>>>>>>>
>>>>>>>>>> Code:
>>>>>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v1/webrev/
>>>>>>>>>>
>>>>>>>>>> In the pathologically case we can easily see 10x improvement 
>>>>>>>>>> starting the VM.
>>>>>>>>>> I have found no performance regressions or performance gains 
>>>>>>>>>> in our
>>>>>>>>>> standard benchmark setups.
>>>>>>>>>> But I see much better average completion times for handshakes.
>>>>>>>>>> For example specJVM2008 serial with ZGC shows 4x faster time to
>>>>>>>>>> completion for handshakes in my setup, but little or no effect 
>>>>>>>>>> on score
>>>>>>>>>> (ops/m).
>>>>>>>>>>
>>>>>>>>>> Passes t1-3.
>>>>>>>>>>
>>>>>>>>>> Thanks, Robbin
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>