RFR: 8244340: Handshake processing thread lacks yielding

Thu May 7 09:10:25 UTC 2020

Hi David,

On 2020-05-07 09:58, David Holmes wrote:
> Hi Robbin,
> 
> Thanks for the additional explanations in other email. I still think we 
> have some scope for ensuring the different "spin" approaches are 
> consistent.

We can see what we can do.

> 
> On 7/05/2020 5:03 pm, Robbin Ehn wrote:
>> Hi Patricio,
>>
>> On 2020-05-06 18:24, Patricio Chilano wrote:
>>>
>>> On 5/6/20 5:53 AM, Robbin Ehn wrote:
>>>> Hi Patricio,
>>>>
>>>> Here is v2 inc:
>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/inc/
>>>> Full:
>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/full/
>>> Thanks Robbin, looks good!
>>
>> Thanks!
>>
>> But I notice that when we do direct handshakes with a JavaThread we
>> should yield in blocked state. This only effects the corner case when
>> JavaThread A handshakes B and C handshakes A, C can complete the
>> handshake faster if A would yield in block state.
> 
> I'm a bit surprised you chose to do this - the thread-state transition 
> logic would likely exceed the duration of these microsleeps and end up 
> executing even more code in the thread we want to go off-proc. That's 
> why these sleep routines don't themselves change the thread state etc.

The block transition takes ~500 cycles to go into blocked and ~300 to
leave block. The fastest time for os::naked_short_nanosleep I can see is
around 70000 cycles, 99% takes 160000 cycles. So over-head of calling
os::naked_short_nanosleep in block state is low.

> 
> I also can't quite get my head around what kind of nested processing 
> this can now lead to (with regard to safepoints and other handshakes on 
> the thread doing this handshake).

We already have that here:
     if (SafepointMechanism::should_block(self)) {
       ThreadBlockInVM tbivm(self);
     }
     cbo.process();

And we actually must do that since otherwise:
A handshakes B
B handshakes A
If A and B never becomes safe, we have a deadlock situation.

Thanks, Robbin

> 
> Thanks,
> David
> -----
> 
>> v3 inc:
>> http://cr.openjdk.java.net/~rehn/8244340/v3/inc/
>>
>> v3 full:
>> http://cr.openjdk.java.net/~rehn/8244340/v3/full/
>>
>> Thanks, Robbin
>>
>>>
>>>
>>> Patricio
>>>> On 2020-05-05 21:44, Patricio Chilano wrote:
>>>>> Hi Robbin,
>>>>>
>>>>> Thanks for fixing this!
>>>>
>>>> Thanks!
>>>>
>>>>>
>>>>> I tested the patch in a Windows Server 2012 virtual machine (not 
>>>>> VirtualBox), with two virtual cpus assigned. Without the patch, 
>>>>> running the simple HelloWorld program takes about 2.5seconds most 
>>>>> of the times when setting “/affinity 1” and around 300ms when 
>>>>> setting “affinity /FF”. With the patch I get the same timing for 
>>>>> both cases, about 300ms. It would be great if Miklos can also check 
>>>>> it against his environment to see if his issue got fixed.
>>>>
>>>> Great. They don't seem to be building the JDK them self.
>>>>
>>>>>
>>>>> Only small comments in handshake.cpp:
>>>>> - In VM_HandshakeOneThread, last log_handshake_info() should be (pr 
>>>>> == HandshakeState::_success) instead of by_vm_thread. Maybe just 
>>>>> define "HandshakeState::ProcessResult pr" instead of by_vm_thread. 
>>>>> Same with Handshake::execute_direct() to get rid of by_handshaker.
>>>>
>>>> Fixed.
>>>>
>>>>> - In Handshake::execute_direct() inside the "if" statement we could 
>>>>> just break after verifying _success.
>>>>
>>>> The if statement is removed.
>>>>
>>>>> - I would remove _spin_time_ns and just keep _max_spin_time_ns, 
>>>>> since both are only set at construction only.
>>>>
>>>> Fixed.
>>>>
>>>>> - I think a better name for _last_wait_ns should be 
>>>>> last_spin_start_ns because it is updated also when we see the state 
>>>>> changed.
>>>>
>>>> Fixed.
>>>>
>>>> Thanks, Robbin
>>>>
>>>>>
>>>>> Thanks Robbin!
>>>>>
>>>>> Patricio
>>>>> On 5/5/20 10:08 AM, Robbin Ehn wrote:
>>>>>> Hi all, please review.
>>>>>>
>>>>>> As the issue describes there are problems with this infinitely 
>>>>>> spinning.
>>>>>> To avoid any performance regression we do not yield while we do or 
>>>>>> see
>>>>>> progress. But when we enter a period of no progression we 
>>>>>> 'quickly' give
>>>>>> up.
>>>>>>
>>>>>> Issue:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8244340
>>>>>>
>>>>>> Code:
>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v1/webrev/
>>>>>>
>>>>>> In the pathologically case we can easily see 10x improvement 
>>>>>> starting the VM.
>>>>>> I have found no performance regressions or performance gains in our
>>>>>> standard benchmark setups.
>>>>>> But I see much better average completion times for handshakes.
>>>>>> For example specJVM2008 serial with ZGC shows 4x faster time to
>>>>>> completion for handshakes in my setup, but little or no effect on 
>>>>>> score
>>>>>> (ops/m).
>>>>>>
>>>>>> Passes t1-3.
>>>>>>
>>>>>> Thanks, Robbin
>>>>>
>>>