RFR: 8244340: Handshake processing thread lacks yielding

Thu May 7 10:12:28 UTC 2020

On 7/05/2020 7:10 pm, Robbin Ehn wrote:
> Hi David,
> 
> On 2020-05-07 09:58, David Holmes wrote:
>> Hi Robbin,
>>
>> Thanks for the additional explanations in other email. I still think 
>> we have some scope for ensuring the different "spin" approaches are 
>> consistent.
> 
> We can see what we can do.
> 
>>
>> On 7/05/2020 5:03 pm, Robbin Ehn wrote:
>>> Hi Patricio,
>>>
>>> On 2020-05-06 18:24, Patricio Chilano wrote:
>>>>
>>>> On 5/6/20 5:53 AM, Robbin Ehn wrote:
>>>>> Hi Patricio,
>>>>>
>>>>> Here is v2 inc:
>>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/inc/
>>>>> Full:
>>>>> http://cr.openjdk.java.net/~rehn/8244340/v2/full/
>>>> Thanks Robbin, looks good!
>>>
>>> Thanks!
>>>
>>> But I notice that when we do direct handshakes with a JavaThread we
>>> should yield in blocked state. This only effects the corner case when
>>> JavaThread A handshakes B and C handshakes A, C can complete the
>>> handshake faster if A would yield in block state.
>>
>> I'm a bit surprised you chose to do this - the thread-state transition 
>> logic would likely exceed the duration of these microsleeps and end up 
>> executing even more code in the thread we want to go off-proc. That's 
>> why these sleep routines don't themselves change the thread state etc.
> 
> The block transition takes ~500 cycles to go into blocked and ~300 to
> leave block. The fastest time for os::naked_short_nanosleep I can see is
> around 70000 cycles, 99% takes 160000 cycles. So over-head of calling
> os::naked_short_nanosleep in block state is low.

Okay - I can't argue with cycles.

>>
>> I also can't quite get my head around what kind of nested processing 
>> this can now lead to (with regard to safepoints and other handshakes 
>> on the thread doing this handshake).
> 
> We already have that here:
>      if (SafepointMechanism::should_block(self)) {
>        ThreadBlockInVM tbivm(self);
>      }
>      cbo.process();

Okay so aren't we now doubling up. If process() doesn't show success 
then we will eventually hit the wait_blocked. So no need for the 
explicit check? I guess it doesn't make too much difference but the 
placement of these checks seems somewhat arbitrary.

David
-----

> And we actually must do that since otherwise:
> A handshakes B
> B handshakes A
> If A and B never becomes safe, we have a deadlock situation.
> 
> Thanks, Robbin
> 
>>
>> Thanks,
>> David
>> -----
>>
>>> v3 inc:
>>> http://cr.openjdk.java.net/~rehn/8244340/v3/inc/
>>>
>>> v3 full:
>>> http://cr.openjdk.java.net/~rehn/8244340/v3/full/
>>>
>>> Thanks, Robbin
>>>
>>>>
>>>>
>>>> Patricio
>>>>> On 2020-05-05 21:44, Patricio Chilano wrote:
>>>>>> Hi Robbin,
>>>>>>
>>>>>> Thanks for fixing this!
>>>>>
>>>>> Thanks!
>>>>>
>>>>>>
>>>>>> I tested the patch in a Windows Server 2012 virtual machine (not 
>>>>>> VirtualBox), with two virtual cpus assigned. Without the patch, 
>>>>>> running the simple HelloWorld program takes about 2.5seconds most 
>>>>>> of the times when setting “/affinity 1” and around 300ms when 
>>>>>> setting “affinity /FF”. With the patch I get the same timing for 
>>>>>> both cases, about 300ms. It would be great if Miklos can also 
>>>>>> check it against his environment to see if his issue got fixed.
>>>>>
>>>>> Great. They don't seem to be building the JDK them self.
>>>>>
>>>>>>
>>>>>> Only small comments in handshake.cpp:
>>>>>> - In VM_HandshakeOneThread, last log_handshake_info() should be 
>>>>>> (pr == HandshakeState::_success) instead of by_vm_thread. Maybe 
>>>>>> just define "HandshakeState::ProcessResult pr" instead of 
>>>>>> by_vm_thread. Same with Handshake::execute_direct() to get rid of 
>>>>>> by_handshaker.
>>>>>
>>>>> Fixed.
>>>>>
>>>>>> - In Handshake::execute_direct() inside the "if" statement we 
>>>>>> could just break after verifying _success.
>>>>>
>>>>> The if statement is removed.
>>>>>
>>>>>> - I would remove _spin_time_ns and just keep _max_spin_time_ns, 
>>>>>> since both are only set at construction only.
>>>>>
>>>>> Fixed.
>>>>>
>>>>>> - I think a better name for _last_wait_ns should be 
>>>>>> last_spin_start_ns because it is updated also when we see the 
>>>>>> state changed.
>>>>>
>>>>> Fixed.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>>>
>>>>>> Thanks Robbin!
>>>>>>
>>>>>> Patricio
>>>>>> On 5/5/20 10:08 AM, Robbin Ehn wrote:
>>>>>>> Hi all, please review.
>>>>>>>
>>>>>>> As the issue describes there are problems with this infinitely 
>>>>>>> spinning.
>>>>>>> To avoid any performance regression we do not yield while we do 
>>>>>>> or see
>>>>>>> progress. But when we enter a period of no progression we 
>>>>>>> 'quickly' give
>>>>>>> up.
>>>>>>>
>>>>>>> Issue:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8244340
>>>>>>>
>>>>>>> Code:
>>>>>>> http://cr.openjdk.java.net/~rehn/8244340/v1/webrev/
>>>>>>>
>>>>>>> In the pathologically case we can easily see 10x improvement 
>>>>>>> starting the VM.
>>>>>>> I have found no performance regressions or performance gains in our
>>>>>>> standard benchmark setups.
>>>>>>> But I see much better average completion times for handshakes.
>>>>>>> For example specJVM2008 serial with ZGC shows 4x faster time to
>>>>>>> completion for handshakes in my setup, but little or no effect on 
>>>>>>> score
>>>>>>> (ops/m).
>>>>>>>
>>>>>>> Passes t1-3.
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>
>>>>