RFR(XL): 8203469: Faster safepoints

David Holmes david.holmes at oracle.com
Mon Feb 11 21:24:45 UTC 2019


On 11/02/2019 7:02 pm, Robbin Ehn wrote:
> Hi David,
> 
> On 2/11/19 2:24 AM, David Holmes wrote:
>> This makes it impossible to search through the changeset to check 
>> how things are being used. In particular, in this case there 
>> doesn't seem to be a need for the 
>> SafepointSynchronize::increment_jni_active_count() function, because 
>> it's only used by the VMThread now from within the other safepoint code.
>>
> 
> We sorted this out in another channel; it's a webrev bug (or mis-feature).

Okay. I applied your patch and regenerated a clean webrev here in case 
anyone else would like to see it:

http://cr.openjdk.java.net/~dholmes/8203469/webrev.v06_2/

I'll take another look through it this morning.

David
-----

>> More below ...
>>> But there is a 'better' way.
>>> Before I added the more graceful "_vm_wait->wait();" semaphore in the 
>>> while (_waiting_to_block > 0) { loop, it was just a busy spin using the same
>>
>> I think I've lost track of the changes here. Initially we waited on 
>> the _safepoint_lock; then in preview_13 we switched to polling with 
>> backoff. Not sure when _vm_wait->wait was introduced but we're now 
>> returning to polling with backoff. Is that right?
> 
> Yes.
> 
>>
>> Once the VMThread polls until all threads are known to be 
>> safepoint-safe, we eliminate a whole category of race conditions with 
>> the thread-state changes, which allows us to simplify those changes.
> 
> Yes.
> 
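To make the polling scheme concrete, here is a rough standalone sketch (not 
HotSpot code; names like Worker, State and wait_until_all_safe are purely 
illustrative): a coordinator spin-waits with exponential back-off until every 
worker has published a safe state, analogous to the VMThread polling the Java 
threads' states.

// Standalone illustration only (not HotSpot code).
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

enum class State { unsafe, safe };

struct Worker {
  std::atomic<State> state{State::unsafe};
};

// Poll all workers; back off between rounds instead of blocking on a lock.
void wait_until_all_safe(std::vector<Worker>& workers) {
  auto back_off = std::chrono::microseconds(1);
  for (;;) {
    bool all_safe = true;
    for (auto& w : workers) {
      if (w.state.load(std::memory_order_acquire) != State::safe) {
        all_safe = false;
        break;
      }
    }
    if (all_safe) return;
    std::this_thread::sleep_for(back_off);
    if (back_off < std::chrono::milliseconds(1)) back_off *= 2;  // cap near 1 ms
  }
}

int main() {
  std::vector<Worker> workers(4);
  std::vector<std::thread> threads;
  for (auto& w : workers) {
    threads.emplace_back([&w] {
      std::this_thread::sleep_for(std::chrono::milliseconds(5));
      // A worker only has to write its state; the coordinator does the rest.
      w.state.store(State::safe, std::memory_order_release);
    });
  }
  wait_until_all_safe(workers);  // the "VMThread" side of the handshake
  for (auto& t : threads) t.join();
  return 0;
}

The back-off matters little in the common case described below: the scan 
usually succeeds almost immediately, so the coordinator pays at most a few 
short sleeps.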
>>
>>> back-off as the rolling forward loop. It turns out that we mostly never
>>> spin here at all: when all Java threads are stopped, the callbacks are
>>> often already done. So the addition of the semaphore has no impact on our
>>> benchmarks and is mostly unused. This is because most threads are in Java,
>>> which we need to spin-wait for anyway, since they can elide into native
>>> without doing a callback. My proposed re-base removes the callbacks
>>> completely and lets the VM thread do all thread accounting. All that the
>>> stopping threads need to do is write their state and the safepoint id;
>>> everything else is handled by the VM thread. We trade 2 atomics + a local
>>> store per thread against doing 2 stores per thread by the VM thread.
>>> This makes it possible for a thread in vm to transition into blocked
>>> WITHOUT a safepoint poll: just set thread_blocked and promise to do a
>>> safepoint poll when leaving that state.
>>
>> Right - but you've only implemented that for the new 
>> ThreadBlockInVMWithDeadlockCheck, though the same argument applies to 
>> the regular TBIVM. We can/should re-examine the TBIVM logic and see if 
>> we can simplify it and combine it with the new 
>> ThreadBlockInVMWithDeadlockCheck. Future RFE.
> 
> Yes!
> 
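A rough sketch of that thread-side idea (again standalone and illustrative, 
not the actual TBIVM or ThreadBlockInVMWithDeadlockCheck code): entering the 
blocked/safe state is a plain store, and the promised safepoint poll happens 
only on the way out of the scope.

// Standalone illustration only (not HotSpot code).
#include <atomic>

enum class State { unsafe, safe };

struct SelfThread {
  std::atomic<State> state{State::unsafe};
};

std::atomic<bool> safepoint_in_progress{false};

// RAII guard modelling "set blocked, promise to poll when leaving".
class BlockedScope {
  SelfThread& _self;
 public:
  explicit BlockedScope(SelfThread& self) : _self(self) {
    // No safepoint poll on entry: the coordinator accounts for this thread
    // as soon as it observes the safe state.
    _self.state.store(State::safe, std::memory_order_release);
  }
  ~BlockedScope() {
    // The promised poll on exit: wait out any safepoint before becoming
    // unsafe again. (Real code uses a transition state to close the race
    // between the check and the store; that is omitted here.)
    while (safepoint_in_progress.load(std::memory_order_acquire)) {
      // spin until the safepoint ends
    }
    _self.state.store(State::unsafe, std::memory_order_release);
  }
};

void blocking_section(SelfThread& self) {
  BlockedScope bs(self);
  // ... wait on a lock or condition here while safepoint-safe ...
}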
>>
>> Aside: from a safepoint perspective I really don't see any difference 
>> between going out to "native" and going out to a (potentially) 
>> "blocking" library call. There are both safepoint-safe and should have 
>> the same requirements as you exit and enter the VM.
> 
> Yes.
> 
>>
>>> v06_2
>>> Full:
>>> http://cr.openjdk.java.net/~rehn/8203469/v06_2/full/
>>> Inc against v05:
>>> http://cr.openjdk.java.net/~rehn/8203469/v06_2/inc/
>>> Inc against v06_1:
>>> http://cr.openjdk.java.net/~rehn/8203469/v06_2/rebase_inc/
>>>
>>> v06_2 simplifies and removes ~200 LOC with the same performance.
>>> If there is a case with a thread in vm taking a long time, it will already
>>> screw up latency and thus should be fixed regardless of v06_1 vs v06_2.
>>> So I see no reason why we should not push v06_2.
>>
>> src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>
>> 317     thread->set_thread_state((JavaThreadState)(_thread_in_vm + 1));
>>
>> Should use _thread_in_vm_trans
>>
>> 326     _thread->set_thread_state((JavaThreadState)(_thread_blocked + 1));
>>
>> Should use _thread_blocked_trans
> 
> Yes.
> 
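For reference, assuming the usual HotSpot convention that each *_trans 
enumerator is defined as the corresponding base state + 1 (so the behaviour 
is unchanged), the corrected lines would read:

317     thread->set_thread_state(_thread_in_vm_trans);
...
326     _thread->set_thread_state(_thread_blocked_trans);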
>>
>> That said, I'm unclear what role the trans states now play here. 
>> Originally the VMThread had to spin when it saw a trans state because 
>> the final state would determine how the VMThread needed to respond 
>> next. But with the VMThread always spinning, I don't see that we 
>> necessarily need the additional complexity. Anyway, that too could be 
>> looked at later.
> 
> Yes. Part of the bigger plan is to simplify the states.
> If we ignore the trace_flag (suspend/resume, async exception, ...) we 
> could now have just two states, safe and unsafe, and you would only need 
> to do a safepoint poll when leaving a safe state.
> 
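As a rough illustration of that collapsed model (conceptual only, not 
proposed code): in today's terms, blocked and native are the "safe" states 
and everything else is "unsafe", e.g.

// Conceptual mapping only: which states count as safepoint-safe (the VM
// thread can account for the thread) versus unsafe (the thread must
// participate in safepoint polling).
static bool is_safepoint_safe(JavaThreadState state) {
  switch (state) {
    case _thread_blocked:
    case _thread_in_native:
      return true;    // safe: the thread promises to poll before leaving
    default:
      return false;   // unsafe: in Java, in VM, or in a transition state
  }
}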
>>
>> I'd like to take one more pass through the full changeset once it has 
>> been generated correctly.
> 
> Thanks, Robbin
> 
>>
>> Thanks,
>> David
>>
>>> Passes stress test and t1-5.
>>>
>>> Thanks, Robbin

