RFR(XL): 8203469: Faster safepoints

Mon Feb 11 10:15:47 UTC 2019

Hi Karen,

On 2/8/19 11:52 PM, Karen Kinnear wrote:
> 1. safepoint.cpp
> I thought with the JFR change you could remove the try_stable_load_state with 
> InactiveSafepointCounter? When would you be in this code and not with a valid 
> safepoint_count?

During a handshake, which calls handshake_safe(...).

> 
> 2. _waiting_to_block and _current_jni_active_count are no longer volatile - 
> assumed only read/modified by VMThread - could you add assertions in the 
> accessors that this is the VMThread?

Sure.
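
A rough standalone sketch of what such accessor assertions could look like. This is not the actual patch: the class name SafepointCounters, the is_owner() helper, and the use of std::thread::id to stand in for "is this the VMThread" are all assumptions made for illustration. In HotSpot itself the check would presumably be an assert on Thread::current()->is_VM_thread().

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for the safepoint bookkeeping; not HotSpot code.
// The constructing thread plays the role of the VM thread.
class SafepointCounters {
  std::thread::id _owner;             // id of the "VM thread"
  int _waiting_to_block = 0;          // plain int: only the owner touches it
  int _current_jni_active_count = 0;  // plain int: only the owner touches it

  bool is_owner() const { return std::this_thread::get_id() == _owner; }

 public:
  SafepointCounters() : _owner(std::this_thread::get_id()) {}

  int waiting_to_block() const {
    assert(is_owner() && "only the VM thread may read _waiting_to_block");
    return _waiting_to_block;
  }

  void set_waiting_to_block(int n) {
    assert(is_owner() && "only the VM thread may write _waiting_to_block");
    _waiting_to_block = n;
  }

  int current_jni_active_count() const {
    assert(is_owner() && "only the VM thread may read the JNI active count");
    return _current_jni_active_count;
  }

  void increment_jni_active_count() {
    assert(is_owner() && "only the VM thread may update the JNI active count");
    ++_current_jni_active_count;
  }
};
```

With the fields no longer volatile, the assertions document (and, in debug builds, enforce) the single-thread access assumption.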

> 
> 3. safepoint.cpp line 413: "Keep event from now." -> what does this comment mean?
>     line 815: "safepoint callback" -> can you update the comment since we 
> removed the "callback" mechanism - I do realize that we still have the 
> block_if_requested.

Yes.

> 
> 4. Do you have stress tests with high thread counts? Does the updated mechanism 
> for _thread_in_vm work as well there? They all complete and no latency issues?

I have not seen any problems, but we know they probably exist. For example, the
initialization of the C2 compiler used to do a 7 ms operation in the VM.
The VM thread would in that case start sleeping 1 ms at a time,
so the outlier would potentially be 8 ms instead of 7 ms.
But as I said, we should fix the 7 ms case anyway :)

> 
> 5. Do you test with ThreadLocalHandshakes off and on?

Yes.

> 
> Future Cleanup Wishlist:
> 1. TBIVM:
>     - clean up code that is trying to do an explicit safepoint_poll,
>     at that point the TBIVM could also skip the safepoint check in the constructor ?

Yes, but in some places it is somewhat misused to do a safepoint poll.
We should go over those places and change them to block_if_requested,
and then remove the front-edge safepoint poll.

> 2. when do we check safepoint_safe_with and native is not walkable (or 
> !has_last_Java_frame ?)
>     why is this not an assertion? When do we transition to native without 
> make_walkable?

JNI critical methods.

> 3. ThreadInVMfromUnknown - what if not in _thread_in_native? Then we don't
> transition to _thread_in_vm and we are not in the state we think we are - when
> does this happen and what state are we left in?
> 

This is an old transition; I have not changed it.
I believe the compiler has code paths that are executed both with state in vm
and with state in native. If the thread is already in vm, it does nothing.
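
The behaviour described here can be illustrated with a small standalone sketch. It is not the HotSpot class: FakeThread, InVMFromUnknown, and state_inside_scope are hypothetical names, and the sketch assumes that any non-native state is simply left untouched. It also leaves out whatever safepoint check the real transition performs on the way back.

```cpp
#include <cassert>

// Hypothetical, simplified thread states; not the HotSpot enum.
enum ThreadState { thread_in_native, thread_in_vm, thread_in_Java };

struct FakeThread {
  ThreadState state;
};

// RAII guard: transitions native -> vm on entry and vm -> native on exit,
// but only if the thread was in native when the guard was created.
class InVMFromUnknown {
  FakeThread* _thread;  // non-null only if this guard did the transition

 public:
  explicit InVMFromUnknown(FakeThread* t) : _thread(nullptr) {
    if (t != nullptr && t->state == thread_in_native) {
      _thread = t;
      t->state = thread_in_vm;
    }
    // Otherwise (e.g. already in vm): do nothing.
  }

  ~InVMFromUnknown() {
    if (_thread != nullptr) {
      _thread->state = thread_in_native;
    }
  }

  InVMFromUnknown(const InVMFromUnknown&) = delete;
  InVMFromUnknown& operator=(const InVMFromUnknown&) = delete;
};

// Helper for experimenting: reports the state seen while the guard is live.
ThreadState state_inside_scope(FakeThread& t) {
  InVMFromUnknown guard(&t);
  return t.state;
}
```

A code path shared between "in vm" and "in native" callers sees thread_in_vm inside the scope either way, and each caller gets its original state back afterwards.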

Thanks, Robbin

> 
>> On Feb 8, 2019, at 5:37 PM, Daniel D. Daugherty <daniel.daugherty at oracle.com 
>> <mailto:daniel.daugherty at oracle.com>> wrote:
>>
>> Robbin,
>>
>> Because this is a completely different way of solving this problem, I don't
>> think I can review this incrementally. That means another crawl through
>> review and might even mean another round of whiteboard diagrams...
>>
>> A proper review will obviously take me longer than I planned, but I wanted
>> you to know that I'm starting to look at it from the beginning... :-)
>>
>> Dan
>>
>>
>> On 2/7/19 11:05 AM, Robbin Ehn wrote:
>>> But there is a 'better' way.
>>> Before I added the more graceful "_vm_wait->wait();" semaphore in the while
>>> (_waiting_to_block > 0) { loop, it was just a busy spin using the same
>>> back-off as the rolling forward loop. It turns out that we mostly never spin
>>> here at all; when all Java threads are stopped, the callbacks are often already
>>> done. So the addition of the semaphore has no impact on our benchmarks and is
>>> mostly unused. This is because most threads are in Java, which we need to
>>> spin-wait on since they can elide into native without doing a callback. My
>>> proposed re-base removes the callbacks completely and lets the vm thread do all
>>> thread accounting. All that the stopping threads need to do is write state and
>>> safepoint id; everything else is handled by the vm thread. We trade 2 atomics +
>>> a local store per thread against doing 2 stores per thread by the vm thread.
>>> This makes it possible for a thread in vm to transition into blocked WITHOUT
>>> a safepoint poll. Just set thread_blocked and promise to do a safepoint poll
>>> when leaving that state.
>>>
>>> v06_2
>>> Full:
>>> http://cr.openjdk.java.net/~rehn/8203469/v06_2/full/
>>> Inc against v05:
>>> http://cr.openjdk.java.net/~rehn/8203469/v06_2/inc/
>>> Inc against v06_1:
>>> http://cr.openjdk.java.net/~rehn/8203469/v06_2/rebase_inc/
>>>
>>> v06_2 simplifies and removes ~200 LOC with the same performance.
>>> If there is a case with a thread in vm taking a long time, it will already
>>> screw up latency and thus should be fixed regardless of v06_1 vs v06_2. So I
>>> see no reason why we should not push v06_2.
>>>
>>> Passes stress test and t1-5.
>>>
>>> Thanks, Robbin
>>
>
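
For readers following the thread, the accounting scheme described in the quoted v06_2 mail (stopping threads only write their state and a safepoint id, and the vm thread does all the counting) can be sketched roughly as below. This is a standalone illustration, not the patch: MutatorThread, stop_for, count_still_running, and the chosen memory orderings are all assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified thread states.
enum State : int { in_java, in_vm, blocked };

struct MutatorThread {
  std::atomic<int> state{in_java};
  std::atomic<uint64_t> safepoint_id{0};

  // Run by the stopping thread itself: two plain stores, no callback
  // into shared counters and no atomic read-modify-write.
  void stop_for(uint64_t id) {
    safepoint_id.store(id, std::memory_order_relaxed);
    state.store(blocked, std::memory_order_release);
  }
};

// Run by the "vm thread" only: it inspects every thread and does the
// accounting itself. Returns how many threads have not yet stopped for
// the safepoint identified by `id`.
int count_still_running(const std::vector<MutatorThread*>& threads,
                        uint64_t id) {
  int waiting_to_block = 0;
  for (const MutatorThread* t : threads) {
    bool stopped =
        t->state.load(std::memory_order_acquire) == blocked &&
        t->safepoint_id.load(std::memory_order_relaxed) == id;
    if (!stopped) {
      ++waiting_to_block;
    }
  }
  return waiting_to_block;
}
```

The vm thread would call count_still_running in its wait loop until it returns 0. A thread that stopped for an older safepoint id is still counted as running for the new one.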