RFR(XL): 8203469: Faster safepoints

David Holmes david.holmes at oracle.com
Mon Feb 11 01:24:25 UTC 2019


Hi Robbin,

Some comments below, but I just noticed there seems to be a major problem 
with your webrevs/patches. If you look at:

http://cr.openjdk.java.net/~rehn/8203469/v06_2/full/webrev/open.changeset

It shows in safepoint.cpp:

+void SafepointSynchronize::increment_jni_active_count() {
+  Atomic::add(1, &_current_jni_active_count);
+}

but if I look at the safepoint.cpp raw file:

http://cr.openjdk.java.net/~rehn/8203469/v06_2/full/webrev/raw_files/new/src/hotspot/share/runtime/safepoint.cpp

it shows the correct:

void SafepointSynchronize::increment_jni_active_count() {
   ++_current_jni_active_count;
}

This makes it impossible to use the changeset to search through and check 
how things are being used. In particular, in this case there doesn't 
seem to be a need for the 
SafepointSynchronize::increment_jni_active_count() function, because it's 
only used by the VMThread now, from within the other safepoint code.
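To sketch the point (hypothetical names, not the actual safepoint.cpp code): once only the VMThread updates the counter, and it does so while the Java threads are stopped, there is exactly one writer, so a plain increment suffices and the Atomic::add wrapper buys nothing:

```cpp
#include <cassert>

// Illustrative single-writer counter: when only the VMThread touches
// the count (inside the safepoint protocol, with Java threads stopped),
// no atomic read-modify-write is needed.
struct SafepointCounters {
  int jni_active_count = 0;

  // Called only by the VMThread during a safepoint, so a plain
  // non-atomic increment is sufficient.
  void increment_jni_active_count() { ++jni_active_count; }
};
```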

More below ...

On 8/02/2019 2:05 am, Robbin Ehn wrote:
> Hi all, here is the promised re-base (v06) on
> 8210832: Remove sneaky locking in class Monitor.
> 
> v06_1 is just a straight re-base.
> 
> Full:
> http://cr.openjdk.java.net/~rehn/8203469/v06_1/full/
> Inc:
> http://cr.openjdk.java.net/~rehn/8203469/v06_1/inc/
> 
> Passes stress test and t1-5.
> 
> But there is a 'better' way.
> Before I added the more graceful "_vm_wait->wait();" semaphore in the while
> (_waiting_to_block > 0) { loop, it was just a busy spin using the same

I think I've lost track of the changes here. Initially we waited on the 
_safepoint_lock; then in preview_13 we switched to polling with backoff. 
I'm not sure when _vm_wait->wait was introduced, but we're now returning 
to polling with backoff. Is that right?

Once the VMThread polls until all threads are known to be safepoint-safe, 
we eliminate a whole category of race conditions around the thread-state 
changes, which allows us to simplify those thread-state changes.
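A minimal sketch of that polling scheme, with illustrative names (ThreadState, is_safe, wait_for_all_safe are not the HotSpot API): the VMThread alone re-scans the published thread states with exponential backoff until every thread is in a safepoint-safe state, while the stopping threads only write their own state:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Simplified stand-in for JavaThreadState: only threads still in Java
// need to be spin-waited on; native and blocked are safepoint-safe.
enum ThreadState { IN_JAVA, IN_NATIVE, BLOCKED };

inline bool is_safe(ThreadState s) {
  return s == IN_NATIVE || s == BLOCKED;
}

// VMThread side: scan all published states, backing off between scans.
// Bounded by max_scans here so a test cannot hang; the real loop would
// spin until the safepoint is reached.
inline bool wait_for_all_safe(std::atomic<ThreadState>* states, int n,
                              int max_scans) {
  int backoff_us = 1;
  for (int scan = 0; scan < max_scans; ++scan) {
    bool all_safe = true;
    for (int i = 0; i < n; ++i) {
      if (!is_safe(states[i].load(std::memory_order_acquire))) {
        all_safe = false;
        break;
      }
    }
    if (all_safe) return true;
    // Simple exponential backoff between scans.
    std::this_thread::sleep_for(std::chrono::microseconds(backoff_us));
    if (backoff_us < 1000) backoff_us *= 2;
  }
  return false;
}
```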

> back-off as the rolling forward loop. It turns out that we mostly never
> spin here at all; when all Java threads are stopped the callbacks are
> often already done. So the addition of the semaphore has no impact on our
> benchmarks and is mostly unused. This is because most threads are in Java,
> which we need to spin-wait on since they can slip into native without
> doing a callback. My proposed re-base removes the callbacks completely and
> lets the VM thread do all thread accounting. All that the stopping threads
> need to do is write their state and safepoint id; everything else is
> handled by the VM thread. We trade 2 atomics + a local store per thread
> against the VM thread doing 2 stores per thread. This makes it possible
> for a thread in vm to transition into blocked WITHOUT a safepoint poll:
> just set thread_blocked and promise to do a safepoint poll when leaving
> that state.

Right - but you've only implemented that for the new 
ThreadBlockInVMWithDeadlockCheck, though the same argument applies to 
the regular TBIVM. We can/should re-examine the TBIVM logic and see if 
we can simplify it and combine it with the new 
ThreadBlockInVMWithDeadlockCheck. Future RFE.
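To spell out the invariant being discussed, here is an illustrative RAII sketch (the names Thread, BlockInVM, and safepoint_poll are stand-ins, not HotSpot's classes): entry into blocked is just a state write with no poll, and the exit path keeps the promise by polling before running in the VM again:

```cpp
#include <atomic>
#include <cassert>

enum State { IN_VM, BLOCKED };

// Minimal stand-in for a JavaThread, tracking polls for illustration.
struct Thread {
  std::atomic<State> state{IN_VM};
  int polls = 0;
  void safepoint_poll() { ++polls; }  // stand-in for the real poll
};

// Guard in the spirit of ThreadBlockInVM: cheap entry, poll on exit.
struct BlockInVM {
  Thread& t;
  explicit BlockInVM(Thread& thr) : t(thr) {
    // Entry: just publish the new state. The VMThread treats BLOCKED
    // as safepoint-safe, so no poll is needed here.
    t.state.store(BLOCKED, std::memory_order_release);
  }
  ~BlockInVM() {
    // Exit: keep the promise - poll before running in the VM again.
    t.safepoint_poll();
    t.state.store(IN_VM, std::memory_order_release);
  }
};
```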

Aside: from a safepoint perspective I really don't see any difference 
between going out to "native" and going out to a (potentially) 
"blocking" library call. They are both safepoint-safe and should have 
the same requirements as you exit and enter the VM.

> v06_2
> Full:
> http://cr.openjdk.java.net/~rehn/8203469/v06_2/full/
> Inc against v05:
> http://cr.openjdk.java.net/~rehn/8203469/v06_2/inc/
> Inc against v06_1:
> http://cr.openjdk.java.net/~rehn/8203469/v06_2/rebase_inc/
> 
> v06_2 simplifies and removes ~200 LOC with same performance.
> If there is a case with a thread in vm taking a long time, it will already
> screw up latency and thus should be fixed regardless of v06_1 vs v06_2.
> So I see no reason why we should not push v06_2.

src/hotspot/share/runtime/interfaceSupport.inline.hpp

317     thread->set_thread_state((JavaThreadState)(_thread_in_vm + 1));

Should use _thread_in_vm_trans

326     _thread->set_thread_state((JavaThreadState)(_thread_blocked + 1));

Should use _thread_blocked_trans

That said, I'm unclear what role the trans states now play here. 
Originally the VMThread had to spin when it saw a trans state because 
the final state would determine how the VMThread needed to respond next. 
But with the VMThread always spinning I don't see that we necessarily 
need the additional complexity. Anyway that too could be looked at later.
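For reference, the symbolic trans names encode exactly that `+ 1` relationship, so using them is purely a readability fix. A simplified mirror of the layout (the actual definitions live in HotSpot's globalDefinitions.hpp; treat the concrete values here as illustrative):

```cpp
#include <cassert>

// Simplified mirror of HotSpot's JavaThreadState layout: each base
// state is even and its transition ("trans") state is base + 1, which
// is why `_thread_in_vm + 1` happens to work but the named constant
// reads better.
enum JavaThreadStateSketch {
  _thread_new             = 2,
  _thread_new_trans       = 3,
  _thread_in_native       = 4,
  _thread_in_native_trans = 5,
  _thread_blocked         = 6,
  _thread_blocked_trans   = 7,
  _thread_in_vm           = 8,
  _thread_in_vm_trans     = 9,
  _thread_in_Java         = 10,
  _thread_in_Java_trans   = 11
};
```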

I'd like to take one more pass through the full changeset once it has 
been generated correctly.

Thanks,
David

> Passes stress test and t1-5.
> 
> Thanks, Robbin

