RFR(XL): 8203469: Faster safepoints
Daniel D. Daugherty
daniel.daugherty at oracle.com
Sat Feb 9 02:42:04 UTC 2019
On 2/7/19 11:05 AM, Robbin Ehn wrote:
> But there is a 'better' way.
> Before I added the more graceful "_vm_wait->wait();" semaphore in the
> while (_waiting_to_block > 0) { loop, it was just a busy spin using the
> same back-off as the rolling forward loop. It turns out that we mostly
> never spin here at all; when all Java threads are stopped, the callbacks
> are often already done. So the addition of the semaphore has no impact
> on our benchmarks and is mostly unused. This is because most threads are
> in Java, which we need to spin-wait for since they can elide into native
> without doing a callback. My proposed re-base removes the callbacks
> completely and lets the VM thread do all thread accounting. All that the
> stopping threads need to do is write their state and the safepoint id;
> everything else is handled by the VM thread. We trade 2 atomics + a
> local store per thread against 2 stores per thread done by the VM
> thread. This makes it possible for a thread in VM to transition into
> blocked WITHOUT a safepoint poll. Just set thread_blocked and promise to
> do a safepoint poll when leaving that state.
>
> v06_2
> Full:
> http://cr.openjdk.java.net/~rehn/8203469/v06_2/full/
This webrev's patch file is again a couple of patches merged together,
which makes searching the patch for patterns problematic. I still managed
to find an errant use of _call_back in a comment, but it wasn't easy.
> Inc against v05:
> http://cr.openjdk.java.net/~rehn/8203469/v06_2/inc/
> Inc against v06_1:
> http://cr.openjdk.java.net/~rehn/8203469/v06_2/rebase_inc/
>
> v06_2 simplifies and removes ~200 LOC with the same performance.
> If there is a case with a thread in VM taking a long time, it will
> already screw up latency and thus should be fixed regardless of v06_1
> vs v06_2. So I see no reason why we should not push v06_2.
>
> Passes stress test and t1-5.
>
> Thanks, Robbin
src/hotspot/share/code/dependencyContext.hpp
No comments.
src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp
No comments.
src/hotspot/share/jfr/recorder/stacktrace/jfrStackTraceRepository.cpp
No comments.
src/hotspot/share/runtime/handshake.cpp
No comments.
src/hotspot/share/runtime/interfaceSupport.inline.hpp
L317: thread->set_thread_state((JavaThreadState)(_thread_in_vm + 1));
L318: InterfaceSupport::serialize_thread_state_with_handler(thread);
L319:
old L320: SafepointMechanism::callback_if_safepoint(thread);
L320: thread->set_thread_state(_thread_blocked);
I'm having trouble with dropping callback_if_safepoint() here,
but I can't put my finger on it yet. We're taking out these
lines of code:
L66: if (!uses_thread_local_poll() || local_poll_armed(thread)) {
L73: if (global_poll()) {
L74: SafepointSynchronize::block(thread, false);
which my brain first read as taking out blocking for a safepoint,
but then I remembered the 'false' parameter means we don't actually
block for the safepoint. Okay, I'm glad that parameter is now gone
from block(). So now it's much clearer that we are not blocking for
a safepoint in the ThreadBlockInVMWithDeadlockCheck ctor. This is
new behavior from Patricio's removal of sneaky locking and is not
new with faster safepoints.
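For my own notes, here is how I read the removed check once it is put back
together (reassembled from the quoted lines, not a verbatim copy of v05;
the comment about the parameter is mine):

  void SafepointMechanism::callback_if_safepoint(JavaThread* thread) {
    if (!uses_thread_local_poll() || local_poll_armed(thread)) {
      if (global_poll()) {
        // 'false' == do the callback bookkeeping but do NOT actually
        // block for the safepoint here.
        SafepointSynchronize::block(thread, false);
      }
    }
  }

So, as I read it, dropping this from the ThreadBlockInVMWithDeadlockCheck
ctor removes the callback bookkeeping, not an actual block for the
safepoint.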
src/hotspot/share/runtime/mutex.cpp
No comments.
src/hotspot/share/runtime/mutex.hpp
No comments.
src/hotspot/share/runtime/mutexLocker.cpp
No comments.
src/hotspot/share/runtime/mutexLocker.hpp
No comments.
src/hotspot/share/runtime/safepointMechanism.cpp
No comments.
src/hotspot/share/runtime/safepointMechanism.hpp
No comments.
src/hotspot/share/runtime/safepointMechanism.inline.hpp
No comments.
src/hotspot/share/runtime/thread.hpp
No comments.
src/hotspot/share/runtime/vmThread.cpp
No comments.
src/hotspot/share/services/runtimeService.cpp
No comments.
src/hotspot/share/services/runtimeService.hpp
No comments.
test/hotspot/jtreg/runtime/logging/SafepointTest.java
No comments.
src/hotspot/share/runtime/safepoint.hpp
L115: // Back off strategy when synchronizing a safepoint:
L116: static void back_off();
This function doesn't exist.
src/hotspot/share/runtime/safepoint.cpp
Is this the code you intended to put in a back_off() function?
  L273:   if (still_running > 0) {
  L274:     // iterations will be 2 the first time we enter this spin back-off.
  L275:     // naked_short_nanosleep takes tenths of micros which means that
  L276:     // number of nanoseconds is irrelevant if it's below that. We do
  L277:     // 20 1 ns sleeps with a total cost of ~1 ms, then we do 1 ms sleeps.
  L278:     jlong sleep_ns = 1;
  L279:     if (iterations > 21) {
  L280:       sleep_ns = NANOUNITS / MILLIUNITS; // 1 ms
  L281:     }
  L282:     os::naked_short_nanosleep(sleep_ns);
  L283:   }
Refactoring it into back_off() would clean things up a bit.
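Something like this sketch is what I had in mind (illustration only; the
declared back_off() takes no arguments, so passing iterations here is just
for the example):

  void SafepointSynchronize::back_off(int iterations) {
    // naked_short_nanosleep takes tenths of micros, so the exact nanosecond
    // count is irrelevant below that: do ~20 short sleeps, then 1 ms sleeps.
    jlong sleep_ns = 1;
    if (iterations > 21) {
      sleep_ns = NANOUNITS / MILLIUNITS;  // 1 ms
    }
    os::naked_short_nanosleep(sleep_ns);
  }

and then the caller above would collapse to:

  if (still_running > 0) {
    back_off(iterations);
  }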
  L1082:   // Check state. block() will set thread state to thread_in_vm which will
  L1083:   // cause the safepoint state _type to become _call_back.
  L1084:   assert(!SafepointMechanism::uses_global_page_poll() || !_safepoint_safe,
  L1085:          "polling page exception on thread safepoint safe");
This comment is now stale, but I'm not sure how it should be
rewritten. The original comment doesn't really match the
original assert() so I wonder if it went stale before this
fix... Perhaps the comment should be:
// If we're using a global poll, then the thread should not be
// marked as safepoint safe yet.
There are pluses and minuses to the JavaThreads no longer calling back:

pluses:
- It was a bit confusing when a JavaThread could call back for itself
  or the VMThread could call back for a JavaThread in other cases.
- Now, only the VMThread handles setting _safepoint_safe, which is the
  replacement for the call back. So do you want some asserts in there?
  (See the sketch after this list.)

minuses:
- It feels like we've lost some debugging state with the removal of the
  call back, but maybe that's not right. I guess we couldn't distinguish
  _who_ called back before (JavaThread for self or VMThread).
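For the asserts, I was thinking of something along these lines
(illustration only; the wrapper name is made up, the point is guarding the
new "only the VMThread sets it" rule for _safepoint_safe):

  // Hypothetical helper on ThreadSafepointState:
  void ThreadSafepointState::account_safe_thread() {
    assert(Thread::current()->is_VM_thread(),
           "only the VMThread marks threads safepoint safe now");
    assert(!_safepoint_safe, "thread already accounted as safepoint safe");
    _safepoint_safe = true;
  }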
I'm not entirely convinced that we have all of our bases covered for a
thread blocking at a safepoint, but we have that worry with Patricio's
removal of sneaky locking and we're going to shake that out with some
targeted stress testing.
Robbin, you mentioned a stress test above. What kind of stress testing
did you run and for how long?
I think I'm good with where v06_2 is going. Of course, I'll mull on it
over the weekend and I may have more comments next week.
Dan