RFR(XL): 8203469: Faster safepoints
Daniel D. Daugherty
daniel.daugherty at oracle.com
Tue Jan 22 20:34:08 UTC 2019
On 1/22/19 10:39 AM, Robbin Ehn wrote:
> Hi all, here is v01 and v02.
>
> v01 contains update after comments from list:
> http://cr.openjdk.java.net/~rehn/8203469/v01/
> http://cr.openjdk.java.net/~rehn/8203469/v01/inc/
>
> v02 contains a bug fix, explained below:
> http://cr.openjdk.java.net/~rehn/8203469/v02/
src/hotspot/share/code/dependencyContext.hpp
No comments.
src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp
No comments.
src/hotspot/share/jfr/recorder/stacktrace/jfrStackTraceRepository.cpp
No comments.
src/hotspot/share/runtime/handshake.cpp
No comments.
src/hotspot/share/runtime/mutex.cpp
No comments.
src/hotspot/share/runtime/mutex.hpp
No comments.
src/hotspot/share/runtime/mutexLocker.cpp
No comments.
src/hotspot/share/runtime/mutexLocker.hpp
No comments.
src/hotspot/share/runtime/safepoint.cpp
L136, L138, L141 - nit - why add the blank lines?
L148:     // We need to save the desc since it is removed before we need it.
Perhaps:
          // We need a place to save the desc since it is released before we need it.
L400:     // Save the starting time, so that it can be compared to see if this has taken
L401:     // too long to complete.
L402:     jlong safepoint_limit_time = 0;
L403:     if (SafepointTimeout) {
L404:       safepoint_limit_time = os::javaTimeNanos() + (jlong)SafepointTimeoutDelay * MICROUNITS;
Mostly pre-existing, but this should be:
          jlong safepoint_limit_time = 0;
          if (SafepointTimeout) {
            // Set the limit time, so that it can be compared to see if this has taken
            // too long to complete.
            safepoint_limit_time = os::javaTimeNanos() + (jlong)SafepointTimeoutDelay * MICROUNITS;
          }
L477:     // We first do the safepoint cleanup since if this is a GC safepoint,
L478:     // needs it to be completed before running the GC op.
Perhaps this rewrite:
          // We do the safepoint cleanup first since a GC related safepoint
          // needs cleanup to be completed before running the GC op.
L515:     OrderAccess::fence();
Perhaps add comment:
          OrderAccess::fence(); // keep read and write of _state from floating up
L526:     // Keep the local state from floating up.
L527:     OrderAccess::fence();
nit - comment on L526 can be on L527, e.g.:
          OrderAccess::fence(); // Keep the local state from floating up.
L749: nit - please delete blank line at top of function.
L750:     assert((safepoint_count != InactiveSafepointCounter &&
L751:             Thread::current() == (Thread*)VMThread::vm_thread() &&
L752:             SafepointSynchronize::_state != _not_synchronized) ||
L753:            safepoint_count == InactiveSafepointCounter, "Invalid check");
Had to read this a couple of times. Perhaps this would be more clear:
          assert((safepoint_count != InactiveSafepointCounter &&
                  Thread::current() == (Thread*)VMThread::vm_thread() &&
                  SafepointSynchronize::_state != _not_synchronized)
                 || safepoint_count == InactiveSafepointCounter,
                 "Invalid check");
Please note that the final expression's '||' is lined up with the
first parenthetical expression...
L755:     // To handle the thread_blocked state on the backedge of the WaitBarrier from
L756:     // previous safepoint and reading the resetted (0/InactiveSafepointCounter) we
L757:     // re-read state after we read thread safepoint id. The JavaThread changes it
L758:     // state before resetting, the second read will either see a different thread
L759:     // state making this an unsafe state or it can see blocked again.
L760:     // When we see blocked twice with a 0 safepoint id, either:
L761:     // - It is normally blocked, e.g. on Mutex, TBIVM.
L762:     // - It was in SS:block(), looped around to SS:block() and is blocked on the WaitBarrier.
L763:     // - It was in SS:block() but now on a Mutex.
L764:     // Either case safe.
Please consider these minor tweaks:
          // To handle the thread_blocked state on the backedge of the WaitBarrier from a
          // previous safepoint and reading the possibly reset (0/InactiveSafepointCounter)
          // id, re-read state after we read thread safepoint id. If the JavaThread changes
          // its state before resetting, the second read will either see a different thread
          // state making this an unsafe state or it can see blocked again.
          // When we see blocked twice with a 0 safepoint id, this means:
          // - It is normally blocked, e.g., on Mutex, TBIVM.
          // - It was in SS:block(), looped around to SS:block() and is blocked on the WaitBarrier.
          // - It was in SS:block() but now on a Mutex.
          // All of these cases are safe.
old L743:     return !thread->has_last_Java_frame() || thread->frame_anchor()->walkable();
new L780:     return !thread->has_last_Java_frame() ||
new L781:            thread->frame_anchor()->walkable();
nit - Why was this line reformatted/split?
L893: // Load here stopped by above release.
Perhaps:
// Load here cannot float because of the above release.
src/hotspot/share/runtime/safepoint.hpp
No comments.
src/hotspot/share/runtime/safepointMechanism.inline.hpp
No comments.
src/hotspot/share/runtime/thread.hpp
No comments.
src/hotspot/share/runtime/vmThread.cpp
No comments.
src/hotspot/share/services/runtimeService.cpp
No comments.
src/hotspot/share/services/runtimeService.hpp
No comments.
test/hotspot/jtreg/runtime/logging/SafepointTest.java
No comments.
Thanks for persisting with this work. Thumbs up! All of my comments
in this round are editorial. I don't need to see another webrev if
you choose to make the above changes.
Dan
> http://cr.openjdk.java.net/~rehn/8203469/v02/inc/
>
> Patricio had some good questions about try_stable_load_state.
> In previous internal versions I have done the stable load by loading
> thread state before and after the safepoint id. For some reason I changed
> it during a refactoring to the reverse, which is incorrect. Consider the
> following:
>
> JavaThread: state / safepoint id / poll  | VMThread: global state / safepoint counter / WaitBarrier
> #########################################|#####################################################
> _thread_in_native / 0 / disarmed         | _not_synchronized / 0 / disarmed
>                                          | _not_synchronized / 0 / armed(1)
>                                          | _not_synchronized / 1 / armed(1)
>                                          | _synchronizing / 1 / armed(1)
> _thread_in_native / 0 / armed            |
>                                          | <LOAD JavaThread safepoint id:0>
>                                          | <LOAD JavaThread thread state:_thread_in_native>
>                                          | <LOAD JavaThread safepoint id:0>
>                                          | _synchronized / 1 / armed(1)
> <JavaThread transition to VM>            |
> _thread_in_native_trans / 0 / armed      |
> <LOAD safepoint counter(1)>              |
> <JavaThread goes off-proc>               |
>                                          | _not_synchronized / 1 / armed(1)
>                                          | _not_synchronized / 2 / armed(1)
> _thread_in_native_trans / 0 / disarmed   |
>                                          | _not_synchronized / 2 / disarmed
> Next safepoint starts:
>                                          | _not_synchronized / 2 / armed(3)
>                                          | _not_synchronized / 3 / armed(3)
>                                          | _synchronizing / 3 / armed(3)
> _thread_in_native_trans / 0 / armed      |
>                                          | <LOAD JavaThread safepoint id:0>
> <JavaThread goes on-proc>                |
> <STORE loaded safepoint counter(1)>      |
> _thread_in_native_trans / 1 / armed      |
> _thread_blocked / 1 / armed              |
> <WaitBarrier not armed for 1>            |
>                                          | <LOAD JavaThread thread state:_thread_blocked>
> _thread_in_native_trans / 1 / armed      |
> _thread_in_native_trans / 0 / armed      |
>                                          | <LOAD JavaThread safepoint id:0>
>
> A false positive is read.
>
> When done in the correct order, the safe matrix looks like:
> State load 1                         | Safepoint id | State load 2     | Result
> #####################################|##############|##################|#####################
> any                                  | !0/current   | any              | treat all as unsafe
> any                                  | any          | !state1          | treat all as unsafe
> any                                  | 0/current    | state1           | suspend flag is safe
> thread_in_native                     | 0/current    | thread_in_native | safe
> thread_blocked                       | 0/current    | thread_blocked   | safe
> !thread_blocked && !thread_in_native | 0/current    | state1           | unsafe
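[Editorial aside: the decision matrix above can be sketched as a standalone
function. The names below (JavaThreadView, is_stably_safe) are invented for
illustration and do not match HotSpot's actual code; only the load ordering
and the row logic come from the matrix.]

```cpp
#include <atomic>

// Invented, simplified stand-ins for HotSpot's thread-state machinery;
// just enough for the matrix above.
enum ThreadState { _thread_in_Java, _thread_in_vm, _thread_in_native,
                   _thread_blocked, _thread_in_native_trans };

constexpr int InactiveSafepointCounter = 0;

struct JavaThreadView {
  std::atomic<ThreadState> state;
  std::atomic<int> safepoint_id;  // barrier tag published before blocking
};

// Load state, then safepoint id, then state again -- the order the matrix
// assumes. Returns true only for a stably observed safe state.
bool is_stably_safe(const JavaThreadView& t, int current_id) {
  ThreadState s1 = t.state.load(std::memory_order_acquire);
  int id = t.safepoint_id.load(std::memory_order_acquire);
  if (id != InactiveSafepointCounter && id != current_id)
    return false;                                // row 1: treat as unsafe
  ThreadState s2 = t.state.load(std::memory_order_acquire);
  if (s2 != s1)
    return false;                                // row 2: treat as unsafe
  // Rows 4-5 (the suspend-flag row 3 is elided in this sketch).
  return s1 == _thread_in_native || s1 == _thread_blocked;
}
```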
>
> The case with blocked/0/blocked I added this comment for:
>
> 755 // To handle the thread_blocked state on the backedge of the
> WaitBarrier from
> 756 // previous safepoint and reading the resetted
> (0/InactiveSafepointCounter) we
> 757 // re-read state after we read thread safepoint id. The
> JavaThread changes it
> 758 // state before resetting, the second read will either see a
> different thread
> 759 // state making this an unsafe state or it can see blocked again.
> 760 // When we see blocked twice with a 0 safepoint id, either:
> 761 // - It is normally blocked, e.g. on Mutex, TBIVM.
> 762 // - It was in SS:block(), looped around to SS:block() and is
> blocked on the WaitBarrier.
> 763 // - It was in SS:block() but now on a Mutex.
> 764 // Either case safe.
>
> I hope above explains why loading state before and after safepoint id is
> sufficient.
>
> Passes, with flying colors, t1-5, stress test, KS 24h stress.
>
> Thanks, Robbin
>
> On 1/15/19 11:39 AM, Robbin Ehn wrote:
>> Hi all, please review.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8203469
>> Code: http://cr.openjdk.java.net/~rehn/8203469/v00/webrev/
>>
>> Thanks to Dan for pre-reviewing a lot!
>>
>> Background:
>> ZGC often does very short safepoint operations. For perspective, in a
>> specJBB2015 run, G1 can have young collection stops lasting about 170 ms,
>> while in the same setup ZGC does 0.2 ms to 1.5 ms operations, depending
>> on which operation it is. The time it takes to stop and start the
>> JavaThreads is relatively very large for a ZGC safepoint. With an
>> operation that takes just 0.2 ms, the overhead of stopping and starting
>> JavaThreads is several times the operation itself.
>>
>> High-level functionality change:
>> Serializing the starting over Threads_lock takes time.
>> - Don't wait on Threads_lock; use the WaitBarrier.
>> Serializing the stopping over Safepoint_lock takes time.
>> - Let threads stop in parallel; remove Safepoint_lock.
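[Editorial aside: the WaitBarrier interface the patch relies on -- arm with a
tag, park waiters for that tag, release them all at once on disarm -- can be
sketched with a condition variable. This is only an illustration; HotSpot's
actual implementation uses futexes on Linux and semaphores elsewhere, as
noted later in this mail.]

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class WaitBarrier {
  std::mutex _lock;
  std::condition_variable _cv;
  int _armed_tag = 0;  // 0 means disarmed
 public:
  void arm(int tag) {
    std::lock_guard<std::mutex> g(_lock);
    _armed_tag = tag;
  }
  void disarm() {
    {
      std::lock_guard<std::mutex> g(_lock);
      _armed_tag = 0;
    }
    _cv.notify_all();  // wake every waiter; they all restart in parallel
  }
  void wait(int tag) {
    std::unique_lock<std::mutex> g(_lock);
    // Block only while the barrier is armed for this exact tag; a thread
    // arriving with an old tag (from a previous safepoint) returns
    // immediately instead of being stranded.
    _cv.wait(g, [&] { return _armed_tag != tag; });
  }
};
```

Because every waiter is released by a single notify_all rather than by
serialized hand-off over Threads_lock, starting the JavaThreads no longer
takes time proportional to the number of waiters holding a lock in turn.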
>>
>> Details:
>> JavaThreads have 2 abstract logical states: unsafe or safe.
>> - Safe means the JavaThread will not touch the Java heap or VM internal
>>   structures without doing a transition and blocking before doing so.
>>   - The safe states are:
>>     - When polls are armed: _thread_in_native and _thread_blocked.
>>     - When Threads_lock is held: externally suspended flag is set.
>>   - The VMThread has polls armed and holds the Threads_lock during a
>>     safepoint.
>> - Unsafe means that either the Java heap or VM internal structures can
>>   be accessed by the JavaThread, e.g., _thread_in_Java, _thread_in_vm.
>>   - All combinations that are not safe are unsafe.
>>
>> We cannot start a safepoint until all unsafe threads have transitioned
>> to a safe state. To make them safe, we arm polls in compiled code and
>> make sure any transition to another unsafe state will be blocked.
>> JavaThreads which are unsafe with state _thread_in_Java may transition
>> to _thread_in_native without being blocked, since the thread just became
>> safe and we can proceed. Any safe thread may try to transition at any
>> time to an unsafe state, thus coming into the safepoint blocking code at
>> any moment, e.g., after the safepoint is over, or even at the beginning
>> of the next safepoint.
>>
>> The VMThread cannot tolerate false positives from the JavaThread thread
>> state because that would mean starting the safepoint without all
>> JavaThreads being safe. The two locks (Threads_lock and Safepoint_lock)
>> make sure we never observe false positives from the safepoint blocking
>> code; if we remove them, how do we handle false positives?
>>
>> By first publishing which barrier tag (safepoint counter) we will call
>> WaitBarrier.wait() with as the thread's safepoint id, and only then
>> changing the state to _thread_blocked, the VMThread can ignore
>> JavaThreads by doing a stable load of the state. A stable load of the
>> thread state is successful if the thread safepoint id is the same both
>> before and after the load of the state, and the safepoint id is current
>> or InactiveSafepointCounter. If the stable load fails, the thread is
>> considered safepoint unsafe. It is no longer enough that a thread has
>> state _thread_blocked; it must also have the correct safepoint id before
>> and after we read the state.
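[Editorial aside: the JavaThread-side ordering described above -- publish
the tag, then set the blocked state, and undo it in the reverse order --
can be sketched as follows. The names (SafepointSlot, publish_then_block,
unblock_then_reset) are invented for illustration; only the store ordering
is taken from the description.]

```cpp
#include <atomic>

// Invented, minimal view of what a JavaThread publishes for the VMThread.
struct SafepointSlot {
  std::atomic<int> safepoint_id{0};  // 0 = InactiveSafepointCounter
  std::atomic<int> state{0};         // 1 = _thread_blocked (simplified)
};

// The barrier tag must become visible before the blocked state; otherwise
// the VMThread could observe _thread_blocked paired with an old/inactive
// id and wrongly treat the thread as safe for a different safepoint.
void publish_then_block(SafepointSlot& s, int tag) {
  s.safepoint_id.store(tag, std::memory_order_release);  // 1) publish tag
  s.state.store(1, std::memory_order_release);           // 2) then block
  // ... would now call WaitBarrier.wait(tag) ...
}

// On wakeup the order is reversed: leave the blocked state first, then
// reset the id, so the VMThread's second state read either sees a
// different (unsafe) state or sees blocked with a consistent id.
void unblock_then_reset(SafepointSlot& s) {
  s.state.store(0, std::memory_order_release);
  s.safepoint_id.store(0, std::memory_order_release);
}
```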
>>
>> Performance:
>> The result of faster safepoints is that the average CPU time for
>> JavaThreads between safepoints is higher, thus increasing the allocation
>> rate. The thread that stops first waits a shorter time until it gets
>> started. Even the thread that stops last has a shorter stop, since we
>> start threads faster. If your application is using a concurrent GC, it
>> may need re-tuning, since each Java worker thread has an increased CPU
>> time/allocation rate. Often this means max performance is achieved using
>> slightly fewer Java worker threads than before. Also, the increased
>> allocation rate means a shorter time between GC safepoints.
>> - If you are using a non-concurrent GC, you should see improved latency
>>   and throughput.
>> - After re-tuning with a concurrent GC, throughput should be equal or
>>   better, but with better latency. But bear in mind this is a latency
>>   patch, not a throughput one.
>> With the current code a Java thread is not guaranteed to run between
>> safepoints (in theory a Java thread can be starved indefinitely), since
>> the VMThread may re-grab the Threads_lock before the Java thread wakes
>> up from the previous safepoint. This can happen if the GC/VM does not
>> respect MMU (minimum mutator utilization) or if your machine is very
>> over-provisioned.
>> The current scheme thus re-safepoints quickly if the Java threads have
>> not started yet, at the cost of latency. Since the new code uses the
>> WaitBarrier with the safepoint counter, all threads must roll forward to
>> the next safepoint by getting at least some CPU time between two
>> safepoints, meaning MMU violations are more obvious.
>>
>> Some examples of numbers:
>> - On a 16-strand machine, synchronization and
>>   un-synchronization/starting is at least 3x faster (in a non-trivial
>>   test): synchronization ~600 -> ~100 us and starting ~400 -> ~100 us.
>>   (The semaphore path is a bit slower than futex in the WaitBarrier on
>>   Linux.)
>> - SPECjvm2008 serial (untuned G1) gives 10x (1 ms vs 100 us) faster
>>   synchronization time on 16 strands and a ~5% score increase. In this
>>   case the GC op is 1 ms, so we reduce the overhead of synchronization
>>   from 100% to 10%.
>> - specJBB2015 ParGC: ~9% increase in critical-jops.
>>
>> Thanks, Robbin
More information about the hotspot-dev
mailing list