RFR(XL): 8203469: Faster safepoints
Daniel D. Daugherty
daniel.daugherty at oracle.com
Sat Feb 9 02:42:04 UTC 2019
On 2/7/19 11:05 AM, Robbin Ehn wrote:
> But there is a 'better' way.
> Before I added the more graceful "_vm_wait->wait();" semaphore in the
> while (_waiting_to_block > 0) { loop, it was just a busy spin using the
> same back-off as the rolling forward loop. It turns out that we mostly
> never spin here at all; when all Java threads are stopped, the callbacks
> are often already done. So the addition of the semaphore has no impact
> on our benchmarks and is mostly unused. This is because most threads are
> in Java, which we need to spin-wait for since they can elide into native
> without doing a callback. My proposed re-base removes the callbacks
> completely and lets the VM thread do all thread accounting. All that the
> stopping threads need to do is write their state and the safepoint id;
> everything else is handled by the VM thread. We trade 2 atomics + a
> local store per thread against 2 stores per thread done by the VM
> thread. This makes it possible for a thread in VM to transition into
> blocked WITHOUT a safepoint poll. Just set thread_blocked and promise to
> do a safepoint poll when leaving that state.
>
> v06_2
> Full:
> http://cr.openjdk.java.net/~rehn/8203469/v06_2/full/
This webrev's patch file is again a couple of patches merged together,
which makes searching the patch for patterns problematic. I still managed
to find an errant use of _call_back in a comment, but it wasn't easy.
> Inc against v05:
> http://cr.openjdk.java.net/~rehn/8203469/v06_2/inc/
> Inc against v06_1:
> http://cr.openjdk.java.net/~rehn/8203469/v06_2/rebase_inc/
>
> v06_2 simplifies and removes ~200 LOC with the same performance.
> If there is a case with a thread in VM taking a long time, it will
> already screw up latency and thus should be fixed regardless of v06_1
> vs v06_2. So I see no reason why we should not push v06_2.
>
> Passes stress test and t1-5.
>
> Thanks, Robbin
src/hotspot/share/code/dependencyContext.hpp
No comments.
src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp
No comments.
src/hotspot/share/jfr/recorder/stacktrace/jfrStackTraceRepository.cpp
No comments.
src/hotspot/share/runtime/handshake.cpp
No comments.
src/hotspot/share/runtime/interfaceSupport.inline.hpp
L317: thread->set_thread_state((JavaThreadState)(_thread_in_vm + 1));
L318: InterfaceSupport::serialize_thread_state_with_handler(thread);
L319:
old L320: SafepointMechanism::callback_if_safepoint(thread);
L320: thread->set_thread_state(_thread_blocked);
I'm having trouble with dropping callback_if_safepoint() here,
but I can't put my finger on it yet. We're taking out these
lines of code:
L66: if (!uses_thread_local_poll() || local_poll_armed(thread)) {
L73: if (global_poll()) {
L74: SafepointSynchronize::block(thread, false);
which my brain first read as taking out blocking for a safepoint,
but then I remembered the 'false' parameter means we don't actually
block for the safepoint. Okay, I'm glad that parameter is now gone
from block(). So now it's much clearer that we are not blocking for
a safepoint in the ThreadBlockInVMWithDeadlockCheck ctor. This is
new behavior from Patricio's removal of sneaky locking and is not
new with faster safepoints.
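For my own notes, here is how I read the removed check once it is put back
together (reassembled from the quoted lines, not a verbatim copy of v05;
the comment about the parameter is mine):

  void SafepointMechanism::callback_if_safepoint(JavaThread* thread) {
    if (!uses_thread_local_poll() || local_poll_armed(thread)) {
      if (global_poll()) {
        // 'false' == do the callback bookkeeping but do NOT actually
        // block for the safepoint here.
        SafepointSynchronize::block(thread, false);
      }
    }
  }

So, as I read it, dropping this from the ThreadBlockInVMWithDeadlockCheck
ctor removes the callback bookkeeping, not an actual block for the
safepoint.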
src/hotspot/share/runtime/mutex.cpp
No comments.
src/hotspot/share/runtime/mutex.hpp
No comments.
src/hotspot/share/runtime/mutexLocker.cpp
No comments.
src/hotspot/share/runtime/mutexLocker.hpp
No comments.
src/hotspot/share/runtime/safepointMechanism.cpp
No comments.
src/hotspot/share/runtime/safepointMechanism.hpp
No comments.
src/hotspot/share/runtime/safepointMechanism.inline.hpp
No comments.
src/hotspot/share/runtime/thread.hpp
No comments.
src/hotspot/share/runtime/vmThread.cpp
No comments.
src/hotspot/share/services/runtimeService.cpp
No comments.
src/hotspot/share/services/runtimeService.hpp
No comments.
test/hotspot/jtreg/runtime/logging/SafepointTest.java
No comments.
src/hotspot/share/runtime/safepoint.hpp
L115: // Back off strategy when synchronizing a safepoint:
L116: static void back_off();
This function doesn't exist.
src/hotspot/share/runtime/safepoint.cpp
Is this the code you intended to put in a back_off() function?
  L273:   if (still_running > 0) {
  L274:     // iterations will be 2 the first time we enter this spin back-off.
  L275:     // naked_short_nanosleep takes tenths of micros which means that
  L276:     // number of nanoseconds is irrelevant if it's below that. We do
  L277:     // 20 1 ns sleeps with a total cost of ~1 ms, then we do 1 ms sleeps.
  L278:     jlong sleep_ns = 1;
  L279:     if (iterations > 21) {
  L280:       sleep_ns = NANOUNITS / MILLIUNITS; // 1 ms
  L281:     }
  L282:     os::naked_short_nanosleep(sleep_ns);
  L283:   }
Refactoring it into back_off() would clean things up a bit.
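Something like this sketch is what I had in mind (illustration only; the
declared back_off() takes no arguments, so passing iterations here is just
for the example):

  void SafepointSynchronize::back_off(int iterations) {
    // naked_short_nanosleep takes tenths of micros, so the exact nanosecond
    // count is irrelevant below that: do ~20 short sleeps, then 1 ms sleeps.
    jlong sleep_ns = 1;
    if (iterations > 21) {
      sleep_ns = NANOUNITS / MILLIUNITS;  // 1 ms
    }
    os::naked_short_nanosleep(sleep_ns);
  }

and then the caller above would collapse to:

  if (still_running > 0) {
    back_off(iterations);
  }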
  L1082:   // Check state. block() will set thread state to thread_in_vm which will
  L1083:   // cause the safepoint state _type to become _call_back.
  L1084:   assert(!SafepointMechanism::uses_global_page_poll() || !_safepoint_safe,
  L1085:          "polling page exception on thread safepoint safe");
This comment is now stale, but I'm not sure how it should be
rewritten. The original comment doesn't really match the
original assert() so I wonder if it went stale before this
fix... Perhaps the comment should be:
// If we're using a global poll, then the thread should not be
// marked as safepoint safe yet.
There are pluses and minuses to the JavaThreads no longer calling back:

pluses:
- It was a bit confusing when a JavaThread could call back for itself
  or the VMThread could call back for a JavaThread in other cases.
- Now, only the VMThread handles setting _safepoint_safe, which is the
  replacement for the call back. So do you want some asserts in there?
  (See the sketch after this list.)

minuses:
- It feels like we've lost some debugging state with the removal of the
  call back, but maybe that's not right. I guess we couldn't distinguish
  _who_ called back before (JavaThread for self or VMThread).
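For the asserts, I was thinking of something along these lines
(illustration only; the wrapper name is made up, the point is guarding the
new "only the VMThread sets it" rule for _safepoint_safe):

  // Hypothetical helper on ThreadSafepointState:
  void ThreadSafepointState::account_safe_thread() {
    assert(Thread::current()->is_VM_thread(),
           "only the VMThread marks threads safepoint safe now");
    assert(!_safepoint_safe, "thread already accounted as safepoint safe");
    _safepoint_safe = true;
  }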
I'm not entirely convinced that we have all of our bases covered for a
thread blocking at a safepoint, but we have that worry with Patricio's
removal of sneaky locking and we're going to shake that out with some
targeted stress testing.
Robbin, you mentioned a stress test above. What kind of stress testing
did you run and for how long?
I think I'm good with where v06_2 is going. Of course, I'll mull on it
over the weekend and I may have more comments next week.
Dan