RFR(XL): 8203469: Faster safepoints
Robbin Ehn
robbin.ehn at oracle.com
Mon Feb 11 09:29:17 UTC 2019
Hi Dan,
>
> This webrev's patch file is again a couple of patches together which
> makes searches of the patch for patterns problematic. I still managed
> to find an errant use of _call_back in a comment, but it wasn't easy.
>
There is a bug webrev that makes it do on changeset per commit and put all those
into open.changeset.
I have baseline, v01 and v02 applied it does this, since I need to be able to
change both v01 and v02. If you apply the open.changeset it will become a flat
diff. I'll try to work around this bug for next update.
> src/hotspot/share/runtime/interfaceSupport.inline.hpp
> L317: thread->set_thread_state((JavaThreadState)(_thread_in_vm + 1));
> L318: InterfaceSupport::serialize_thread_state_with_handler(thread);
> L319:
> old L320: SafepointMechanism::callback_if_safepoint(thread);
> L320: thread->set_thread_state(_thread_blocked);
> I'm having trouble with dropping callback_if_safepoint() here,
> but I can't put my finger on it yet. We're taking out these
> lines of code:
>
> L66: if (!uses_thread_local_poll() || local_poll_armed(thread)) {
> L73: if (global_poll()) {
> L74: SafepointSynchronize::block(thread, false);
>
> which my brain first read as taking out blocking for a safepoint,
> but then I remembered the 'false' parameter means we don't actually
> block for the safepoint. Okay, I'm glad that parameter is now gone
> from block(). So now it's much clearer that we are not blocking for
> a safepoint in the ThreadBlockInVMWithDeadlockCheck ctr. This is
> new behavior with Patricio's remove sneaky locking code and is not
> new with faster safepoints.
Yea.
>
> src/hotspot/share/runtime/safepoint.hpp
> L115: // Back off strategy when synchronizing a safepoint:
> L116: static void back_off();
> This function doesn't exist.
Thanks, removed.
>
> src/hotspot/share/runtime/safepoint.cpp
> Is this the code you intended to put in a back_off() function?
>
> L273: if (still_running > 0) {
> L274: // iterations will be 2 the first time we enter this spin
> back-off.
> L275: // naked_short_nanosleep takes tenths of micros which means that
> L276: // number of nanoseconds is irrelevant if it's below that. We do
> L277: // 20 1 ns sleeps with a total cost of ~1 ms, then we do 1 ms
> sleeps.
> L278: jlong sleep_ns = 1;
> L279: if (iterations > 21) {
> L280: sleep_ns = NANOUNITS / MILLIUNITS; // 1 ms
> L281: }
> L282: os::naked_short_nanosleep(sleep_ns);
> L283: }
>
> Refactoring it in back_off() would clean things up a bit.
In earlier versions it was platform specific, but now it's not.
Sure, added static void back_off(int iterations) as a static C function above.
>
>
> L1082: // Check state. block() will set thread state to thread_in_vm
> which will
> L1083: // cause the safepoint state _type to become _call_back.
> L1084: assert(!SafepointMechanism::uses_global_page_poll() ||
> !_safepoint_safe,
> L1085: "polling page exception on thread safepoint safe");
> This comment is now stale, but I'm not sure how it should be
> rewritten. The original comment doesn't really match the
> original assert() so I wonder if it went stale before this
> fix... Perhaps the comment should be:
>
> // If we're using a global poll, then the thread should not be
> // marked as safepoint safe yet.
Yes, thanks.
>
> There are pluses and minuses to the JavaThreads no longer calling back:
>
> pluses:
> - It was a bit confusing when a JavaThread could call back for itself
> or the VMThread could call back for a JavaThread in other cases.
> - Now, only the VMThread handles setting _safepoint_safe which is the
> replacement for call back. So you want some asserts in there?
> minuses:
> - It feels like we've lost some debugging state with the removal of
> call back, but maybe that's not right. I guess we couldn't distinguish
> _who_ called back before (JavaThread for self or VMThread).
Added assert for _safepoint_safe.
>
> I'm not entirely convinced that we have all of our bases covered for a
> thread blocking at a safepoint, but we have that worry with Patricio's
> removal of sneaky locking and we're going to shake that out with some
> targeted stress testing.
>
> Robbin, you mentioned stress test above. What kind of stress testing did
> you run and for how long?
Multiple instances of UnexpectedDeoptimizationTest combined with instances of
handshakes tests, an hour.
That combination caught the bug with just looking at global and not checking
local poll armed first in sneaky locking. So it have pretty good coverage.
I'll start do more stress test on next revision.
>
> I think I'm good with where v06_2 is going. Of course, I'll mull on it
> over the weekend and I may have more comments next week.
Great, thanks, Robbin
>
> Dan
>
More information about the hotspot-dev
mailing list