RFR: 8266557: assert(SafepointMechanism::local_poll_armed(_handshakee)) failed: Must be
David Holmes
dholmes at openjdk.java.net
Mon May 24 04:50:55 UTC 2021
On Tue, 11 May 2021 11:59:23 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
> Please consider this change-set which address the issue on hand.
>
> I identified two problems:
>
> - is_locked() uses the _owner field which is unordered (no storestore|storeload) on store-side.
> Fixed by leaving the handshakee being processed in queue until completed.
> And remove looping, since if ever the queue is empty the handshakee may processed.
> If ever want to loop again, we must make sure queue is not empty before removing the processed handshake.
> But there is, at this moment, no performance benefit to that, so I chosse the simple, easy to reason about version. (some crazy stress test can see a difference)
>
> Note that I'll do a follow-up and make is_locked() ifdef ASSERT only.
>
> - have_non_self_executable_operation() only provide correct acquire if first in queue matched, if second item matched it could be re-orderd with reading the poll state.
> Fixed by adding a loadload.
>
> I could at first reproduce by checking _active_handshaker in update_poll (~1/50) and an increase in the test time by ten.
> (real reprod ~1/400 with increased test time)
> If we are updating the poll there should not be an active handshaker.
> The above fixed the issue.
> But after a rebase when I was trying to pin point the issue I could no longer reproduce even without any changes.
>
> Added Atomic store/load to _active_handshaker since it may be concurrently loaded (it may only be used to ask if current thread is active handshaker).
>
> Passes stressing relevant test on aarch64 and t1-7.
>
> Thanks, Robbin
Hi Robbin,
I'm afraid these changes are very unclear to me. Other than not using is_locked() to determine if there is more work to do I really don't follow how the other changes relate to the failure mode.
Thanks,
David
src/hotspot/share/runtime/handshake.cpp line 330:
> 328: // closure are visible to the VMThread/Handshaker after it reads
> 329: // that the operation has completed.
> 330: Atomic::dec(&_pending_threads, memory_order_acq_rel);
Should this just default to the full two-way barrier for atomic r-m-w operations?
src/hotspot/share/runtime/handshake.cpp line 473:
> 471: while (has_operation()) {
> 472: MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
> 473: HandshakeOperation* op = get_op_for_self();
I still don't understand why we now do a peek here, and then do the pop below (in remove-op), instead of just doing the pop here?
src/hotspot/share/runtime/handshake.cpp line 570:
> 568: Thread* current_thread = Thread::current();
> 569:
> 570: HandshakeOperation* op = get_op();
Again why not pop here?
src/hotspot/share/runtime/handshake.cpp line 592:
> 590: executed++;
> 591: }
> 592: } while (have_non_self_executable_operation());
Not at all clear how/why this looping can now be removed.
src/hotspot/share/runtime/handshake.cpp line 594:
> 592: op == match_op ? "including" : "excluding", p2i(match_op));
> 593:
> 594: return op == match_op ? HandshakeState::_succeeded : HandshakeState::_processed;
Pre-existing: what does _processed mean versus _succeeded?
src/hotspot/share/runtime/safepointMechanism.cpp line 116:
> 114:
> 115: void SafepointMechanism::update_poll_values(JavaThread* thread) {
> 116: assert(thread == Thread::current(), "Must not be");
I assume the message should say "Must be".
-------------
PR: https://git.openjdk.java.net/jdk/pull/3973
More information about the hotspot-runtime-dev
mailing list