RFR: 8266557: assert(SafepointMechanism::local_poll_armed(_handshakee)) failed: Must be [v3]

Daniel D.Daugherty dcubed at openjdk.java.net
Tue Jun 1 21:30:38 UTC 2021


On Fri, 28 May 2021 07:45:33 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Please consider this change-set which address the issue on hand.
>> 
>> I identified two problems:
>> 
>> - is_locked() uses the _owner field which is unordered (no storestore|storeload) on store-side.
>> Fixed by leaving the handshakee being processed in queue until completed.
>> And remove looping, since if ever the queue is empty the handshakee may processed.
>> If ever want to loop again, we must make sure queue is not empty before removing the processed handshake.
>> But there is, at this moment, no performance benefit to that, so I chosse the simple, easy to reason about version. (some crazy stress test can see a difference)
>> 
>> Note that I'll do a follow-up and make is_locked() ifdef ASSERT only.
>> 
>> - have_non_self_executable_operation() only provide correct acquire if first in queue matched, if second item matched it could be re-orderd with reading the poll state.
>> Fixed by adding a loadload.
>> 
>> I could at first reproduce by checking _active_handshaker in update_poll (~1/50) and an increase in the test time by ten.
>> (real reprod ~1/400 with increased test time)
>> If we are updating the poll there should not be an active handshaker.
>> The above fixed the issue.
>> But after a rebase when I was trying to pin point the issue I could no longer reproduce even without any changes.
>> 
>> Added Atomic store/load to _active_handshaker since it may be concurrently loaded (it may only be used to ask if current thread is active handshaker).
>> 
>> Passes stressing relevant test on aarch64 and t1-7.
>> 
>> Thanks, Robbin
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into handshakee
>  - Small update
>  - Merge branch 'master' into handshakee
>  - Fix

Changes requested by dcubed (Reviewer).

I've stopped my review because I realized that I have gotten confused by
the `peek()` operation that actually appears to be searching for a match
rather than peeking at the head (or tail) of the queue. My brain is no longer
clear on what your operations are. Sorry.

src/hotspot/share/runtime/handshake.cpp line 429:

> 427:   assert(_handshakee == Thread::current(), "Must be called by self");
> 428:   assert(_lock.owned_by_self(), "Lock must be held");
> 429:   return _queue.peek();

`get_op_for_self()` name does not match the `peek()` operation.
The new name should be `peek_op_for_self()`.

src/hotspot/share/runtime/handshake.cpp line 445:

> 443:   assert(_handshakee != Thread::current(), "Must not be called by self");
> 444:   assert(_lock.owned_by_self(), "Lock must be held");
> 445:   return _queue.peek(non_self_queue_filter);

`get_op()` name does not match the peek() operation.
The new name should be `peek_op()`.

src/hotspot/share/runtime/handshake.cpp line 452:

> 450:   MatchOp mo(op);
> 451:   HandshakeOperation* ret = _queue.pop(mo);
> 452:   assert(ret == op, "OP missing from queue");

Perhaps this for the assert() mesg:
`  assert(ret == op, "popped op must match requested op.");

Also, I would name this one `pop_op()` instead of `remove_op()`.

src/hotspot/share/runtime/handshake.cpp line 483:

> 481:         HandleMark hm(_handshakee);
> 482:         PreserveExceptionMark pem(_handshakee);
> 483:         op->do_handshake(_handshakee); // acquire, op removed after

Why `acquire`? Perhaps:
`// peek and handshake with op; op removed after`

src/hotspot/share/runtime/handshake.cpp line 489:

> 487:         // The destructor ~PreserveExceptionMark touches the exception oop so it must not be executed,
> 488:         // since a safepoint may be in-progress when returning from the async handshake.
> 489:         op->do_handshake(_handshakee); // acquire, op removed after

Why acquire? Perhaps:
// peek and handshake with op; op removed after

src/hotspot/share/runtime/handshake.cpp line 583:

> 581: 
> 582:   set_active_handshaker(current_thread);
> 583:   op->do_handshake(_handshakee); // acquire, op removed after

Why acquire? Perhaps:
// peek and handshake with op; op removed after

src/hotspot/share/runtime/handshake.hpp line 107:

> 105:   void remove_op(HandshakeOperation* op);
> 106: 
> 107:   void set_active_handshaker(Thread* thread) { Atomic::store(&_active_handshaker, thread); }

Should:
```Thread* _active_handshaker;```
on L89 above be:
```Thread* volatile _active_handshaker;```

src/hotspot/share/utilities/filterQueue.inline.hpp line 127:

> 125:   do {
> 126:     if (match_func(cur->_data)) {
> 127:       match = cur;

Hmmm.... I was expecting a break after this match.
Is there a reason to continue the loop?

But now I'm realizing that I'm confused since this is called `peek()`, but
it's not really peeking at the head of the queue. It is searching the entire
queue for a match.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3973


More information about the hotspot-runtime-dev mailing list