RFR: 8266557: assert(SafepointMechanism::local_poll_armed(_handshakee)) failed: Must be
Robbin Ehn
rehn at openjdk.java.net
Tue May 18 08:25:41 UTC 2021
On Tue, 11 May 2021 11:59:23 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
> Please consider this change-set which address the issue on hand.
>
> I identified two problems:
>
> - is_locked() uses the _owner field which is unordered (no storestore|storeload) on store-side.
> Fixed by leaving the handshakee being processed in queue until completed.
> And remove looping, since if ever the queue is empty the handshakee may processed.
> If ever want to loop again, we must make sure queue is not empty before removing the processed handshake.
> But there is, at this moment, no performance benefit to that, so I chosse the simple, easy to reason about version. (some crazy stress test can see a difference)
>
> Note that I'll do a follow-up and make is_locked() ifdef ASSERT only.
>
> - have_non_self_executable_operation() only provide correct acquire if first in queue matched, if second item matched it could be re-orderd with reading the poll state.
> Fixed by adding a loadload.
>
> I could at first reproduce by checking _active_handshaker in update_poll (~1/50) and an increase in the test time by ten.
> (real reprod ~1/400 with increased test time)
> If we are updating the poll there should not be an active handshaker.
> The above fixed the issue.
> But after a rebase when I was trying to pin point the issue I could no longer reproduce even without any changes.
>
> Added Atomic store/load to _active_handshaker since it may be concurrently loaded (it may only be used to ask if current thread is active handshaker).
>
> Passes stressing relevant test on aarch64 and t1-7.
>
> Thanks, Robbin
> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-runtime-dev](mailto:hotspot-runtime-dev at mail.openjdk.java.net):_
>
> Hi Robbin,
>
> On 17/05/2021 11:02 pm, Robbin Ehn wrote:
>
> > Please consider this change-set which address the issue on hand.
> > I identified two problems:
> > - is_locked() uses the _owner field which is unordered (no storestore|storeload) on store-side.
> > Fixed by leaving the handshakee being processed in queue until completed.
>
> Sorry but I'm not seeing how leaving or removing the op from the queue
> impacts things here. Which thread is reading the queue and seeing a
> wrong value?
When poll is armed the handshakee must go to slow path and lock the HandshakeState lock if:
- 1: There is handshake operation to be processed
- 2: Some-else is in the middle of processing a handshake operation for this handshakee
By having having a non-empty queue if 1 or 2 is true, the handshakee can just ask if there is a non-empty queue.
If so go to slow path and grab the HandshakeState lock.
Now 1 is checked with queue head and 2 is checked with is_locked().
But since the thread locking, storing _owner, can do this unordered, it may store the owner field after it have read our thread state, e.g. blocked.
What we do in three different scenarios are:
Handshaker adds an operation:
store queue
store poll
Handshaker processing do:
store _owner (lock)
load queue
load poll
load thread state
Handshakee polling path do:
store state
load poll
load owner
load queue
############################################
With the un-ordered characteristics of _owner field, it looks like it could play out like this:
Handshakee: state blocked
Handshaker A processor: add: store queue OP Z
Handshaker B processor: processes OP Z and store queue => NULL
Handshaker A processor: add: store poll
Handshaker A processor: LOCK HandshakeState lock (_owner is still NULL)
Handshaker C processor: add: store queue OP X
Handshaker A processor: load queue (sees X)
Handshaker A processor: load poll (armed)
Handshaker A processor: load thread state // OK blocked
Handshakee T target : store state // unsafe
Handshakee T target : load poll // armed
Handshakee T target : load owner // NULL
Handshaker A processor: store _owner (non-NULL) // re-ordered here !!
Handshaker A processor: store queue (NULL)
Handshakee T target : load queue // NULL
Handshakee T target : disarm // ERROR !!!!
>
> Thanks,
> David
> -----
-------------
PR: https://git.openjdk.java.net/jdk/pull/3973
More information about the hotspot-runtime-dev
mailing list