[jdk18] Integrated: 8273383: vmTestbase/vm/gc/containers/Combination05/TestDescription.java crashes verifying length of DCQS
Kim Barrett
kbarrett at openjdk.java.net
Wed Jan 19 04:40:19 UTC 2022
On Sun, 16 Jan 2022 18:20:07 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
> Please review this fix of an ABA problem in G1DirtyCardQueueSet's use of
> NonblockingQueue.
>
> Consider queue q contains one entry, `X`.
>
> Thread1 begins a push/append operation. WOLG, assume a push of `Y`. It
> atomically exchanges `q._tail` with `Y`, retaining the old value of `q._tail`,
> which is `X`. Pause Thread1 here.
>
> Thread2 pops `X` from the queue. It successfully changes `X._next` from
> end-marker to null and `q._head` from `X` to null. Thread2 fails to change
> `q._tail` from `X` to null, as Thread1 already changed it to `Y`.
>
> Various threads add entries to the queue after `Y`, and various threads try to
> pop entries and find the queue "empty" because `q._head` will remain null
> until Thread1 proceeds.
>
> In parallel with entries being added, Thread2 processes `X` and releases it
> for reuse. Some thread reallocates it, fills it, and pushes it onto the
> queue. We now have a chain from `Y` => `X` in `q._tail` with `q._head` still
> null.
>
> If Thread1 resumes at that point, it finds `X._next` == end-marker,
> successfully changes it to `Y`, and "completes". We now have `q._head` == null
> and `q._tail` == `X`, with `X` => `Y` => `X`. The next push/append will
> observe `q._tail` of `X` having a non-end-marker `_next` and change both
> `q._head` and `q._tail`, dropping the list containing `X`, `Y`, &etc on the
> floor.
>
> There may be other similarly bad scenarios, depending on exactly what is going
> on when Thread1 resumes.
>
> All of these are fixed by performing the push operation in a critical section,
> so that `X` cannot be reused until the push that is observing it
> completes. That is, in `DirtyCardQueueSet::enqueue_completed_buffer`, put
> `_completed.push(*cbn);` in a GlobalCounter critical section.
>
> Testing:
> mach5 tier1
>
> Running the failing test with -XX:G1UpdateBufferSize=3 was failing at a rate
> of about 1/1K runs of the test. With this change, there were no failures in
> more than 60K runs.
This pull request has now been integrated.
Changeset: 69cfa9cb
Author: Kim Barrett <kbarrett at openjdk.org>
URL: https://git.openjdk.java.net/jdk18/commit/69cfa9cb36ab2b5490c231c30306f682665faab4
Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod
8273383: vmTestbase/vm/gc/containers/Combination05/TestDescription.java crashes verifying length of DCQS
Reviewed-by: tschatzl, sjohanss
-------------
PR: https://git.openjdk.java.net/jdk18/pull/104
More information about the hotspot-gc-dev
mailing list