[jdk18] RFR: 8273383: vmTestbase/vm/gc/containers/Combination05/TestDescription.java crashes verifying length of DCQS [v2]
Kim Barrett
kbarrett at openjdk.java.net
Wed Jan 19 04:40:19 UTC 2022
> Please review this fix of an ABA problem in G1DirtyCardQueueSet's use of
> NonblockingQueue.
>
> Consider queue q contains one entry, `X`.
>
> Thread1 begins a push/append operation. WOLG, assume a push of `Y`. It
> atomically exchanges `q._tail` with `Y`, retaining the old value of `q._tail`,
> which is `X`. Pause Thread1 here.
>
> Thread2 pops `X` from the queue. It successfully changes `X._next` from
> end-marker to null and `q._head` from `X` to null. Thread2 fails to change
> `q._tail` from `X` to null, as Thread1 already changed it to `Y`.
>
> Various threads add entries to the queue after `Y`, and various threads try to
> pop entries and find the queue "empty" because `q._head` will remain null
> until Thread1 proceeds.
>
> In parallel with entries being added, Thread2 processes `X` and releases it
> for reuse. Some thread reallocates it, fills it, and pushes it onto the
> queue. We now have a chain from `Y` => `X` in `q._tail` with `q._head` still
> null.
>
> If Thread1 resumes at that point, it finds `X._next` == end-marker,
> successfully changes it to `Y`, and "completes". We now have `q._head` == null
> and `q._tail` == `X`, with `X` => `Y` => `X`. The next push/append will
> observe `q._tail` of `X` having a non-end-marker `_next` and change both
> `q._head` and `q._tail`, dropping the list containing `X`, `Y`, &etc on the
> floor.
>
> There may be other similarly bad scenarios, depending on exactly what is going
> on when Thread1 resumes.
>
> All of these are fixed by performing the push operation in a critical section,
> so that `X` cannot be reused until the push that is observing it
> completes. That is, in `DirtyCardQueueSet::enqueue_completed_buffer`, put
> `_completed.push(*cbn);` in a GlobalCounter critical section.
>
> Testing:
> mach5 tier1
>
> Running the failing test with -XX:G1UpdateBufferSize=3 was failing at a rate
> of about 1/1K runs of the test. With this change, there were no failures in
> more than 60K runs.
Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
- Merge branch 'master' into fix-dcqs-push
- DCQS push buffers inside CS
-------------
Changes:
- all: https://git.openjdk.java.net/jdk18/pull/104/files
- new: https://git.openjdk.java.net/jdk18/pull/104/files/9fd316f5..27e24d24
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk18&pr=104&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk18&pr=104&range=00-01
Stats: 1502 lines in 58 files changed: 743 ins; 581 del; 178 mod
Patch: https://git.openjdk.java.net/jdk18/pull/104.diff
Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/104/head:pull/104
PR: https://git.openjdk.java.net/jdk18/pull/104
More information about the hotspot-gc-dev
mailing list