[jdk18] RFR: 8273383: vmTestbase/vm/gc/containers/Combination05/TestDescription.java crashes verifying length of DCQS

Kim Barrett kbarrett at openjdk.java.net
Sun Jan 16 18:25:57 UTC 2022


Please review this fix of an ABA problem in G1DirtyCardQueueSet's use of
NonblockingQueue. 

Consider queue q contains one entry, `X`.

Thread1 begins a push/append operation. WOLG, assume a push of `Y`. It
atomically exchanges `q._tail` with `Y`, retaining the old value of `q._tail`,
which is `X`. Pause Thread1 here.

Thread2 pops `X` from the queue. It successfully changes `X._next` from
end-marker to null and `q._head` from `X` to null. Thread2 fails to change
`q._tail` from `X` to null, as Thread1 already changed it to `Y`.

Various threads add entries to the queue after `Y`, and various threads try to
pop entries and find the queue "empty" because `q._head` will remain null
until Thread1 proceeds.

In parallel with entries being added, Thread2 processes `X` and releases it
for reuse. Some thread reallocates it, fills it, and pushes it onto the
queue. We now have a chain from `Y` => `X` in `q._tail` with `q._head` still
null.

If Thread1 resumes at that point, it finds `X._next` == end-marker,
successfully changes it to `Y`, and "completes". We now have `q._head` == null
and `q._tail` == `X`, with `X` => `Y` => `X`. The next push/append will
observe `q._tail` of `X` having a non-end-marker `_next` and change both
`q._head` and `q._tail`, dropping the list containing `X`, `Y`, &etc on the
floor.

There may be other similarly bad scenarios, depending on exactly what is going
on when Thread1 resumes.

All of these are fixed by performing the push operation in a critical section,
so that `X` cannot be reused until the push that is observing it
completes. That is, in `DirtyCardQueueSet::enqueue_completed_buffer`, put
`_completed.push(*cbn);` in a GlobalCounter critical section.

Testing:
mach5 tier1

Running the failing test with -XX:G1UpdateBufferSize=3 was failing at a rate
of about 1/1K runs of the test.  With this change, there were no failures in
more than 60K runs.

-------------

Commit messages:
 - DCQS push buffers inside CS

Changes: https://git.openjdk.java.net/jdk18/pull/104/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=104&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273383
  Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk18/pull/104.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/104/head:pull/104

PR: https://git.openjdk.java.net/jdk18/pull/104



More information about the hotspot-gc-dev mailing list