[jdk16] RFR: 8257999: Parallel GC crash in gc/parallel/TestDynShrinkHeap.java: new region is not in covered_region

Kim Barrett kbarrett at openjdk.java.net
Wed Dec 16 14:11:10 UTC 2020


Please review this change to ParallelGC oldgen allocation, adding a missing
memory barrier.

The problem arises in the interaction between concurrent oldgen allocations,
where each would, if done serially (in either order), require expansion of
the generation.

An allocation of size N compares the mutable space's (end - top) with N to
determine if space is available.  If available, use top as the start of the
object of size N (adjusting top atomically) and assert the resulting memory
region is in the covered area.  If not, then expand.

Expansion updates the covered region, then updates the space (i.e. end).
There is currently no memory barrier between those operations.

As a result, we can have thread1 having done an expansion, updating the
covered region and the space end. Because there's no memory barrier there,
the space end may be updated before the covered region as far as some other
thread is concerned.

Meanwhile thread2's allocation reads the new end and goes ahead with the
allocation (which would not have fit with the old end value), then fails the
covered region check because it used the old covered range.  Although the
reads of end and the covered range are ordered here by the intervening CAS
of top, that doesn't help if the writes by thread1 are not also properly
ordered.

There is even a comment about this in PSOldGen::post_resize(), saying the
space update must be last (including after the covered region update).  But
without a memory barrier, there's nothing other than source order to ensure
that ordering.  So add a memory barrier.

I'm not sure whether this out-of-order update of the space end could lead to  
problems in a product build (where the assert doesn't apply).  Without
looking carefully, there appear to be opportunities for problems, such as
accessing uncovered parts of the card table.
 
There's another issue that I'm not addressing with this change.  Various
values are being read while subject to concurrent writes, without being in
any way tagged as atomic. (The writes are under the ExpandHeap_lock, the
reads are not.) This includes at least the covering region bounds and space
end.

Testing:
mach5 tier1
I was unable to reproduce the failure, so can't show any before / after
improvement.

-------------

Commit messages:
 - add memory barrier

Changes: https://git.openjdk.java.net/jdk16/pull/35/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=35&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8257999
  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk16/pull/35.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/35/head:pull/35

PR: https://git.openjdk.java.net/jdk16/pull/35



More information about the hotspot-gc-dev mailing list