[jdk16] RFR: 8257999: Parallel GC crash in gc/parallel/TestDynShrinkHeap.java: new region is not in covered_region

Stefan Johansson sjohanss at openjdk.java.net
Thu Dec 17 08:47:57 UTC 2020


On Wed, 16 Dec 2020 14:05:13 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change to ParallelGC oldgen allocation, adding a missing
> memory barrier.
> 
> The problem arises in the interaction between concurrent oldgen allocations,
> where each would, if done serially (in either order), require expansion of
> the generation.
> 
> An allocation of size N compares the mutable space's (end - top) with N to
> determine if space is available.  If available, use top as the start of the
> object of size N (adjusting top atomically) and assert the resulting memory
> region is in the covered area.  If not, then expand.
> 
> Expansion updates the covered region, then updates the space (i.e. end).
> There is currently no memory barrier between those operations.
> 
> As a result, we can have thread1 having done an expansion, updating the
> covered region and the space end. Because there's no memory barrier there,
> the space end may be updated before the covered region as far as some other
> thread is concerned.
> 
> Meanwhile thread2's allocation reads the new end and goes ahead with the
> allocation (which would not have fit with the old end value), then fails the
> covered region check because it used the old covered range.  Although the
> reads of end and the covered range are ordered here by the intervening CAS
> of top, that doesn't help if the writes by thread1 are not also properly
> ordered.
> 
> There is even a comment about this in PSOldGen::post_resize(), saying the
> space update must be last (including after the covered region update).  But
> without a memory barrier, there's nothing other than source order to ensure
> that ordering.  So add a memory barrier.
> 
> I'm not sure whether this out-of-order update of the space end could lead to  
> problems in a product build (where the assert doesn't apply).  Without
> looking carefully, there appear to be opportunities for problems, such as
> accessing uncovered parts of the card table.
>  
> There's another issue that I'm not addressing with this change.  Various
> values are being read while subject to concurrent writes, without being in
> any way tagged as atomic. (The writes are under the ExpandHeap_lock, the
> reads are not.) This includes at least the covering region bounds and space
> end.
> 
> Testing:
> mach5 tier1
> I was unable to reproduce the failure, so can't show any before / after
> improvement.

Looks good, just a comment about a comment that you can address if you agree.

src/hotspot/share/gc/parallel/psOldGen.cpp line 385:

> 383: 
> 384:   // ALWAYS do this last!!
> 385:   OrderAccess::storestore();

Maybe update the comment to use less caps and '!'. Instead tie back to the function comment explaining that the barrier is needed to guarantee the order in which the data structures get visible to other threads.

-------------

Marked as reviewed by sjohanss (Reviewer).

PR: https://git.openjdk.java.net/jdk16/pull/35



More information about the hotspot-gc-dev mailing list