[jdk16] Integrated: 8257999: Parallel GC crash in gc/parallel/TestDynShrinkHeap.java: new region is not in covered_region

Kim Barrett kbarrett at openjdk.java.net
Thu Dec 17 14:21:15 UTC 2020


On Wed, 16 Dec 2020 14:05:13 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change to ParallelGC oldgen allocation, adding a missing
> memory barrier.
> 
> The problem arises in the interaction between concurrent oldgen allocations,
> where each would, if done serially (in either order), require expansion of
> the generation.
> 
> An allocation of size N compares the mutable space's (end - top) with N to
> determine if space is available.  If available, use top as the start of the
> object of size N (adjusting top atomically) and assert the resulting memory
> region is in the covered area.  If not, then expand.
> 
> Expansion updates the covered region, then updates the space (i.e. end).
> There is currently no memory barrier between those operations.
> 
> As a result, we can have thread1 having done an expansion, updating the
> covered region and the space end. Because there's no memory barrier there,
> the space end may be updated before the covered region as far as some other
> thread is concerned.
> 
> Meanwhile thread2's allocation reads the new end and goes ahead with the
> allocation (which would not have fit with the old end value), then fails the
> covered region check because it used the old covered range.  Although the
> reads of end and the covered range are ordered here by the intervening CAS
> of top, that doesn't help if the writes by thread1 are not also properly
> ordered.
> 
> There is even a comment about this in PSOldGen::post_resize(), saying the
> space update must be last (including after the covered region update).  But
> without a memory barrier, there's nothing other than source order to ensure
> that ordering.  So add a memory barrier.
> 
> I'm not sure whether this out-of-order update of the space end could lead to  
> problems in a product build (where the assert doesn't apply).  Without
> looking carefully, there appear to be opportunities for problems, such as
> accessing uncovered parts of the card table.
>  
> There's another issue that I'm not addressing with this change.  Various
> values are being read while subject to concurrent writes, without being in
> any way tagged as atomic. (The writes are under the ExpandHeap_lock, the
> reads are not.) This includes at least the covering region bounds and space
> end.
> 
> Testing:
> mach5 tier1
> I was unable to reproduce the failure, so can't show any before / after
> improvement.

This pull request has now been integrated.

Changeset: 61390d8e
Author:    Kim Barrett <kbarrett at openjdk.org>
URL:       https://git.openjdk.java.net/jdk16/commit/61390d8e
Stats:     4 lines in 1 file changed: 3 ins; 0 del; 1 mod

8257999: Parallel GC crash in gc/parallel/TestDynShrinkHeap.java: new region is not in covered_region

Reviewed-by: sjohanss, tschatzl

-------------

PR: https://git.openjdk.java.net/jdk16/pull/35



More information about the hotspot-gc-dev mailing list