[jdk16] Integrated: 8257999: Parallel GC crash in gc/parallel/TestDynShrinkHeap.java: new region is not in covered_region
Kim Barrett
kbarrett at openjdk.java.net
Thu Dec 17 14:21:15 UTC 2020
On Wed, 16 Dec 2020 14:05:13 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
> Please review this change to ParallelGC oldgen allocation, adding a missing
> memory barrier.
>
> The problem arises in the interaction between concurrent oldgen allocations,
> where each would, if done serially (in either order), require expansion of
> the generation.
>
> An allocation of size N compares the mutable space's (end - top) with N to
> determine if space is available. If available, use top as the start of the
> object of size N (adjusting top atomically) and assert the resulting memory
> region is in the covered area. If not, then expand.
>
> Expansion updates the covered region, then updates the space (i.e. end).
> There is currently no memory barrier between those operations.
>
> As a result, we can have thread1 having done an expansion, updating the
> covered region and the space end. Because there's no memory barrier there,
> the space end may be updated before the covered region as far as some other
> thread is concerned.
>
> Meanwhile thread2's allocation reads the new end and goes ahead with the
> allocation (which would not have fit with the old end value), then fails the
> covered region check because it used the old covered range. Although the
> reads of end and the covered range are ordered here by the intervening CAS
> of top, that doesn't help if the writes by thread1 are not also properly
> ordered.
>
> There is even a comment about this in PSOldGen::post_resize(), saying the
> space update must be last (including after the covered region update). But
> without a memory barrier, there's nothing other than source order to ensure
> that ordering. So add a memory barrier.
>
> I'm not sure whether this out-of-order update of the space end could lead to
> problems in a product build (where the assert doesn't apply). Without
> looking carefully, there appear to be opportunities for problems, such as
> accessing uncovered parts of the card table.
>
> There's another issue that I'm not addressing with this change. Various
> values are being read while subject to concurrent writes, without being in
> any way tagged as atomic. (The writes are under the ExpandHeap_lock, the
> reads are not.) This includes at least the covering region bounds and space
> end.
>
> Testing:
> mach5 tier1
> I was unable to reproduce the failure, so can't show any before / after
> improvement.
This pull request has now been integrated.
Changeset: 61390d8e
Author: Kim Barrett <kbarrett at openjdk.org>
URL: https://git.openjdk.java.net/jdk16/commit/61390d8e
Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
8257999: Parallel GC crash in gc/parallel/TestDynShrinkHeap.java: new region is not in covered_region
Reviewed-by: sjohanss, tschatzl
-------------
PR: https://git.openjdk.java.net/jdk16/pull/35
More information about the hotspot-gc-dev
mailing list