[jdk16] RFR: 8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region"
Erik Österlund
eosterlund at openjdk.java.net
Thu Jan 21 14:00:14 UTC 2021
On Thu, 21 Jan 2021 09:26:19 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
> Please review this fix for an intermittent crash when using ParallelGC on
> aarch64. The problem is a mis-ordered pair of reads that permit an
> algorithmic invariant to be violated. The mis-ordering is due to the lack
> any explicit ordering request (a missing barrier).
>
> In MutableSpace::cas_allocate, we had
>
> HeapWord* obj = top();
> if (pointer_delta(end(), obj) >= size) {
> ... space available, attempt the CAS to claim it ...
> }
>
> If end is read before top, other threads may advance top and end between
> those reads. If, when top is read, current top > old end and current top +
> size > current end, the range check will unexpectedly pass because of
> underflow in pointer_delta. This will allow top to be advanced to a value
> which is > current end, violating the algorithmic invariant, and likely
> leading to crashes or memory corruption.
>
> gcc for x86 doesn't reorder the reads, but for aarch64 it does, and is
> permitted to do so. Even if it didn't, there is nothing to prevent the
> hardware from doing so. The solution is to use a load_acquire for top, to
> ensure the needed order.
>
> This may have been the actual root cause of JDK-8257999. However, the
> change made there was and still is needed for the reasons described.
>
> Testing:
> mach5 tier1-3
>
> Even with knowledge of the failure mode it's very hard to reproduce. I was
> unable to catch the underflow case in over 1K attempts using machines in our
> test farm, though StefanK caught it a few times on a personal machine.
Looks good.
-------------
Marked as reviewed by eosterlund (Reviewer).
PR: https://git.openjdk.java.net/jdk16/pull/127
More information about the hotspot-gc-dev
mailing list