RFR: 8272170: Missing memory barrier when checking active state for regions
Stefan Johansson
sjohanss at openjdk.java.net
Wed Nov 10 10:26:39 UTC 2021
On Wed, 10 Nov 2021 08:37:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
> Hi all,
>
> can I have reviews for this fix to memory visibility race where we miss to make sure that the `HeapRegion` pointer in `HeapRegionManager::_regions` is visible before the "active"-check for that `HeapRegion` can return true?
>
> I.e. the problem is this (according to my understanding):
>
> void HeapRegionManager::expand(uint start, uint num_regions, WorkerThreads* pretouch_workers) {
> commit_regions(start, num_regions, pretouch_workers);
> for (uint i = start; i < start + num_regions; i++) {
> HeapRegion* hr = _regions.get_by_index(i);
> if (hr == NULL) {
> hr = new_heap_region(i);
> OrderAccess::storestore();
> _regions.set_by_index(i, hr);
> _allocated_heapregions_length = MAX2(_allocated_heapregions_length, i + 1);
> }
> G1CollectedHeap::heap()->hr_printer()->commit(hr);
> }
> activate_regions(start, num_regions);
> }
>
> E.g. we first commit the memory, then create a `HeapRegion` instance which we properly guard with a `StoreStore` before assigning it to the `_regions` array. After that at the end we activate the regions (via `activate_regions`), meaning that the contents of the new `HeapRegion*` in the region table are valid.
> These bits in the `_active_regions` bitmap are put with memory barriers (via `par_set_range`).
>
> In `HeapRegionManager::par_iterate`, if we iterate over the region map, we first check whether the region is available in the `_active_regions` bitmap, and then get and pass on the corresponding `HeapRegion*` to the given closure.
>
> However there is no memory ordering between reading the available-bit in the `_active_regions` bitmap, so that bit could become visible to the thread iterating over the regions before the contents of the `_region` map itself, in effect passing a `nullptr` to the closure which isn't allowed. (Some details about the debugging session in the CR).
>
>
> void HeapRegionManager::par_iterate(HeapRegionClosure* blk, HeapRegionClaimer* hrclaimer, const uint start_index) const {
> // Every worker will actually look at all regions, skipping over regions that
> // are currently not committed.
> // This also (potentially) iterates over regions newly allocated during GC. This
> // is no problem except for some extra work.
> const uint n_regions = hrclaimer->n_regions();
> for (uint count = 0; count < n_regions; count++) {
> const uint index = (start_index + count) % n_regions;
> assert(index < n_regions, "sanity");
> // Skip over unavailable regions
> if (!is_available(index)) {
> continue;
> }
> HeapRegion* r = _regions.get_by_index(index);
> [...]
> bool res = blk->do_heap_region(r);
> if (res) {
> return;
> }
> }
> }
>
>
> The suggested fix is to ensure proper memory ordering in `is_available`, i.e. an acquire when accessing the bitmaps to make sure that other stores before the bit store we check are visible.
>
> I did check a bit around the usage of this method and `is_unavailable`, but I do not think there is a similar issue.
>
> Testing: test crashed with a frequency of around 1/500, now passing 5k runs
>
> Thanks,
> Thomas
Looks good, nice debugging!
-------------
Marked as reviewed by sjohanss (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6324
More information about the hotspot-gc-dev
mailing list