RFR: 8272170: Missing memory barrier when checking active state for regions

Mon Nov 15 14:26:35 UTC 2021

On Wed, 10 Nov 2021 08:37:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> Hi all,
> 
>   can I have reviews for this fix to memory visibility race where we miss to make sure that the `HeapRegion` pointer in `HeapRegionManager::_regions` is visible before the "active"-check for that `HeapRegion` can return true?
> 
> I.e. the problem is this (according to my understanding):
> 
> void HeapRegionManager::expand(uint start, uint num_regions, WorkerThreads* pretouch_workers) {
>   commit_regions(start, num_regions, pretouch_workers);
>   for (uint i = start; i < start + num_regions; i++) {
>     HeapRegion* hr = _regions.get_by_index(i);
>     if (hr == NULL) {
>       hr = new_heap_region(i);
>       OrderAccess::storestore();
>       _regions.set_by_index(i, hr);
>       _allocated_heapregions_length = MAX2(_allocated_heapregions_length, i + 1);
>     }
>     G1CollectedHeap::heap()->hr_printer()->commit(hr);
>   }
>   activate_regions(start, num_regions);
> }
> 
> E.g. we first commit the memory, then create a `HeapRegion` instance which we properly guard with a `StoreStore` before assigning it to the `_regions` array. After that at the end we activate the regions (via `activate_regions`), meaning that the contents of the new `HeapRegion*` in the region table are valid.
> These bits in the `_active_regions` bitmap are put with memory barriers (via `par_set_range`).
> 
> In `HeapRegionManager::par_iterate`, if we iterate over the region map, we first check whether the region is available in the `_active_regions` bitmap, and then get and pass on the corresponding `HeapRegion*` to the given closure.
> 
> However there is no memory ordering between reading the available-bit in the `_active_regions` bitmap, so that bit could become visible to the thread iterating over the regions before the contents of the `_region` map itself, in effect passing a `nullptr` to the closure which isn't allowed. (Some details about the debugging session in the CR).
> 
> 
> void HeapRegionManager::par_iterate(HeapRegionClosure* blk, HeapRegionClaimer* hrclaimer, const uint start_index) const {
>   // Every worker will actually look at all regions, skipping over regions that
>   // are currently not committed.
>   // This also (potentially) iterates over regions newly allocated during GC. This
>   // is no problem except for some extra work.
>   const uint n_regions = hrclaimer->n_regions();
>   for (uint count = 0; count < n_regions; count++) {
>     const uint index = (start_index + count) % n_regions;
>     assert(index < n_regions, "sanity");
>     // Skip over unavailable regions
>     if (!is_available(index)) {
>       continue;
>     }
>     HeapRegion* r = _regions.get_by_index(index);
> [...]
>     bool res = blk->do_heap_region(r);
>     if (res) {
>       return;
>     }
>   }
> }
> 
> 
> The suggested fix is to ensure proper memory ordering in `is_available`, i.e. the loads must be ordered correctly.
> 
> I did check a bit around the usage of this method and `is_unavailable`, but I do not think there is a similar issue.
> 
> Testing: test crashed with a frequency of around 1/500, now passing 5k runs
> 
> Thanks,
>   Thomas

Marked as reviewed by ayang (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6324