RFR: 8272170: Missing memory barrier when checking active state for regions

Mon Nov 15 14:37:39 UTC 2021

On Mon, 15 Nov 2021 14:23:31 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> Hi all,
>> 
>>   can I have reviews for this fix to memory visibility race where we miss to make sure that the `HeapRegion` pointer in `HeapRegionManager::_regions` is visible before the "active"-check for that `HeapRegion` can return true?
>> 
>> I.e. the problem is this (according to my understanding):
>> 
>> void HeapRegionManager::expand(uint start, uint num_regions, WorkerThreads* pretouch_workers) {
>>   commit_regions(start, num_regions, pretouch_workers);
>>   for (uint i = start; i < start + num_regions; i++) {
>>     HeapRegion* hr = _regions.get_by_index(i);
>>     if (hr == NULL) {
>>       hr = new_heap_region(i);
>>       OrderAccess::storestore();
>>       _regions.set_by_index(i, hr);
>>       _allocated_heapregions_length = MAX2(_allocated_heapregions_length, i + 1);
>>     }
>>     G1CollectedHeap::heap()->hr_printer()->commit(hr);
>>   }
>>   activate_regions(start, num_regions);
>> }
>> 
>> E.g. we first commit the memory, then create a `HeapRegion` instance which we properly guard with a `StoreStore` before assigning it to the `_regions` array. After that at the end we activate the regions (via `activate_regions`), meaning that the contents of the new `HeapRegion*` in the region table are valid.
>> These bits in the `_active_regions` bitmap are put with memory barriers (via `par_set_range`).
>> 
>> In `HeapRegionManager::par_iterate`, if we iterate over the region map, we first check whether the region is available in the `_active_regions` bitmap, and then get and pass on the corresponding `HeapRegion*` to the given closure.
>> 
>> However there is no memory ordering between reading the available-bit in the `_active_regions` bitmap, so that bit could become visible to the thread iterating over the regions before the contents of the `_region` map itself, in effect passing a `nullptr` to the closure which isn't allowed. (Some details about the debugging session in the CR).
>> 
>> 
>> void HeapRegionManager::par_iterate(HeapRegionClosure* blk, HeapRegionClaimer* hrclaimer, const uint start_index) const {
>>   // Every worker will actually look at all regions, skipping over regions that
>>   // are currently not committed.
>>   // This also (potentially) iterates over regions newly allocated during GC. This
>>   // is no problem except for some extra work.
>>   const uint n_regions = hrclaimer->n_regions();
>>   for (uint count = 0; count < n_regions; count++) {
>>     const uint index = (start_index + count) % n_regions;
>>     assert(index < n_regions, "sanity");
>>     // Skip over unavailable regions
>>     if (!is_available(index)) {
>>       continue;
>>     }
>>     HeapRegion* r = _regions.get_by_index(index);
>> [...]
>>     bool res = blk->do_heap_region(r);
>>     if (res) {
>>       return;
>>     }
>>   }
>> }
>> 
>> 
>> The suggested fix is to ensure proper memory ordering in `is_available`, i.e. the loads must be ordered correctly.
>> 
>> I did check a bit around the usage of this method and `is_unavailable`, but I do not think there is a similar issue.
>> 
>> Testing: test crashed with a frequency of around 1/500, now passing 5k runs
>> 
>> Thanks,
>>   Thomas
>
> Marked as reviewed by ayang (Reviewer).

Thanks @albertnetymk @kstefanj for your reviews.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6324