RFR (S): 8147087: Race when reusing PerRegionTable bitmaps may result in dropped remembered set entries

Thomas Schatzl thomas.schatzl at oracle.com
Tue Jan 19 16:09:23 UTC 2016


Hi all,

  can I have reviews for this tiny change the fixes a race in the G1
remembered sets, resulting in a potentially missed remembered set
entry, causing the usual problems.

When G1 reuses a PRT (fine remembered set bitmap) for another region's
from remembered set, it clears th
.at bitmap. Due to how this mechanism has been implemented, it has been
possible that another thread could have accessed and written its
remembered set entry to that PRT before the original thread cleared it.

I.e. in

void PerRegionTable::init(HeapRegion* hr, bool clear_links_to_all_list)
{ 
  if (clear_links_to_all_list) { 
    set_next(NULL); 
    set_prev(NULL); 
  } 
  _hr = hr; 
  _collision_list_next = NULL;  // (1) Thread A is here 

  // (2) Thread B could get hold of this PRT now and set a bit in _bm,
  // assuming that it is initialized already.

  _occupied = 0; 
  _bm.clear(); // (3) Thread A clears the bitmap, loosing the rset
               // entry thread B just added. 
  // (4) Thread B complains about a missing RSet entry in the PRT or 
  // crashes.
} 

program execution leading to the problem is as indicated by the numbers
in the parentheses.

[This could only happen, if the heap region the PRT had been assigned
to for thread A and thread B used the same index/bucket list in the
hash table where PRTs are stored btw.]

The solution is to delay publishing the value of PRT::_hr after the
bitmap clearing, i.e.

void PerRegionTable::init(HeapRegion* hr, bool clear_links_to_all_list)
   { 
     if (clear_links_to_all_list) { 
       set_next(NULL);
       set_prev(NULL);
     }
-    _hr = hr;
     _collision_list_next =
NULL;
     _occupied = 0;
     _bm.clear();
+    // Make sure that the
bitmap clearing above has been finished before publishing
+    // this
PRT to concurrent threads.
+    OrderAccess::release_store_ptr(&_hr,
hr);
   }

Which is the whole relevant change. :)


This is a day one G1 bug.

The idea is to backport this change due to its impact. It would be nice
to get quick reviews. Maybe we can still get it into the next 8u
release.

CR:
https://bugs.openjdk.java.net/browse/JDK-8147087
Webrev:
http://cr.openjdk.java.net/~tschatzl/8147087/webrev/
Testing:
Stress test program provided by JDK-8134963 many times without this
crash, jprt

I added Poonam as author for this change due to her great analysis.

Thanks,
  Thomas




More information about the hotspot-gc-dev mailing list