RFR (S): 8147087: Race when reusing PerRegionTable bitmaps may result in dropped remembered set entries
Tom Benson
tom.benson at oracle.com
Tue Jan 19 16:35:30 UTC 2016
Great find! Fix looks good to me, except for a missing copyright update.
Tom
On 1/19/2016 11:09 AM, Thomas Schatzl wrote:
> Hi all,
>
> can I have reviews for this tiny change the fixes a race in the G1
> remembered sets, resulting in a potentially missed remembered set
> entry, causing the usual problems.
>
> When G1 reuses a PRT (fine remembered set bitmap) for another region's
> from remembered set, it clears th
> .at bitmap. Due to how this mechanism has been implemented, it has been
> possible that another thread could have accessed and written its
> remembered set entry to that PRT before the original thread cleared it.
>
> I.e. in
>
> void PerRegionTable::init(HeapRegion* hr, bool clear_links_to_all_list)
> {
> if (clear_links_to_all_list) {
> set_next(NULL);
> set_prev(NULL);
> }
> _hr = hr;
> _collision_list_next = NULL; // (1) Thread A is here
>
> // (2) Thread B could get hold of this PRT now and set a bit in _bm,
> // assuming that it is initialized already.
>
> _occupied = 0;
> _bm.clear(); // (3) Thread A clears the bitmap, loosing the rset
> // entry thread B just added.
> // (4) Thread B complains about a missing RSet entry in the PRT or
> // crashes.
> }
>
> program execution leading to the problem is as indicated by the numbers
> in the parentheses.
>
> [This could only happen, if the heap region the PRT had been assigned
> to for thread A and thread B used the same index/bucket list in the
> hash table where PRTs are stored btw.]
>
> The solution is to delay publishing the value of PRT::_hr after the
> bitmap clearing, i.e.
>
> void PerRegionTable::init(HeapRegion* hr, bool clear_links_to_all_list)
> {
> if (clear_links_to_all_list) {
> set_next(NULL);
> set_prev(NULL);
> }
> - _hr = hr;
> _collision_list_next =
> NULL;
> _occupied = 0;
> _bm.clear();
> + // Make sure that the
> bitmap clearing above has been finished before publishing
> + // this
> PRT to concurrent threads.
> + OrderAccess::release_store_ptr(&_hr,
> hr);
> }
>
> Which is the whole relevant change. :)
>
>
> This is a day one G1 bug.
>
> The idea is to backport this change due to its impact. It would be nice
> to get quick reviews. Maybe we can still get it into the next 8u
> release.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8147087
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8147087/webrev/
> Testing:
> Stress test program provided by JDK-8134963 many times without this
> crash, jprt
>
> I added Poonam as author for this change due to her great analysis.
>
> Thanks,
> Thomas
>
More information about the hotspot-gc-dev
mailing list