RFR (S): 8147087: Race when reusing PerRegionTable bitmaps may result in dropped remembered set entries

Tom Benson tom.benson at oracle.com
Tue Jan 19 16:35:30 UTC 2016


Great find!  Fix looks good to me, except for a missing copyright update.
Tom

On 1/19/2016 11:09 AM, Thomas Schatzl wrote:
> Hi all,
>
>    can I have reviews for this tiny change the fixes a race in the G1
> remembered sets, resulting in a potentially missed remembered set
> entry, causing the usual problems.
>
> When G1 reuses a PRT (fine remembered set bitmap) for another region's
> from remembered set, it clears th
> .at bitmap. Due to how this mechanism has been implemented, it has been
> possible that another thread could have accessed and written its
> remembered set entry to that PRT before the original thread cleared it.
>
> I.e. in
>
> void PerRegionTable::init(HeapRegion* hr, bool clear_links_to_all_list)
> {
>    if (clear_links_to_all_list) {
>      set_next(NULL);
>      set_prev(NULL);
>    }
>    _hr = hr;
>    _collision_list_next = NULL;  // (1) Thread A is here
>
>    // (2) Thread B could get hold of this PRT now and set a bit in _bm,
>    // assuming that it is initialized already.
>
>    _occupied = 0;
>    _bm.clear(); // (3) Thread A clears the bitmap, loosing the rset
>                 // entry thread B just added.
>    // (4) Thread B complains about a missing RSet entry in the PRT or
>    // crashes.
> }
>
> program execution leading to the problem is as indicated by the numbers
> in the parentheses.
>
> [This could only happen, if the heap region the PRT had been assigned
> to for thread A and thread B used the same index/bucket list in the
> hash table where PRTs are stored btw.]
>
> The solution is to delay publishing the value of PRT::_hr after the
> bitmap clearing, i.e.
>
> void PerRegionTable::init(HeapRegion* hr, bool clear_links_to_all_list)
>     {
>       if (clear_links_to_all_list) {
>         set_next(NULL);
>         set_prev(NULL);
>       }
> -    _hr = hr;
>       _collision_list_next =
> NULL;
>       _occupied = 0;
>       _bm.clear();
> +    // Make sure that the
> bitmap clearing above has been finished before publishing
> +    // this
> PRT to concurrent threads.
> +    OrderAccess::release_store_ptr(&_hr,
> hr);
>     }
>
> Which is the whole relevant change. :)
>
>
> This is a day one G1 bug.
>
> The idea is to backport this change due to its impact. It would be nice
> to get quick reviews. Maybe we can still get it into the next 8u
> release.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8147087
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8147087/webrev/
> Testing:
> Stress test program provided by JDK-8134963 many times without this
> crash, jprt
>
> I added Poonam as author for this change due to her great analysis.
>
> Thanks,
>    Thomas
>




More information about the hotspot-gc-dev mailing list