RFR: JDK-8058209: Race in G1 card scanning could allow scanning of memory covered by PLABs
Bengt Rutisson
bengt.rutisson at oracle.com
Wed Nov 12 09:09:04 UTC 2014
Hi Mikael,
Looks good to me.
Really good work!
Bengt
On 2014-11-11 16:18, Mikael Gerdin wrote:
> Hi all,
>
> I've sent this to hotspot-dev instead of just hotspot-gc-dev in the
> hope of getting some extra feedback from our resident concurrency
> experts.
>
> Please review this subtle change to the order in which we read fields
> in G1OffsetTableContigSpace::saved_mark_word, original included here
> for reference:
> 1003
> 1004 HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
> 1005 G1CollectedHeap* g1h = G1CollectedHeap::heap();
> 1006 assert( _gc_time_stamp <= g1h->get_gc_time_stamp(), "invariant" );
> 1007 if (_gc_time_stamp < g1h->get_gc_time_stamp())
> 1008 return top();
> 1009 else
> 1010 return Space::saved_mark_word();
> 1011 }
> 1012
>
> When getting a new gc alloc region several stores are performed where
> store ordering needs to be enforced and several synchronization points
> occur.
> [write path]
> ST(_saved_mark_word)
> #StoreStore
> ST(_gc_time_stamp)
> ST(_top) // satisfying alloc request
> #StoreStore
> ST(_alloc_region) // publishing to other gc workers
> #MonitorEnter
> ST(_top) // potential further allocations
> #MonitorExit
> #MonitorEnter
> ST(_top) // potential further allocations
> #MonitorExit
>
> When we inspect a region during remembered set scanning we need to
> ensure that we never read memory which have been allocated by a GC
> worker thread for the purpose of copying objects into.
> The way this works is that a time stamp field is supposed to signal to
> a scanning thread that it should look at addresses below _top if the
> time stamp is old or addresses below _saved_mark_word if the time
> stamp is current.
>
> The current code does (as seen above)
> [read path]
> LD(_gc_time_stamp)
> LD(_top)
> or (depending on time stamp)
> LD(_saved_mark_word)
>
> Because these values are written to without full mutual exclusion we
> need to be very careful about the order in which we read these values,
> and this is where I argue that the current code is incorrect.
> In order to observe a consistent view of the ordered stores in the
> [write path] above we need to load the values in the reverse order
> they were written, with proper #LoadLoad ordering enforced.
>
> The problem which we've observed here is that after we've read the
> time stamp as below the heap time stamp the top pointer can be updated
> by a GC worker allocating objects into this region. To make sure that
> the top value we see is in fact valid we must read it before we read
> the time stamp which determines which value we should return from the
> saved_mark_word function.
>
> My suggested fix is to load _top first and enforce #LoadLoad ordering
> enforced:
> HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
> G1CollectedHeap* g1h = G1CollectedHeap::heap();
> assert( _gc_time_stamp <= g1h->get_gc_time_stamp(), "invariant" );
> HeapWord* local_top = top();
> OrderAccess::loadload();
> if (_gc_time_stamp < g1h->get_gc_time_stamp()) {
> return local_top;
> } else {
> return Space::saved_mark_word();
> }
> }
>
> I've successfully reproduced the crash with the original code by
> adding some random sleep calls between the load of the time stamp and
> the load of top so I'm fairly certain that this resolves the issue.
> I've also verified that the fix I'm proposing does resolve the bug for
> the team which encountered the issue, even if I can't reproduce that
> crash locally.
>
> I also plan to attempt design around some of the races in this code to
> reduce its complexity, but for the sake of backporting the fix to 8u40
> I'd like to start with just adding the minimal fix.
>
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8058209
> Webrev: http://cr.openjdk.java.net/~mgerdin/8058209/webrev.0/
> Testing: JPRT, local kitchensink (4 hours), gc test suite
>
> Thanks
> /Mikael
More information about the hotspot-dev
mailing list