RFR: JDK-8058209: Race in G1 card scanning could allow scanning of memory covered by PLABs
Mikael Gerdin
mikael.gerdin at oracle.com
Tue Nov 11 15:18:59 UTC 2014
Hi all,
I've sent this to hotspot-dev instead of just hotspot-gc-dev in the hope
of getting some extra feedback from our resident concurrency experts.
Please review this subtle change to the order in which we read fields in
G1OffsetTableContigSpace::saved_mark_word, original included here for
reference:
1003
1004 HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
1005 G1CollectedHeap* g1h = G1CollectedHeap::heap();
1006 assert( _gc_time_stamp <= g1h->get_gc_time_stamp(), "invariant" );
1007 if (_gc_time_stamp < g1h->get_gc_time_stamp())
1008 return top();
1009 else
1010 return Space::saved_mark_word();
1011 }
1012
When getting a new gc alloc region several stores are performed where
store ordering needs to be enforced and several synchronization points
occur.
[write path]
ST(_saved_mark_word)
#StoreStore
ST(_gc_time_stamp)
ST(_top) // satisfying alloc request
#StoreStore
ST(_alloc_region) // publishing to other gc workers
#MonitorEnter
ST(_top) // potential further allocations
#MonitorExit
#MonitorEnter
ST(_top) // potential further allocations
#MonitorExit
When we inspect a region during remembered set scanning we need to
ensure that we never read memory which have been allocated by a GC
worker thread for the purpose of copying objects into.
The way this works is that a time stamp field is supposed to signal to a
scanning thread that it should look at addresses below _top if the time
stamp is old or addresses below _saved_mark_word if the time stamp is
current.
The current code does (as seen above)
[read path]
LD(_gc_time_stamp)
LD(_top)
or (depending on time stamp)
LD(_saved_mark_word)
Because these values are written to without full mutual exclusion we
need to be very careful about the order in which we read these values,
and this is where I argue that the current code is incorrect.
In order to observe a consistent view of the ordered stores in the
[write path] above we need to load the values in the reverse order they
were written, with proper #LoadLoad ordering enforced.
The problem which we've observed here is that after we've read the time
stamp as below the heap time stamp the top pointer can be updated by a
GC worker allocating objects into this region. To make sure that the top
value we see is in fact valid we must read it before we read the time
stamp which determines which value we should return from the
saved_mark_word function.
My suggested fix is to load _top first and enforce #LoadLoad ordering
enforced:
HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
G1CollectedHeap* g1h = G1CollectedHeap::heap();
assert( _gc_time_stamp <= g1h->get_gc_time_stamp(), "invariant" );
HeapWord* local_top = top();
OrderAccess::loadload();
if (_gc_time_stamp < g1h->get_gc_time_stamp()) {
return local_top;
} else {
return Space::saved_mark_word();
}
}
I've successfully reproduced the crash with the original code by adding
some random sleep calls between the load of the time stamp and the load
of top so I'm fairly certain that this resolves the issue. I've also
verified that the fix I'm proposing does resolve the bug for the team
which encountered the issue, even if I can't reproduce that crash locally.
I also plan to attempt design around some of the races in this code to
reduce its complexity, but for the sake of backporting the fix to 8u40
I'd like to start with just adding the minimal fix.
Bug: https://bugs.openjdk.java.net/browse/JDK-8058209
Webrev: http://cr.openjdk.java.net/~mgerdin/8058209/webrev.0/
Testing: JPRT, local kitchensink (4 hours), gc test suite
Thanks
/Mikael
More information about the hotspot-dev
mailing list