RFR: JDK-8058209: Race in G1 card scanning could allow scanning of memory covered by PLABs

Mikael Gerdin mikael.gerdin at oracle.com
Tue Nov 11 15:18:59 UTC 2014


Hi all,

I've sent this to hotspot-dev instead of just hotspot-gc-dev in the hope 
of getting some extra feedback from our resident concurrency experts.

Please review this subtle change to the order in which we read fields in 
G1OffsetTableContigSpace::saved_mark_word, original included here for 
reference:
1003
1004 HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
1005   G1CollectedHeap* g1h = G1CollectedHeap::heap();
1006   assert( _gc_time_stamp <= g1h->get_gc_time_stamp(), "invariant" );
1007   if (_gc_time_stamp < g1h->get_gc_time_stamp())
1008     return top();
1009   else
1010     return Space::saved_mark_word();
1011 }
1012

When getting a new gc alloc region several stores are performed where 
store ordering needs to be enforced and several synchronization points 
occur.
[write path]
ST(_saved_mark_word)
#StoreStore
ST(_gc_time_stamp)
ST(_top) // satisfying alloc request
#StoreStore
ST(_alloc_region) // publishing to other gc workers
#MonitorEnter
ST(_top) // potential further allocations
#MonitorExit
#MonitorEnter
ST(_top) // potential further allocations
#MonitorExit

When we inspect a region during remembered set scanning we need to 
ensure that we never read memory which have been allocated by a GC 
worker thread for the purpose of copying objects into.
The way this works is that a time stamp field is supposed to signal to a 
scanning thread that it should look at addresses below _top if the time 
stamp is old or addresses below _saved_mark_word if the time stamp is 
current.

The current code does (as seen above)
[read path]
LD(_gc_time_stamp)
LD(_top)
or (depending on time stamp)
LD(_saved_mark_word)

Because these values are written to without full mutual exclusion we 
need to be very careful about the order in which we read these values, 
and this is where I argue that the current code is incorrect.
In order to observe a consistent view of the ordered stores in the 
[write path] above we need to load the values in the reverse order they 
were written, with proper #LoadLoad ordering enforced.

The problem which we've observed here is that after we've read the time 
stamp as below the heap time stamp the top pointer can be updated by a 
GC worker allocating objects into this region. To make sure that the top 
value we see is in fact valid we must read it before we read the time 
stamp which determines which value we should return from the 
saved_mark_word function.

My suggested fix is to load _top first and enforce #LoadLoad ordering 
enforced:
HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
   G1CollectedHeap* g1h = G1CollectedHeap::heap();
   assert( _gc_time_stamp <= g1h->get_gc_time_stamp(), "invariant" );
   HeapWord* local_top = top();
   OrderAccess::loadload();
   if (_gc_time_stamp < g1h->get_gc_time_stamp()) {
     return local_top;
   } else {
      return Space::saved_mark_word();
   }
}

I've successfully reproduced the crash with the original code by adding 
some random sleep calls between the load of the time stamp and the load 
of top so I'm fairly certain that this resolves the issue. I've also 
verified that the fix I'm proposing does resolve the bug for the team 
which encountered the issue, even if I can't reproduce that crash locally.

I also plan to attempt design around some of the races in this code to 
reduce its complexity, but for the sake of backporting the fix to 8u40 
I'd like to start with just adding the minimal fix.


Bug: https://bugs.openjdk.java.net/browse/JDK-8058209
Webrev: http://cr.openjdk.java.net/~mgerdin/8058209/webrev.0/
Testing: JPRT, local kitchensink (4 hours), gc test suite

Thanks
/Mikael


More information about the hotspot-dev mailing list