RFR: JDK-8058209: Race in G1 card scanning could allow scanning of memory covered by PLABs
bill pittore
bill.pittore at oracle.com
Fri Nov 14 14:48:37 UTC 2014
I don't want to hold up your current push but have a question inline.
On 11/14/2014 8:20 AM, Mikael Gerdin wrote:
> All,
>
> I've just realized (with the help of Stefan K.) that my webrev.1 is in
> fact incorrect. It actually re-introduces the race by using a local
> variable for the time stamp assert.
>
> With that in mind I want to push the contents of webrev.0 (with the
> indentation fix) instead since that has gone through extensive testing.
>
> I plan to push this today and I'm already working on a bunch of
> cleanups to this code.
>
> /Mikael
>
> On 2014-11-11 16:18, Mikael Gerdin wrote:
>> Hi all,
>>
>> I've sent this to hotspot-dev instead of just hotspot-gc-dev in the hope
>> of getting some extra feedback from our resident concurrency experts.
>>
>> Please review this subtle change to the order in which we read fields in
>> G1OffsetTableContigSpace::saved_mark_word, original included here for
>> reference:
>> 1003
>> 1004 HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
>> 1005 G1CollectedHeap* g1h = G1CollectedHeap::heap();
>> 1006 assert( _gc_time_stamp <= g1h->get_gc_time_stamp(),
>> "invariant" );
>> 1007 if (_gc_time_stamp < g1h->get_gc_time_stamp())
>> 1008 return top();
>> 1009 else
>> 1010 return Space::saved_mark_word();
>> 1011 }
>> 1012
>>
>> When getting a new gc alloc region several stores are performed where
>> store ordering needs to be enforced and several synchronization points
>> occur.
>> [write path]
>> ST(_saved_mark_word)
>> #StoreStore
>> ST(_gc_time_stamp)
>> ST(_top) // satisfying alloc request
If at this point in time the read thread running on some other core (on
a system with weak memory ordering) reads _top and _gc_time_stamp I'm
not convinced that the values will always be what you think. The store
of _top could float above the store of _gc_time_stamp. I don't know the
gc code in question so I don't know if that's good or bad but I do think
it's possible. If you need strict ordering of the two writes then a
StoreStore between them will guarantee that. Bertrand or David H. please
correct me if I'm wrong.
thanks,
bill
>> #StoreStore
>> ST(_alloc_region) // publishing to other gc workers
>> #MonitorEnter
>> ST(_top) // potential further allocations
>> #MonitorExit
>> #MonitorEnter
>> ST(_top) // potential further allocations
>> #MonitorExit
>>
>> When we inspect a region during remembered set scanning we need to
>> ensure that we never read memory which have been allocated by a GC
>> worker thread for the purpose of copying objects into.
>> The way this works is that a time stamp field is supposed to signal to a
>> scanning thread that it should look at addresses below _top if the time
>> stamp is old or addresses below _saved_mark_word if the time stamp is
>> current.
>>
>> The current code does (as seen above)
>> [read path]
>> LD(_gc_time_stamp)
>> LD(_top)
>> or (depending on time stamp)
>> LD(_saved_mark_word)
>>
>> Because these values are written to without full mutual exclusion we
>> need to be very careful about the order in which we read these values,
>> and this is where I argue that the current code is incorrect.
>> In order to observe a consistent view of the ordered stores in the
>> [write path] above we need to load the values in the reverse order they
>> were written, with proper #LoadLoad ordering enforced.
>>
>> The problem which we've observed here is that after we've read the time
>> stamp as below the heap time stamp the top pointer can be updated by a
>> GC worker allocating objects into this region. To make sure that the top
>> value we see is in fact valid we must read it before we read the time
>> stamp which determines which value we should return from the
>> saved_mark_word function.
>>
>> My suggested fix is to load _top first and enforce #LoadLoad ordering
>> enforced:
>> HeapWord* G1OffsetTableContigSpace::saved_mark_word() const {
>> G1CollectedHeap* g1h = G1CollectedHeap::heap();
>> assert( _gc_time_stamp <= g1h->get_gc_time_stamp(), "invariant" );
>> HeapWord* local_top = top();
>> OrderAccess::loadload();
>> if (_gc_time_stamp < g1h->get_gc_time_stamp()) {
>> return local_top;
>> } else {
>> return Space::saved_mark_word();
>> }
>> }
>>
>> I've successfully reproduced the crash with the original code by adding
>> some random sleep calls between the load of the time stamp and the load
>> of top so I'm fairly certain that this resolves the issue. I've also
>> verified that the fix I'm proposing does resolve the bug for the team
>> which encountered the issue, even if I can't reproduce that crash
>> locally.
>>
>> I also plan to attempt design around some of the races in this code to
>> reduce its complexity, but for the sake of backporting the fix to 8u40
>> I'd like to start with just adding the minimal fix.
>>
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8058209
>> Webrev: http://cr.openjdk.java.net/~mgerdin/8058209/webrev.0/
>> Testing: JPRT, local kitchensink (4 hours), gc test suite
>>
>> Thanks
>> /Mikael
More information about the hotspot-dev
mailing list