G1 RSet improvement ideas

Tue Feb 4 06:02:28 UTC 2014

Hi Tony,

On 2/4/14 2:41 AM, Tony Printezis wrote:
> Hi Bengt,
>
> Very well spotted! :-) You're completely right that SATB marking will 
> not visit all the edges in the object graph. However, in G1, the extra 
> post-barrier should ensure that all the edges are visited.

Right. Thanks, that was the missing piece in my understanding. :)

Bengt

>
>
> Tony
>
> On 2/3/14, 3:27 PM, Bengt Rutisson wrote:
>>
>> Hi Tony and Thomas,
>>
>> Just a question for my understanding regarding rebuilding RSets. You 
>> both say it is straight forward to do that during marking. How does 
>> that work with SATB since that is not guaranteed to visit all edges 
>> in the object graph?
>>
>> Thanks,
>> Bengt
>>
>>
>>
>> On 2/3/14 6:13 PM, Tony Printezis wrote:
>>> Thomas,
>>>
>>> Thanks for actually reading my brain dump and for the comments. :-) 
>>> See inline.
>>>
>>> On 2/3/14, 4:58 AM, Thomas Schatzl wrote:
>>>> Hi,
>>>>
>>>>    let me have a stab at this long mail with a few half-baked
>>>> thoughts... ;)
>>>>
>>>> On Thu, 2014-01-30 at 12:33 -0500, Tony Printezis wrote:
>>>>> Hi all,
>>>>>
>>>>> I had a recent e-mail exchange with Bengt and Thomas and we touched a
>>>>> bit on G1 RSet work. RSets can be a major bottleneck in G1 right now,
>>>>> but I don't believe it's fundamental issue with the concept. IMHO, 
>>>>> the
>>>>> implementation can be substantially improved to be much more 
>>>>> efficient.
>>>>> I recently created a few issues in our (Twitter) bug system with some
>>>>> ideas on how to do that. I pasted all the text below (but replaced 
>>>>> the
>>>>> internal issue #s with (1), (2), etc.).
>>>>>
>>>>> Let me know if you have any ideas, thoughts, etc. on this.
>>>>>
>>>>> Tony
>>>>>
>>>>> ----------
>>>>>
>>>>> (1) Use a single RSet for all young regions
>>>>>
>>>>> In G1 each region has its own Remembered Set (RSet). During a
>>>>> collection, all RSets of the Collection Set (CSet) regions need to be
>>>>> scanned. Given that all young regions are always collected together
>>>>> during the same collection, it will be beneficial to actually make 
>>>>> all
>>>>> young regions point to the same physical RSet. This will have several
>>>>> advantages:
>>>>>
>>>>> - lower overall RSet footprint (better to have one fuller RSet, than
>>>>> many mostly empty ones; this will also quickly eliminate duplicate 
>>>>> RSet
>>>>> entries across all young regions)
>>>> That is true, but in my experience with a similar system (not on G1),
>>>> the savings are not that big - of course there are, but not huge iirc.
>>>
>>> A quick study to measure this would be interesting. I would be 
>>> really shocked if we don't see quite a lot of duplication, I have to 
>>> say (especially when a very large eden is used).
>>>
>>>> In that direction one could experiment with another idea, a "sticky 
>>>> young
>>>> gen" value in the card table, that can be used to also filter out 
>>>> references
>>>> between old gen regions originating from these cards (they will be
>>>> scanned anyway) too. I think I had that implemented 
>>>> somewhere/sometime, it
>>>> should work.
>>>>
>>>> I.e. during refinement, if we find a to-young reference, instead of
>>>> clearing the card at the end, mark it with some "sticky young 
>>>> value" so
>>>> that the barrier will not redirty that card. Maybe also just 
>>>> leaving it
>>>> dirty may be sufficient.
>>>>
>>>> Note that this idea dirties the card table, so requires clearing.
>>>
>>> Note that G1 does try to avoid "dirty / refine / clean / re-dirty / 
>>> re-refine / re-clean / etc." patterns on cards (whether they point 
>>> to the young gen or not) using the hot card cache (unless you've 
>>> recently removed it! haven't looked at that code for a while... but 
>>> it still seems to be there).
>>>
>>> FWIW: apart from avoiding card duplication, using a single RSet for 
>>> all young regions will also have additional advantages: lower 
>>> footprint (only one table instead of one per region), faster to 
>>> reclaim, fewer malloc / frees, etc.
>>>
>>>>> - more efficient RSet scanning (the number of RSets scanned during a
>>>>> collection will not depend on the number of young regions in the 
>>>>> CSet)
>>>> The main contributing factor for rset scanning is the number of 
>>>> entries,
>>>> not the number of rsets.
>>>
>>> I don't completely agree with this I have to say. Let's assume you 
>>> have some card duplication across all young RSets (for the sake of 
>>> argument). Having each thread having to visit each RSet (imagine you 
>>> have a few 100s), check whether it's been claimed, try to claim it, 
>>> then try to claim cards and contend on the card table is not cheap. 
>>> Just looking at a single RSet, seeing quickly that its entries have 
>>> been claimed by other GC threads, and going on to the next phase 
>>> could be much more efficient.
>>>
>>>>   You can always improve the distribution across
>>>> threads if that is what is your concern.
>>>>
>>>> (G1 typically takes the easy approach by distributing work on a 
>>>> per-region
>>>> basis. That's more an artifact of a quick solution than a real 
>>>> solution.
>>>
>>> Are you talking about distributing RSet scanning work? Or in general?
>>>
>>>> There are already a few CRs and prototypes that improve this 
>>>> situation).
>>>
>>> Can you point me to a couple?
>>>
>>>>> Minus any issues and assumptions that will need to be ironed out 
>>>>> in the
>>>>> RSet code, doing this should be straightforward. During a GC, a 
>>>>> new RSet
>>>> I do not see huge issues with that either.
>>>
>>> (see towards the e-mail for a bit more info on this)
>>>
>>>>> is created and all Survivor regions created during that GC will be 
>>>>> made
>>>>> to point to it. Then, as new Eden regions are allocated, they will 
>>>>> also
>>>>> point to the same RSet. At the start of the next GC, all Eden / 
>>>>> Survivor
>>>>> regions will be pointing to the same RSet. During young GCs, we'll 
>>>>> only
>>>>> need to scan that RSet. During mixed GCs, we'll need to scan the 
>>>>> young
>>>>> RSet and all the RSets of the old regions. But this will be an
>>>>> improvement over what we do now. We'll also need to decouple the 
>>>>> young
>>>>> RSet freeing code from the region reclamation code so that we don't
>>>>> attempt to reclaim the same RSet multiple times.
>>>> Another minus, that might be implementation dependent: possibly 
>>>> lots of
>>>> contention on a single RSet.
>>>
>>> That's definitely a good point. On the other hand, threads trying to 
>>> claim individual RSets should not be too different to threads trying 
>>> to claim sub-components of a single RSet. Also, having one RSet per 
>>> region is an arbitrary way to cut down contention. If it's possible 
>>> to have a single RSet for all the young regions, it will be easy to 
>>> expand that and maybe have N RSets (again, each covering all young 
>>> regions) and easily map individual threads (whether they are writing 
>>> or reading) to one of them. Here, N can be decided on the amount of 
>>> concurrency available. One per region is completely arbitrary in 
>>> that respect.
>>>> Consider machines with a few hundred threads out there.
>>>
>>> (just to clarify) Do you mean:
>>>
>>> - Hundreds of HW threads?
>>> - Hundreds of Java threads?
>>> - Hundreds of GC threads?
>>>
>>>> Not that I think
>>>> anyone ever made measurements regarding that with the current RSets on
>>>> such a system either...
>>>
>>> Not as far as I know...
>>>
>>>>> (2) G1 RSets: Use a single RSet for old regions of a single CSet
>>>>>
>>>>> (This is a follow-up to (1))
>>>>>
>>>>> For the same reasons that it's a good idea to use a single Remembered
>>>>> Set (RSet) for all young regions, it'd be also helpful to use a 
>>>>> single
>>>>> RSet for all old regions that are added to a Collection Set 
>>>>> (CSet). In
>>>>> combination with (1), this will result in only having either one 
>>>>> (young
>>>>> GCs) or two (mixed GCs) RSets to scan during GC.
>>>>>
>>>>> This will be quite a lot more work to do, compared to (1), given 
>>>>> that we
>>>>> only choose the old regions to add to a CSet at the start of a GC. To
>>>>> achieve this we'll have to do a bit more advance preparation: choose
>>>>> groups of old regions to collect together (maybe do that 
>>>>> concurrently to
>>>>> minimize STW work) and combine their RSets. So, at each mixed GC, 
>>>>> we'll
>>>>> pick an old region group, and its associated RSet, and collect them
>>>>> along with the young regions.
>>>> A similar effect could be achieved by increasing region size, it's not
>>>> as flexible though.
>>>
>>> That's kind of a big hammer approach. And it might make other parts 
>>> of G1 less efficient (you might find fewer completely empty regions 
>>> at the end of a marking cycle, which will put more pressure on the 
>>> mixed GCs).
>>>
>>>> It, and all follow-up ideas in that direction, assume that the 
>>>> regions that
>>>> we want to pick and their order can be precalculated. That may be a 
>>>> good
>>>> assumption now, as G1 tries to do the cleanup asap, it may not hold 
>>>> in the
>>>> future.
>>>>
>>>> Eg. there are already minor complaints that the mixed gc phase impacts
>>>> throughput too much during that time, so spreading this phase out 
>>>> may be
>>>> considered in the future (e.g. just reclaim enough to keep up with 
>>>> promotion
>>>> rate and then a little).
>>>
>>> Two things:
>>>
>>> a) The reason I think it's a good idea to do the mixed GCs 
>>> back-to-back is to clean up the heap as quickly as possible to give 
>>> the GC and app breathing room as quickly as possible. Yes, I had 
>>> also thought of spreading them out more. However, my intuition is 
>>> that, given you know there's garbage in heap, might as well try to 
>>> reclaim it asap.
>>>
>>> b) Putting together groups of regions that share an RSet and can be 
>>> collected together is not incompatible with spreading out the mixed 
>>> GCs. You just have to maintain the RSet for that group for longer. 
>>> However, if you do that, you'll end up with more and more stale 
>>> entries in it and you might have to do some extra filtering before 
>>> you decide to do the collection.
>>>
>>>> Also, we might need the flexibility to e.g. clear up particular 
>>>> regions to
>>>> allow allocation of large objects without full GC. That's a major 
>>>> concern
>>>> in some cases, you may have noticed that there are already some CRs 
>>>> trying
>>>> to improve G1 behavior with many large objects.
>>>
>>> (Actually, I haven't looked at all the open GC / G1 CRs)
>>>
>>> Again, this is a very good observation. However, maintaining 
>>> complete RSets for all regions in order to deal with an infrequent 
>>> edge case sounds like the wrong trade-off. Maybe you can do 
>>> something in-between: Always keep very coarse RSets per region, 
>>> i.e., which region, or whatever subdivision you decide on, has 
>>> references into that region (i.e., but not at the card level; 
>>> something that you can do with a fixed data structure like a bitmap 
>>> / byte map). Then, if you unexpectedly want to move some regions to 
>>> make space for a new large one, you can still move a few without 
>>> having to scan the entire heap, but maybe having to scan more than 
>>> individual cards.
>>>
>>>>> (3) G1 RSets: Use one RSet for each GC
>>>>>
>>>>> (follow-up to (1) and (2))
>>>>>
>>>>> If we manage to have one young RSet and one old RSet for each 
>>>>> mixed GC,
>>>>> it'd be straightforward to combine the two to always end up having to
>>>>> scan a single RSet at every GC.
>>>>  From a contention point of view this seems even worse than (1), but a
>>>> straightforward extension.
>>>
>>> I'd agree. But, I still don't think contention on a single RSet has 
>>> to be much worse than having to scan / claim a bunch of RSets. (And 
>>> see note about having multiple RSets to match the available 
>>> concurrency.)
>>>
>>>>> (2) proposes to create old region groups, each with its own combined
>>>>> RSet. We could take this further and grab this combined old RSet 
>>>>> at the
>>>>> start of a GC and use that as the young RSet (for Survivor regions
>>>>> allocated during that GC and Eden regions allocated after that 
>>>>> GC). So,
>>>>> at the start of the next GC, all regions in the CSet (Eden, 
>>>>> Survivors,
>>>>> old region group) will have the same RSet.
>>>> Afair the order in the collection candidate list is only ever 
>>>> evaluated
>>>> once after mark end at the moment. You could create the groups at that
>>>> opportunity already.
>>>
>>> Absolutely. I was thinking that maybe after a marking cycle has 
>>> completed, a concurrent thread (probably, the marking thread itself) 
>>> can do a lot of the mixed GC preparation work (examples: sort the 
>>> old regions, find which ones should be collected, put together the 
>>> region groups, combine / scrub the RSets, etc. whatever needs to be 
>>> done). Then, as each region group has been prepared is made 
>>> available on a queue. Each evacuation pause bases the mixed / young 
>>> GC decision on whether a new group has been made available. This way 
>>> we move the expensive preparation work to the concurrent domain and 
>>> the "when to do a mixed GC" decision becomes very straightforward.
>>>
>>>> Also probably needs retuning of the different thresholds for switching
>>>> between the coarseness levels of the remembered set.
>>>>
>>>>> (4) G1 RSets: Filter old RSets before a collection
>>>>>
>>>>> Old RSets tend to grow quite large and sometimes they contain a large
>>>>> amount of stale entries that need to be scanned for nothing during 
>>>>> a GC.
>>>>> After concurrent marking has completed, but maybe before each 
>>>>> mixed GC,
>>>>> we could introduce a concurrent process that scans RSets of 
>>>>> regions that
>>>>> will be collected and filters out any stale entries (by scanning the
>>>>> cards that the RSet points to and removing any cards that don't 
>>>>> point to
>>>>> the region any more). This should reduce the amount of RSet scanning
>>>>> work that's done during mixed GCs.
>>>>>
>>>>> The easiest way to do this, to avoid any nasty races / MT issues, 
>>>>> is to
>>>>> give each old region a new RSet (during a safepoint, say), store 
>>>>> the old
>>>> This safepoint need not be an extra safepoint, could be piggybacked to
>>>> existing ones.
>>>
>>> I didn't say anything about having an extra safepoint. I agree, this 
>>> can be piggy-backed on an existing safepoint.
>>>
>>>> That actually sounds quite easy to implement, however,
>>>> any suggestions how to find the RSets with the largest amount of stale
>>>> entries?
>>>
>>> Nope. I don't believe is very easy to decide that. I'd just scrub 
>>> all RSets for the old regions to be collected.
>>>
>>>> Apart from the terms "tend to grow quite large" and "sometimes they
>>>> contain a large amount of stale entries", do you have some numbers 
>>>> from
>>>> experience here?
>>>
>>> I had had some quick studies but I didn't keep the numbers after I 
>>> left Oracle. But I had definitely seen RSets for particular regions 
>>> keep growing in size and they really shouldn't.
>>>
>>>>> one somewhere, scan it (concurrently), and add any entries that 
>>>>> need to
>>>>> be retained to the new one. At the end, we can just reclaim the 
>>>>> old one.
>>>> Also, when checking for existing entries, probably only look into the
>>>> new one:
>>>
>>> Of course. The RSet of the region will be the new one. The only one 
>>> will be treated similarly to a bunch of dirty cards (as Igor said in 
>>> a separate e-mail).
>>>
>>>> if it does not exist, add it, and any duplicates from the old
>>>> will have no effect. (Instead of always checking both RSets).
>>>
>>> You should never have to check both RSets.
>>>
>>>>> This way, any new updates to the new RSet will automatically happen
>>>>> through the normal path. Explicitly removing an entry, instead of
>>>>> recreating the RSet as described, will be much harder, given that 
>>>>> we'd
>>>>> have to synchronize with other threads potentially trying to add the
>>>>> same entry.
>>>>>
>>>>> This can be done in addition to (2). In fact, the two operations 
>>>>> could
>>>>> also be combined (i.e. combined old region group's RSets and also 
>>>>> filter
>>>>> them at the same time).
>>>>>
>>>>> ----------
>>>>>
>>>>> (5) G1 RSets: Don't use the card table during RSet scanning to reduce
>>>>> duplication
>>>>>
>>>>> Currently, while scanning Remembered Sets (RSets) during a GC we 
>>>>> use the
>>>>> card table to mark whether a particular card has been scanned or not
>>>>> (since the same card might appear on multiple RSets). This is bad 
>>>>> for a
>>>>> couple of reasons (the first one is the important one, the second 
>>>>> one we
>>>>> could address reasonably easily):
>>>>>
>>>>> - First, we have to clear the card table at the end of the GC, which
>>>>> requires a non-trivial amount of work.
>>>> Do you have numbers here? It definitely is a valid concern (with large
>>>> heaps), but in the logs I got so far, this has not been a really 
>>>> big issue
>>>> yet because the other phases are so much longer...
>>>
>>> Last time I checked, G1 kept track of which regions would 
>>> potentially have dirty cards (IIRC: the collection set regions and 
>>> any regions that appeared on the RSet of the collection set regions; 
>>> I could be wrong). And it only clears the CT for those regions. The 
>>> Clean CT phase is on the GC log. It might not be very long, but if 
>>> you can eliminate it, it can only be a good thing. Also, there might 
>>> be more regions whose CT needs to be cleared after mixed GCs (i.e., 
>>> don't base your decision only on data from young GCs). Finally, if 
>>> you can also remove the code that keeps track of which regions might 
>>> contain dirty cards post GC can also be a good thing.
>>>
>>>> A more coarse second-level card table (512k/1M card size?) remembering
>>>> approximately where these marks are should already go a long way 
>>>> towards
>>>> reducing the overhead?
>>>
>>> As I said earlier, only part of the CT is cleared.
>>>
>>>> Do you have some numbers on how far these card marks are spread 
>>>> out? I would
>>>> assume that particularly on large heaps, only a fraction of the 
>>>> card table
>>>> is ever dirtied per collection.
>>>
>>> For young GCs, I agree there probably won't be too many cards that 
>>> get dirty during a GC. I'm sure there will be more during mixed GCs.
>>>
>>>>> - Second, the card marks during RSet scanning are not done atomically
>>>>> (atomics were deemed too expensive). This is safe (rescanning a 
>>>>> card is
>>>>> not a correctness issue and it shouldn't happen very often anyway).
>>>>> However, this prevents some subtle optimizations (when one thread 
>>>>> comes
>>>>> across a reference on a card that points into the CSet and pushes its
>>>>> location on the stack, it has to recheck that it still points into 
>>>>> the
>>>>> CSet when it eventually pops it from the stack, given that another
>>>>> thread might also scan that card and update the reference meanwhile).
>>>> :)
>>>>
>>>>> A better solution will be to somehow use the RSet data structure for
>>>>> each thread to claim RSet chunks. It can even be done in a 
>>>>> destructive
>>>>> way, given that the RSet will be reclaimed at the end of the GC 
>>>>> anyway.
>>>> That would be something nice (that is not slower in the end than just
>>>> clearing the card table).
>>>
>>> Well, after a collection the RSet(s) of the CSet regions is/are 
>>> reclaimed given that the CSet regions are reclaimed. So, if you have 
>>> a couple of extra fields on the RSet to help with the claiming 
>>> algorithm (or each RSet component, etc.) it shouldn't have any extra 
>>> overhead at the end.
>>>
>>>>> This depends on (3), given we can only ensure that each card is 
>>>>> claimed
>>>>> exactly once if it appears on a single RSet.
>>>>>
>>>>> ----------
>>>>>
>>>>> (6) G1 RSets: Only create old region RSets when they are needed
>>>>>
>>>>> Currently, each region has its own RSet from the beginning and that's
>>>>> maintained throughout the region's lifetime. For very long 
>>>>> regions, it
>>>>> turns out that long-lived old regions end up with a lot of stale 
>>>>> entries
>>>>> in their RSets (as ref fields are updated over time to point to
>>>>> different objects) which make them very bloated. An important
>>>>> observation is that the old region RSets are only really needed 
>>>>> during
>>>>> mixed GCs after a concurrent cycle. Sometimes, concurrent cycles 
>>>>> happen
>>>>> very infrequently (every few hours, maybe) so we constantly 
>>>>> maintain the
>>>>> old region RSets and only take advantage of them very infrequently.
>>>>>
>>>>> A better approach would be to eliminate RSets from old regions 
>>>>> (say, set
>>>>> the RSet field to NULL for those regions so that any refinement
>>>>> operation will just do nothing for refs that point to such 
>>>>> regions) and
>>>>> only recreate them when needed.
>>>> Totally agree.
>>>>> One approach, which has been done by other GCs, is to recreate the 
>>>>> RSets
>>>>> during marking (as the marking phase will visit all live objects).
>>>> Adding that to the marking phase is simple.
>>>>> Maybe a better approach will be to finish the marking phase, do some
>>>>> quick analysis on the old regions to decide which to collect and 
>>>>> which
>>>>> not to collect, and do a sweep phase to populate the RSets of the
>>>>> regions that will be collected. Only when that operation is done, 
>>>>> mixed
>>>>> GCs will be allowed.
>>>> Do you think sweeping the entire heap is faster than piggy-backing 
>>>> to the
>>>> marking, even if done concurrently?
>>>
>>> (clarification : not the entire heap, but only old regions) Yeah, 
>>> interesting trade-off here. I can totally see the attraction of 
>>> building the RSets piggy-backed on marking. However, you might end 
>>> up building several RSets for regions that you won't end up 
>>> collecting (because they're very dense with live objects); so that'd 
>>> be wasted work. I kinda like the the approach of doing it as a 
>>> separate step as it allows you to decide exactly which RSets to 
>>> build. Also, maybe the idea I mentioned earlier about always 
>>> maintaining very coarse-grain RSets per region, could help in 
>>> reducing how much of the old gen you'd have to sweep (this can of 
>>> course degrade to scanning the entire old gen after all).
>>>
>>>>> This is a (better but more involved) alternative to (4) as the 
>>>>> recreated
>>>>> RSets will not be very long-lived, so they will hopefully not need
>>>>> further filtering.
>>>>>
>>>>> This can be combined with the approach in (2), i.e., find the old 
>>>>> region
>>>>> groups we'll collect, give all regions in each group a new RSet, and
>>>>> only then do the sweeping phase to create directly RSets per group
>>>>> instead of per region.
>>>>>
>>>>> ----------
>>>>>
>>>>> (7) G1 RSets: Redesign for remembered set data structures
>>>>>
>>>>> This is somewhat orthogonal to the other G1 RSet issues, but there
>>>>> should be a lot of gains to be made by rethinking / redesigning 
>>>>> the RSet
>>>>> data structures. The concept of having multiple RSet granularities is
>>>>> quite interesting. Unfortunately, the implementation is quite
>>>> The idea seems good.
>>>>
>>>>> inefficient. In particular, there are two hash tables (for fine and
>>>>> sparse entries) which is completely wasteful (requires two 
>>>>> look-ups and
>>>>> extra footprint for the second table). Also, several operations 
>>>>> require
>>>>> locks which affects scalability. Finally, the implementation is too
>>>>> dependent on the region abstraction (i,e., it's baked in the code 
>>>>> that
>>>>> each RSet component represents a region-to-region relationship) which
>>>>> makes them very inflexible (RSset components can get very large when
>>>>> very large regions are used).
>>>> I think just assigning the same RSet to multiple regions works 
>>>> already.
>>>
>>> Have you changed this recently?
>>>
>>>> There is some overhead that will be incurred due to filters assuming a
>>>> one-to-one relationship, but that should be minimal, and benign as 
>>>> in the
>>>> end it will be coalesced with the existing entry anyway. (Did not 
>>>> really
>>>> look at the code).
>>>
>>> IIRC, the way the RSet mapped entries to card addresses was to get 
>>> the region address from the HeapRegion data structure.
>>>
>>>>> The goals of a redesign should be the following:
>>>>>
>>>>> - At a high-level, the new structure should just be a card address ->
>>>>> bool map (false -> card not included in the RSet, true -> card 
>>>>> included
>>>>> in the RSet).
>>>> I am not sure what this means: this suggests an interface, not an 
>>>> implementation.
>>>
>>> What I tried to say is that the implementation shouldn't deal with 
>>> the HeapRegion data structure at all:
>>>
>>> void OtherRegionsTable::add_reference(OopOrNarrowOopStar from, int 
>>> tid) {
>>>   ...
>>>   // Note that this may be a continued H region.
>>>   HeapRegion* from_hr = _g1h->heap_region_containing_raw(from);
>>>   RegionIdx_t from_hrs_ind = (RegionIdx_t) from_hr->hrs_index();
>>>   ...
>>> }
>>>
>>> PerRegionTable*
>>> OtherRegionsTable::find_region_table(size_t ind, HeapRegion* hr) 
>>> const {
>>>   assert(0 <= ind && ind < _max_fine_entries, "Preconditions.");
>>>   PerRegionTable* prt = _fine_grain_regions[ind];
>>>   while (prt != NULL && prt->hr() != hr) {
>>>     prt = prt->collision_list_next();
>>>   }
>>>   // Loop postcondition is the method postcondition.
>>>   return prt;
>>> }
>>>
>>> etc.
>>>
>>> The are many problems with this: First, there's complete lack of 
>>> flexibility if we want to use one RSet for many regions (or if we 
>>> want to change the granularity of the RSet's components). Second, 
>>> there could be stale entries on the RSet that point to free regions 
>>> that are being re-allocated concurrently with the refinement code 
>>> that's actually using hr() (we had some very horrible races in this 
>>> code). This is why I said the RSet should just be a card address -> 
>>> bool map and know nothing about heap regions.
>>>
>>> Tony
>>>
>>>>> - A subdivision of the heap, so that each such heap chunk is 
>>>>> represented
>>>>> by a different structure, can still be used. However, chunks 
>>>>> should not
>>>>> be tied to regions. I.e., a region might be covered by several chunks
>>>>> (helpful, when regions are large).
>>>> Okay.
>>>>
>>>>> - The fine / medium / coarse granularity approach will still be 
>>>>> useful
>>>>> to represent each chunk as efficiently as possible. However, there
>>>>> should be a single hash table that will keep track of the entries 
>>>>> of all
>>>>> granularities.
>>>>> - The RSet structure should have a facility for threads to claim 
>>>>> entries
>>>>> to work on them. This is to facilitate (5).
>>>> :)
>>>>
>>>>> - Addition operations should be entirely lock-free.
>>>>> - RSet iterations should be done using iterators.
>>>>> - A facility to efficiently do a union of various RSets will also be
>>>>> helpful (to facilitate (2)).
>>>> Thomas
>>>>
>>>>
>>>
>>
>