CRR (M): 7132029: G1: mixed GC phase lasts for longer than it should

Tony Printezis tony.printezis at oracle.com
Wed Feb 15 13:47:56 UTC 2012


Bengt,

Thanks for looking at this at such short notice! Inline.

On 02/15/2012 03:46 AM, Bengt Rutisson wrote:
>
> Hi Tony,
>
> I really like this cleanup! Thanks for fixing it.
>
> Some comments:
>
> collectionSetChooser.hpp:
>
> _numMarkedRegions is kind of a confusing name to me. It does not 
> contain all the marked regions. 

Well, I left it as it was. And it's also not called 
_numAllMarkedRegions. :-) How about I just call it _length (and 
_curMarkedIndex -> _curr_index)?

> It contains all the old regions that we would like to include in the 
> collection set. How about _num_selected_regions or even 
> _num_selected_old_regions?
>
> I think the remove(HeapRegion* hr) method is kind of strange. I 
> understand the need for peek() from finalize_cset(), but I think it is 
> strange that the remove() dependes on the code first calling peek(). 
> I'd prefer if we could make this more iterator style or at least 
> change remove() to someting like pop() that will return the next 
> HeapRegion instead of taking it as a parameter.

Well, peek() and remove() are only called from the loop in 
finalize_cset(), so they are tailored for that. I originally had 
remove() with no parameters that just removed the current region. I then 
thought of passing the region as a parameter mainly to do sanity testing 
(i.e., the region that it's being removed is the one we think it should 
be removed). And I then realized that, given we already have the region, 
there's no point in actually re-reading it from the array. Si we . I 
could generalize remove() to accept hr optionally, but given that we 
only use it when we know what the region to be removed is, I don't see 
the point. Also, I'm not sure that using HeapRegion* pop() is 
appropriate either, given that the caller doesn't need to be given the 
region that just got removed (it already knows which one it is).

Also note that the peek() / remove() pattern is similar to the 
ListIterator class in Java which has a E next() (equivalent to peek()) 
and void remove(). I just added extra information to the remove() 
method. (What I'm trying to say is: there's precedence to such a pattern.)

>
> collectionSetChooser.cpp:
>
> Why is the method CollectionSetChooser::updateAfterFullCollection() 
> needed? Can't G1CollectorPolicy::record_full_collection_end() do 
> _collectionSetChooser->clearMarkedHeapRegions() just like 
> G1CollectorPolicy::record_concurrent_mark_cleanup_end() does?

I'm OK either way. Maybe we used to do more in 
updateAfterFullCollection() once upon the time but we don't do any more. 
I'll remove it and replace it with a direct call to 
clearMarkedHeapRegions() as you suggested.

> g1_globals.hpp:
>
> It seems to me that at least the G1MaxMixedGCNum and the 
> G1OldCSetRegionThresholdPercent can be things that end users might 
> need to tweak until we have the heuristics more thoroughly tested out. 
> Would it be ok to make them available in product builds? Maybe as 
> experiemental flags?

Actually, it might be helpful to expose all four develop flags I added 
to the user so that they can control different aspects of the heuristics:

a) G1OldCSetRegionLiveThresholdPercent and 
G1OldReclaimableThresholdPercent: control how much free space to 
"sacrifice" in order to decrease mixed GC times (BTW, it'd be nice if we 
could somehow combine these two.)

b) G1MaxMixedGCNum: control how promptly G1 will collect the candidate 
old regions: more promptly -> more free space is made available quicker, 
less promptly -> each mixed GC might be shorter.

c) G1OldCSetRegionThresholdPercent: control how aggressive G1 will be in 
collecting old regions during each mixed GC: more aggressive -> more 
free space is made available quicker, less promptly -> each mixed GC 
might be shorter.

(come to think about it, maybe we should think of combining the last two 
too!)

But before we expose these we need to a) get a better understanding of 
what effect each has on G1's performance and b) find better names for 
them. :-) (I didn't try very hard, as I assumed that they will be 
develop parameters.)

> g1CollectorPolicy.cpp:
>
>       if (next_gc_should_be_mixed("end of young GC")) {
>         ergo_verbose0(ErgoMixedGCs,
>                       "start mixed GCs",
>                       ergo_format_reason("appropriate old regions 
> available"));
>         set_gcs_are_young(false);
>       } else {
>         ergo_verbose0(ErgoMixedGCs,
>                       "do not start mixed GCs",
>                       ergo_format_reason("appropriate old regions 
> available"));
>       }
>
> I think the ergo_verbose message in the else case at lines 1490-1492 
> should say "appropriate old regions not available", right?

Correct, thanks. John also caught that.

> We log the ergo decisions in next_gc_should_be_mixed(). Will it not be 
> duplicate information to log it in 
> G1CollectorPolicy::record_collection_pause_end() as well?

That's a good point. This is what the output looks like right now:

  3.708: [G1Ergonomics (Mixed GCs) next GC should be mixed, reason: 
appropriate old regions available, phase: end of young GC, remaining old 
regions: 16 regions, reclaimable: 9803640 bytes (29.22 %), threshold: 
1.00 %]
  3.708: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: appropriate 
old regions available]
  23M->18M(32M), 0.0411940 secs]

...

  3.938: [G1Ergonomics (Mixed GCs) next GC should be mixed, reason: 
appropriate old regions available, phase: end of mixed GC, remaining old 
regions: 12 regions, reclaimable: 5823976 bytes (17.36 %), threshold: 
1.00 %]
  3.938: [G1Ergonomics (Mixed GCs) do not end mixed GCs, reason: 
appropriate old regions available]
  23M->15M(32M), 0.0322850 secs]

I'll see how easy will be to parametarize next_gc_should_be_mixed() with 
the "do" / "do not" action strings (instead of the phase_str).

I'll send out a new webrev in a bit.

Tony

> Bengt
>
>
> On 2012-02-14 16:18, Tony Printezis wrote:
>> Hi all,
>>
>> I'd like a couple of (quick please!) code reviews for this change:
>>
>> http://cr.openjdk.java.net/~tonyp/7132029/webrev.0/
>>
>> The policy that was choosing old regions for collection during mixed 
>> GCs was buggy and it could exhibit many strange behaviors, one of 
>> which was to go into mixed GC mode, do "mixed GCs" (which they did 
>> not actually collect any old regions), and not get out of it for a 
>> while while preventing subsequent marking cycles not to start. This 
>> frequently caused evac failures and Full GCs.
>>
>> The logic on which old regions to add to the CSet was spread to many 
>> places. I simplified that and put it mainly in the loop in the 
>> finalize_cset() method (renamed from choose_collection_set(), 
>> "finalize" is more descriptive on what the method does). I think the 
>> new version is easier to follow.
>>
>> Description of the changes:
>>
>> * I have introduced min and max number of old regions to be added to 
>> the CSet of each mixed GC. Max number is calculated as a percentage 
>> over the heap size (default: 10% - thanks to Jesper for suggesting to 
>> use a heap percentage for this) and ensures that collections will not 
>> get super long. Min number is calculated based on a desired mixed GC 
>> num after a marking cycle (default: 4) and ensures that each mixed GC 
>> will make some progress in collecting old regions (so that the 
>> candidate old regions are collected in a timely fashion).
>>
>> * We now don't add any regions with live percentage over a threshold 
>> (default: 95%) to the CSet chooser and we do not consider them for 
>> collection.
>>
>> * I stopped using the cache in the CSet chooser class (it was used to 
>> resort regions according to their latest GC efficiency), since it's 
>> not necessary any more: we'll now go through all the candidate old 
>> regions in the CSet chooser class and we don't have a heuristic of 
>> when to stop mixed GCs based on GC efficiency.
>>
>> * I introduced the notion of "reclaimable bytes" on HeapRegions, 
>> which not only includes the predicted garbage bytes on each region, 
>> but also the unused space in the area [top,end) which will also be 
>> reclaimed. The CSet chooser class now keeps track of the total 
>> reclaimable bytes of all the regions that it tracks. If that falls 
>> under a certain threshold (default: 1% of the heap) we stop doing 
>> mixed GCs as we'll reclaim very little out of the remaining candidate 
>> old regions.
>>
>> * I eliminated the case where a mixed GC starts and picks no old 
>> regions to collect (I hope!). Now the information on the CSet chooser 
>> (remaining region num / remaining reclaimable bytes) can tell us 
>> whether we want to do more mixed GCs or not. If we do, it's 
>> guaranteed that there will be old regions to collect. Because of 
>> that, I removed the _should_revert_to_young_gcs flag as it's not 
>> needed any more. It was used so that the CSet choice code could flag 
>> that we should stop doing mixed GCs at the end of the GC. Now, we can 
>> decide that by just looking at the information on the CSet chooser 
>> (the heuristic is encapsulated in the next_gc_should_be_mixed()).
>>
>> * I also changed the policy in the case where a fixed young gen is 
>> used. Before, we were a bit arbitrary: during mixed GCs we'd cut the 
>> young gen size in half and fill up the rest with old regions. I 
>> thought that trying to re-use the "desired mixed GC number" heuristic 
>> for this made sense. So, I now add the min number of old regions to 
>> each mixed GC (as long as we don't go over the max). I think this is 
>> a more reasonable and less arbitrary heuristic to what we had before 
>> and it's more consistent with the non-fixed young gen policy. I 
>> didn't know whether I should decrease the young gen size for each 
>> mixed GC, like how we did before, and by how much. So, I now leave it 
>> unchanged (the user decided they want a fixed young gen, they will 
>> now always get it!).
>>
>> * Updated all related ergo output to print the new information.
>>
>> Like the marking changes, this change leaves a fair amount of unused 
>> code that we can remove. I had to draw the line somewhere on how much 
>> I should remove now and how much I should leave for later. I opened a 
>> new CR for the remaining cleanup:
>>
>> 7145441: G1: collection set chooser-related cleanup
>>
>> Many thanks to Charlie for doing some last-minute performance testing 
>> with my workspace. He was the one who discovered the problem with the 
>> never-ending mixed GCs and this fix eliminated the problem. 
>> Correctness-testing-wise, I've run overnight with the GC test suite.
>>
>> Thanks,
>>
>> Tony
>>
>>
>



More information about the hotspot-gc-dev mailing list