CRR (M): 7132029: G1: mixed GC phase lasts for longer than it should

Tony Printezis tony.printezis at oracle.com
Wed Feb 15 18:36:56 UTC 2012


Bengt,


(We discussed this over the phone, but for the record) move_to_next() is 
not quite descriptive either, give that the region is actually removed. 
I left the remove() method as is but renamed it 
remove_and_move_to_next() as a compromise.

Tony

On 02/15/2012 09:28 AM, Bengt Rutisson wrote:
>
> Tony,
>
> Very short reply inline.
>
> On 2012-02-15 14:47, Tony Printezis wrote:
>> Bengt,
>>
>> Thanks for looking at this at such short notice! Inline.
>>
>> On 02/15/2012 03:46 AM, Bengt Rutisson wrote:
>>>
>>> Hi Tony,
>>>
>>> I really like this cleanup! Thanks for fixing it.
>>>
>>> Some comments:
>>>
>>> collectionSetChooser.hpp:
>>>
>>> _numMarkedRegions is kind of a confusing name to me. It does not 
>>> contain all the marked regions. 
>>
>> Well, I left it as it was. And it's also not called 
>> _numAllMarkedRegions. :-) How about I just call it _length (and 
>> _curMarkedIndex -> _curr_index)?
>
> OK.
>
>>
>>> It contains all the old regions that we would like to include in the 
>>> collection set. How about _num_selected_regions or even 
>>> _num_selected_old_regions?
>>>
>>> I think the remove(HeapRegion* hr) method is kind of strange. I 
>>> understand the need for peek() from finalize_cset(), but I think it 
>>> is strange that the remove() dependes on the code first calling 
>>> peek(). I'd prefer if we could make this more iterator style or at 
>>> least change remove() to someting like pop() that will return the 
>>> next HeapRegion instead of taking it as a parameter.
>>
>> Well, peek() and remove() are only called from the loop in 
>> finalize_cset(), so they are tailored for that. I originally had 
>> remove() with no parameters that just removed the current region. I 
>> then thought of passing the region as a parameter mainly to do sanity 
>> testing (i.e., the region that it's being removed is the one we think 
>> it should be removed). And I then realized that, given we already 
>> have the region, there's no point in actually re-reading it from the 
>> array. Si we . I could generalize remove() to accept hr optionally, 
>> but given that we only use it when we know what the region to be 
>> removed is, I don't see the point. Also, I'm not sure that using 
>> HeapRegion* pop() is appropriate either, given that the caller 
>> doesn't need to be given the region that just got removed (it already 
>> knows which one it is).
>>
>> Also note that the peek() / remove() pattern is similar to the 
>> ListIterator class in Java which has a E next() (equivalent to 
>> peek()) and void remove(). I just added extra information to the 
>> remove() method. (What I'm trying to say is: there's precedence to 
>> such a pattern.)
>
> I'm sorry. I still don't like the remove() method. Can we inline the 
> "_remainingReclaimableBytes -= hr->reclaimable_bytes();" where we now 
> call remove() and then call remove() something like move_to_next() 
> without any HeapRegion at all? Not sure that this was a good 
> suggestion...
>
> Bengt
>
>>
>>>
>>> collectionSetChooser.cpp:
>>>
>>> Why is the method CollectionSetChooser::updateAfterFullCollection() 
>>> needed? Can't G1CollectorPolicy::record_full_collection_end() do 
>>> _collectionSetChooser->clearMarkedHeapRegions() just like 
>>> G1CollectorPolicy::record_concurrent_mark_cleanup_end() does?
>>
>> I'm OK either way. Maybe we used to do more in 
>> updateAfterFullCollection() once upon the time but we don't do any 
>> more. I'll remove it and replace it with a direct call to 
>> clearMarkedHeapRegions() as you suggested.
>>
>>> g1_globals.hpp:
>>>
>>> It seems to me that at least the G1MaxMixedGCNum and the 
>>> G1OldCSetRegionThresholdPercent can be things that end users might 
>>> need to tweak until we have the heuristics more thoroughly tested 
>>> out. Would it be ok to make them available in product builds? Maybe 
>>> as experiemental flags?
>>
>> Actually, it might be helpful to expose all four develop flags I 
>> added to the user so that they can control different aspects of the 
>> heuristics:
>>
>> a) G1OldCSetRegionLiveThresholdPercent and 
>> G1OldReclaimableThresholdPercent: control how much free space to 
>> "sacrifice" in order to decrease mixed GC times (BTW, it'd be nice if 
>> we could somehow combine these two.)
>>
>> b) G1MaxMixedGCNum: control how promptly G1 will collect the 
>> candidate old regions: more promptly -> more free space is made 
>> available quicker, less promptly -> each mixed GC might be shorter.
>>
>> c) G1OldCSetRegionThresholdPercent: control how aggressive G1 will be 
>> in collecting old regions during each mixed GC: more aggressive -> 
>> more free space is made available quicker, less promptly -> each 
>> mixed GC might be shorter.
>>
>> (come to think about it, maybe we should think of combining the last 
>> two too!)
>>
>> But before we expose these we need to a) get a better understanding 
>> of what effect each has on G1's performance and b) find better names 
>> for them. :-) (I didn't try very hard, as I assumed that they will be 
>> develop parameters.)
>>
>>> g1CollectorPolicy.cpp:
>>>
>>>       if (next_gc_should_be_mixed("end of young GC")) {
>>>         ergo_verbose0(ErgoMixedGCs,
>>>                       "start mixed GCs",
>>>                       ergo_format_reason("appropriate old regions 
>>> available"));
>>>         set_gcs_are_young(false);
>>>       } else {
>>>         ergo_verbose0(ErgoMixedGCs,
>>>                       "do not start mixed GCs",
>>>                       ergo_format_reason("appropriate old regions 
>>> available"));
>>>       }
>>>
>>> I think the ergo_verbose message in the else case at lines 1490-1492 
>>> should say "appropriate old regions not available", right?
>>
>> Correct, thanks. John also caught that.
>>
>>> We log the ergo decisions in next_gc_should_be_mixed(). Will it not 
>>> be duplicate information to log it in 
>>> G1CollectorPolicy::record_collection_pause_end() as well?
>>
>> That's a good point. This is what the output looks like right now:
>>
>>  3.708: [G1Ergonomics (Mixed GCs) next GC should be mixed, reason: 
>> appropriate old regions available, phase: end of young GC, remaining 
>> old regions: 16 regions, reclaimable: 9803640 bytes (29.22 %), 
>> threshold: 1.00 %]
>>  3.708: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: 
>> appropriate old regions available]
>>  23M->18M(32M), 0.0411940 secs]
>>
>> ...
>>
>>  3.938: [G1Ergonomics (Mixed GCs) next GC should be mixed, reason: 
>> appropriate old regions available, phase: end of mixed GC, remaining 
>> old regions: 12 regions, reclaimable: 5823976 bytes (17.36 %), 
>> threshold: 1.00 %]
>>  3.938: [G1Ergonomics (Mixed GCs) do not end mixed GCs, reason: 
>> appropriate old regions available]
>>  23M->15M(32M), 0.0322850 secs]
>>
>> I'll see how easy will be to parametarize next_gc_should_be_mixed() 
>> with the "do" / "do not" action strings (instead of the phase_str).
>>
>> I'll send out a new webrev in a bit.
>>
>> Tony
>>
>>> Bengt
>>>
>>>
>>> On 2012-02-14 16:18, Tony Printezis wrote:
>>>> Hi all,
>>>>
>>>> I'd like a couple of (quick please!) code reviews for this change:
>>>>
>>>> http://cr.openjdk.java.net/~tonyp/7132029/webrev.0/
>>>>
>>>> The policy that was choosing old regions for collection during 
>>>> mixed GCs was buggy and it could exhibit many strange behaviors, 
>>>> one of which was to go into mixed GC mode, do "mixed GCs" (which 
>>>> they did not actually collect any old regions), and not get out of 
>>>> it for a while while preventing subsequent marking cycles not to 
>>>> start. This frequently caused evac failures and Full GCs.
>>>>
>>>> The logic on which old regions to add to the CSet was spread to 
>>>> many places. I simplified that and put it mainly in the loop in the 
>>>> finalize_cset() method (renamed from choose_collection_set(), 
>>>> "finalize" is more descriptive on what the method does). I think 
>>>> the new version is easier to follow.
>>>>
>>>> Description of the changes:
>>>>
>>>> * I have introduced min and max number of old regions to be added 
>>>> to the CSet of each mixed GC. Max number is calculated as a 
>>>> percentage over the heap size (default: 10% - thanks to Jesper for 
>>>> suggesting to use a heap percentage for this) and ensures that 
>>>> collections will not get super long. Min number is calculated based 
>>>> on a desired mixed GC num after a marking cycle (default: 4) and 
>>>> ensures that each mixed GC will make some progress in collecting 
>>>> old regions (so that the candidate old regions are collected in a 
>>>> timely fashion).
>>>>
>>>> * We now don't add any regions with live percentage over a 
>>>> threshold (default: 95%) to the CSet chooser and we do not consider 
>>>> them for collection.
>>>>
>>>> * I stopped using the cache in the CSet chooser class (it was used 
>>>> to resort regions according to their latest GC efficiency), since 
>>>> it's not necessary any more: we'll now go through all the candidate 
>>>> old regions in the CSet chooser class and we don't have a heuristic 
>>>> of when to stop mixed GCs based on GC efficiency.
>>>>
>>>> * I introduced the notion of "reclaimable bytes" on HeapRegions, 
>>>> which not only includes the predicted garbage bytes on each region, 
>>>> but also the unused space in the area [top,end) which will also be 
>>>> reclaimed. The CSet chooser class now keeps track of the total 
>>>> reclaimable bytes of all the regions that it tracks. If that falls 
>>>> under a certain threshold (default: 1% of the heap) we stop doing 
>>>> mixed GCs as we'll reclaim very little out of the remaining 
>>>> candidate old regions.
>>>>
>>>> * I eliminated the case where a mixed GC starts and picks no old 
>>>> regions to collect (I hope!). Now the information on the CSet 
>>>> chooser (remaining region num / remaining reclaimable bytes) can 
>>>> tell us whether we want to do more mixed GCs or not. If we do, it's 
>>>> guaranteed that there will be old regions to collect. Because of 
>>>> that, I removed the _should_revert_to_young_gcs flag as it's not 
>>>> needed any more. It was used so that the CSet choice code could 
>>>> flag that we should stop doing mixed GCs at the end of the GC. Now, 
>>>> we can decide that by just looking at the information on the CSet 
>>>> chooser (the heuristic is encapsulated in the 
>>>> next_gc_should_be_mixed()).
>>>>
>>>> * I also changed the policy in the case where a fixed young gen is 
>>>> used. Before, we were a bit arbitrary: during mixed GCs we'd cut 
>>>> the young gen size in half and fill up the rest with old regions. I 
>>>> thought that trying to re-use the "desired mixed GC number" 
>>>> heuristic for this made sense. So, I now add the min number of old 
>>>> regions to each mixed GC (as long as we don't go over the max). I 
>>>> think this is a more reasonable and less arbitrary heuristic to 
>>>> what we had before and it's more consistent with the non-fixed 
>>>> young gen policy. I didn't know whether I should decrease the young 
>>>> gen size for each mixed GC, like how we did before, and by how 
>>>> much. So, I now leave it unchanged (the user decided they want a 
>>>> fixed young gen, they will now always get it!).
>>>>
>>>> * Updated all related ergo output to print the new information.
>>>>
>>>> Like the marking changes, this change leaves a fair amount of 
>>>> unused code that we can remove. I had to draw the line somewhere on 
>>>> how much I should remove now and how much I should leave for later. 
>>>> I opened a new CR for the remaining cleanup:
>>>>
>>>> 7145441: G1: collection set chooser-related cleanup
>>>>
>>>> Many thanks to Charlie for doing some last-minute performance 
>>>> testing with my workspace. He was the one who discovered the 
>>>> problem with the never-ending mixed GCs and this fix eliminated the 
>>>> problem. Correctness-testing-wise, I've run overnight with the GC 
>>>> test suite.
>>>>
>>>> Thanks,
>>>>
>>>> Tony
>>>>
>>>>
>>>
>



More information about the hotspot-gc-dev mailing list