RFR(S): 7185699: G1: Prediction model discrepancies
John Cuthbertson
john.cuthbertson at oracle.com
Wed Jul 25 18:39:32 UTC 2012
Hi Thomas,
On 7/25/2012 2:14 AM, Thomas Schatzl wrote:
> here are some notes, although I am not an official reviewer...
You don't need to be an official reviewer to review the code and send
feedback. A review from an official reviewer is needed in order to push
the code. Since you don't (yet) have an OpenJDK username I can't list
you as a reviewer - but your feedback is welcome.
> G1CollectedHeap::max_pending_card_num() just below of the fixed
> pending_card_num() seems to have the same issue.
>
Thanks. Good catch (but see below).
> I also think G1CollectorPolicy::predict_pending_cards() and
> G1CollectorPolicy::predict_pending_card_diff() can be removed, they do
> not seem to be referenced anywhere in the code.
>
Yes. They don't appear to be referenced anywhere, so I've removed them.
That in turn left some other routines, including max_pending_card_num(),
and some fields unused. I've removed those too.
>> As a result we were vastly overestimating
>> the base pause time. The second was in the code that calculates the GC
> I think the effects mostly cancelled themselves out long term due to the
> way the prediction is calculated: the time prediction is based on a
> prediction of pending_cards times a cost prediction per card.
> Pending_cards is too high by the oopSize as per your patch, but the cost
> prediction per card has been derived from a value based on the rset
> update time divided by the number of pending cards, which was too high
> by that same factor.
>
> (I haven't measured anything)
OK. Your analysis makes sense. In the instrumentation I added (and have
since removed), however, I see much smaller values for base_time_ms, but
the tests I ran don't come anywhere near "long term".
>> efficiency of a non-young region. The GC efficiency of a region is
>> calculated at the end of marking. The code, however, was using the wrong
>> routine to predict the number of cards for a given RSet length. It was
>> using the data collected for young GCs when it should have been using
>> the data collected for mixed GCs.
> I have one question about what may be a mix of predictors from
> different collection types. In particular, during mixed collections the
> predict_region_elapsed_time_ms() method is also called to predict the
> evacuation time of young generation regions, with the "young" parameter
> always set to true.
>
> Looking at the code (copied below for convenience), during that time the
> number of cards prediction for young gen regions is based on the
> prediction of mixed collection card lengths (as expected), but at the
> same time, the "other" time prediction is based on the time prediction
> for the young collections.
>
> This seems odd, because I'd expect both prediction components to be
> based on the same prior decision at all times.
>
> Maybe you could improve these predictions for the young gen regions by
> always basing them on the young collection predictors. This assumes
> that young gen region timing is independent of the type of collection -
> but at least it would then be consistent for both the card length and
> the "other" time.
> Wouldn't it be better to base the selection of predictions exclusively
> on the "young" parameter, and eventually pass gcs_are_young() as value
> for the "young" parameter appropriately?
>
> This would also automatically remove the need for the inlined code in
> HeapRegion::calc_gc_efficiency, which seems just a source for errors. I
> doubt anybody reads the "synchronize me" comments in heapRegion.cpp when
> changing the prediction code in g1CollectorPolicy.cpp. Additionally this
> calculation seems to be some policy that would seem perfect for putting
> the method into G1CollectorPolicy.
> At least then the "synchronize me" comment could be in the same file, or
> even factored out into a single method.
Good observation. I'm not an expert in this code but I believe the
following code is the reason:
  double young_other_time_ms = 0.0;
  if (young_cset_region_length() > 0) {
    young_other_time_ms =
      phase_times()->_recorded_young_cset_choice_time_ms +
      phase_times()->_recorded_young_free_cset_time_ms;
    _young_other_cost_per_region_ms_seq->add(young_other_time_ms /
                                             (double) young_cset_region_length());
  }

  double non_young_other_time_ms = 0.0;
  if (old_cset_region_length() > 0) {
    non_young_other_time_ms =
      phase_times()->_recorded_non_young_cset_choice_time_ms +
      phase_times()->_recorded_non_young_free_cset_time_ms;
    _non_young_other_cost_per_region_ms_seq->add(non_young_other_time_ms /
                                                 (double) old_cset_region_length());
  }

  double constant_other_time_ms = all_other_time_ms -
    (young_other_time_ms + non_young_other_time_ms);
  _constant_other_time_ms_seq->add(constant_other_time_ms);
It looks like we're keeping track of how long it takes to add a region
to the collection set and how long it takes to free a region in the
collection set (based upon the region type). Haven't you shown that
freeing old regions during mixed GCs takes a lot longer than freeing
young regions? So choosing the other time for a region (vs. the base
other time for the pause) is a function of the region's type and not the
GC type. Though I'm not sure why we pass it as a parameter. I'll play
around with refactoring the code to eliminate inlining the code into
calc_gc_efficiency().
Thanks,
JohnC