RFR (M): 8060697: Improve G1 Heap Growth Heuristics

Tom Benson tom.benson at oracle.com
Wed Dec 2 19:21:09 UTC 2015


Hi Jon,

On 12/1/2015 6:44 PM, Jon Masamitsu wrote:
>
>
> On 12/1/2015 1:37 PM, Tom Benson wrote:
>> Hi Jon,
>> Thanks very much for the review.
>>
>> On 11/30/2015 3:10 PM, Jon Masamitsu wrote:
>>> Tom,
>>>
>>> http://cr.openjdk.java.net/~tbenson/8060697/webrev/src/share/vm/gc/g1/g1CollectorPolicy.cpp.frames.html
>>>
>>> 1053 // Compute the ratio of just this last pause time to the entire time range stored
>>> 1054 // in the vectors.
>>> 1055 _last_pause_time_ratio =
>>> 1056   (pause_time_ms * _recent_prev_end_times_for_all_gcs_sec->num()) / interval_ms;
>>>
>>> _last_pause_time_ratio is the ratio of the last pause over the
>>> average interval in the truncated sequence?  By the latter
>>> I mean
>>>
>>> (interval_ms / _recent_prev_end_times_for_all_gcs_sec->num())
>>>
>>> If the truncated sequence has N samples and "interval_ms" is
>>> measured from the oldest sample, I'm calling interval_ms / N
>>> the average interval.
>>>
>>> Is my description correct?  Why do you prefer that to the most recent
>>> pause time ratio?
>>
>> Yes, your description is right.  Basically I want to notice the 
>> *first* pause that goes over the threshold, rather than waiting for 
>> the average over the last 10 pauses to go over.   The reason is that 
>> this will start the code looking for ratios that exceed the threshold 
>> (beginning a "sampling window" so to speak), and I want to do that as 
>> soon as possible.
>
> OK.  That sounds like a good policy.
>
>>
>> If by this: "Why do you prefer that to the most recent pause time 
>> ratio?" you mean "Why not just compare the last pause to the last 
>> single interval?", the reason is that comparing it to the entire 
>> range 'smooths over' some transiently more frequent GCs that don't 
>> really reflect a change in heap occupancy.  I see this happening in 
>> specjbb sometimes.  By only comparing against the last interval, 
>> needless growth happens more often, resulting in higher run-to-run 
>> variability.   Another approach would be to raise the minimum number 
>> of threshold-crossing pauses that are needed before triggering growth 
>> - but I don't want to delay growth in cases where the need is real.  
>> Thomas commented to me that that transient behavior is likely to be 
>> due to something we ultimately want to fix, anyway.   But the current 
>> approach of comparing against the whole recorded range seems to help 
>> alleviate the side effect of needless growth.
>
> More sound reasoning.
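To make that concrete, here is a small standalone sketch (illustrative
values only, not from the patch) of the quoted computation: the last
pause measured against the average interval of the recorded sequence,
i.e. pause / (interval / N) == pause * N / interval:

   #include <cstdio>

   int main() {
     const double pause_time_ms = 150.0;    // the pause that just ended
     const int    num_samples   = 10;       // N entries in the truncated sequence
     const double interval_ms   = 10000.0;  // span back to the oldest recorded end time

     // Equivalent to pause_time_ms / (interval_ms / num_samples).
     const double last_pause_time_ratio =
         (pause_time_ms * num_samples) / interval_ms;
     std::printf("ratio = %.2f\n", last_pause_time_ratio);  // prints 0.15
     return 0;
   }

A single unusually short last interval would inflate a last-interval-only
reading; comparing against the whole recorded range smooths that out.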
>
>>
>>
>>>
>>> Should the 1*M below be 1 region size?
>>> 1545     const size_t min_expand_bytes = 1*M;
>>
>> Hmm, good question.  I didn't change that.  It could certainly be 
>> MIN_REGION_SIZE, which == 1*M.  I think using the actual region size 
>> would likely only have an effect when the heap is nearly at minimum 
>> or maximum sizes. In between, the math is likely to result in a size 
>> larger than that anyway.  I could try it.
>
> I was wondering what happens when the region size is 32M.  Do you recall?
>

The region size isn't a factor in the expansion amount computation.  The 
expand code that later consumes the computed amount rounds it up to a 
multiple of the region size.  With that in mind, it doesn't really matter 
what min_expand_bytes is, as long as it's greater than zero and no larger 
than HeapRegion::GrainBytes.  I currently have it at min_region_size, but 
I think it makes sense to change it to GrainBytes.
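For example, a minimal sketch of that rounding (a hypothetical helper,
not the actual HotSpot expand path):

   #include <cstddef>

   // Any non-zero request is rounded up to a whole number of regions
   // before committing, which is why min_expand_bytes only needs to be
   // greater than zero and no larger than the region size.
   size_t round_up_to_regions(size_t expand_bytes, size_t region_bytes) {
     const size_t regions = (expand_bytes + region_bytes - 1) / region_bytes;
     return regions * region_bytes;
   }

So with 32M regions, even a 1M minimum request still commits one full
32M region.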


>>
>>>
>>> As the uncommitted space in the heap drops, the grow rate drops.
>>>
>>>
>>> 1550     size_t expand_bytes_via_pct =
>>> 1551       uncommitted_bytes * G1ExpandByPercentOfAvailable / 100;
>>>
>>> The scale_factor will increase that by up to a factor of 2, but the
>>> policy still seems to grow slowly toward the maximum.  Is there a
>>> reason not to get to the maximum heap size quickly?
>>
>> Yes, I thought about this as well.  This attribute (shrinking the 
>> growth increment as we approach the limit) is there in both the old 
>> and new code, but the new code may scale the value up. What I 
>> considered, but didn't try, was to use a fixed percentage of the 
>> entire heap, once we have reached a certain threshold by doubling the 
>> size.   That value would still be scaled according to the pause 
>> ratios.   I decided not to pursue it for now, since the results 
>> looked acceptable and it could be done as a follow-up.
>
> Let that stand.  I know that G1 historically has grown the heap based 
> on how much uncommitted is left.  I don't know if it is good or bad.  
> I've just never gotten used to that approach.

Well, I tried it.  I.e., I changed the code to use a fixed 10% of the 
reserved size once the size was at least 20% of reserved, rather than 
using 20% of the remaining uncommitted space.  Results were not very 
different.  The max heap sizes for tests that were well into that 
threshold were a little different.  The main change was that for those 
that go to max heap (such as the compiler tests in jvm08), the top size 
is reached much more quickly, with one or two large increments replacing 
many smaller ones.  Which is good.

If I were going to commit that change, I'd want to experiment more with 
the percentage.  I'd hold off on it for now, though.  The main goal of 
the current change is to get the variability under control.  The next 
phase of heap sizing work can consider the fixed increment.
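For reference, a sketch of the experiment described above (hypothetical
names, with the thresholds as stated; not the committed code):

   #include <cstddef>
   #include <algorithm>

   size_t expansion_amount(size_t committed_bytes, size_t reserved_bytes) {
     const size_t uncommitted_bytes = reserved_bytes - committed_bytes;
     size_t expand_bytes;
     if (committed_bytes < reserved_bytes / 5) {
       // Below 20% of reserved: grow by 20% of the remaining
       // uncommitted space, as the current code does.
       expand_bytes = uncommitted_bytes / 5;
     } else {
       // At or above 20% of reserved: the experimental fixed
       // increment of 10% of the reserved size.
       expand_bytes = reserved_bytes / 10;
     }
     return std::min(expand_bytes, uncommitted_bytes);
   }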

>
>>
>> However, it won't be hard to try, and I can do so if there's 
>> agreement that the rest of this approach seems reasonable.
>
> Some of my second-guessing on this policy is that it seems rather
> complex.  Someone will eventually ask for documentation on how
> the heap grows.  Hope you're up to it. :-)
>
> I don't have any more comments.  I'm good with it.

OK, thanks.
Tom

>
> Jon
>
>> Thanks,
>> Tom
>>
>>>
>>> Jon
>>>
>>>
>>> On 11/25/2015 06:06 AM, Tom Benson wrote:
>>>> Hi Kim,
>>>> Thanks very much for the review.  I've implemented all your 
>>>> suggestions.
>>>>
>>>> About this:
>>>>
>>>>> I suspect you are introducing some implicit conversions that will 
>>>>> cause problems for the SAP folks; see discussion of 8143215:
>>>>
>>>> ... I think there's one, which is:
>>>>
>>>>     expand_bytes = expand_bytes * scale_factor;
>>>>
>>>> scale_factor is explicitly limited to being between 0.2 and 2.0, 
>>>> and expand_bytes is a fraction of the heap size, so there's no chance 
>>>> of overflow.  Would you object to the static cast in this case?  
>>>> How about with a comment?
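For reference, a self-contained sketch of the cast in question (not the
patch itself):

   #include <cstddef>

   // scale_factor is clamped to [0.2, 2.0] and expand_bytes is a
   // fraction of the heap size, so the double product fits back into
   // a size_t without overflow.
   size_t scaled_expand_bytes(size_t expand_bytes, double scale_factor) {
     return static_cast<size_t>(expand_bytes * scale_factor);
   }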
>>>>
>>>> Tom
>>>>
>>>> On 11/24/2015 9:55 PM, Kim Barrett wrote:
>>>>> On Nov 23, 2015, at 10:02 PM, Tom Benson <tom.benson at oracle.com> 
>>>>> wrote:
>>>>>> Hi,
>>>>>> Here is a proposed change to the G1 heap growth code for review. 
>>>>>> I've added a detailed description to the CR, but here is the 
>>>>>> short version: After a GC pause, the average ratio of time spent 
>>>>>> in recent GC pauses vs overall time is computed. If it exceeds 
>>>>>> GCTimeRatio, the heap is expanded by a fixed amount.  With the 
>>>>>> new code, some deficiencies in the ratio tracking are addressed, 
>>>>>> and the expansion size is scaled according to how much, on 
>>>>>> average, the desired ratio is exceeded.  The target ratio 
>>>>>> itself is also scaled at the lowest heap sizes.
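For context, a minimal sketch of how G1 derives its internal overhead
target from GCTimeRatio (the function name here is illustrative):

   // With G1's default GCTimeRatio of 9 this yields a 10% target,
   // i.e. up to 10% of total time in GC pauses before expanding.
   double gc_overhead_perc(double gc_time_ratio) {
     return 100.0 * (1.0 / (1.0 + gc_time_ratio));
   }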
>>>>>>
>>>>>> The case that triggered this was actually JDK-8132077, where the 
>>>>>> JVM'08 Compress benchmark saw a 40% degradation.  It was due to 
>>>>>> the heap being about half the size in some runs, because of the 
>>>>>> way heap growth worked.
>>>>>>
>>>>>> I'm still collecting the final performance data for this version, 
>>>>>> and will attach it to the CR.  Earlier experimental versions 
>>>>>> showed good improvements in consistency of heap sizes.  A couple 
>>>>>> of benchmarks average a percentage point or two lower, while 
>>>>>> others improve by that much or more.  No growth percentage or 
>>>>>> scaling is going to be ideal for every test, but the goal was to 
>>>>>> maintain performance without growing too large. In fact, some 
>>>>>> tests now use much smaller heaps.
>>>>>>
>>>>>> CR:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8060697
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~tbenson/8060697/webrev/
>>>>> Generally looks good; just a few very minor things, most of which 
>>>>> you can take or ignore as you prefer.  I don't need a new webrev 
>>>>> for any of these.
>>>>>
>>>>> The comments were very helpful in understanding what's going on.
>>>>>
>>>>> I suspect you are introducing some implicit conversions that will 
>>>>> cause problems for the SAP folks; see discussion of 8143215: 
>>>>> gcc 4.1.2: fix three issues breaking the build.  But the resolution 
>>>>> for that is still being discussed, and we don't have an easy way 
>>>>> to discover these for ourselves, so I don't think you should worry 
>>>>> about it here right now.
>>>>>
>>>>> ------------------------------------------------------------------------------ 
>>>>>
>>>>> src/share/vm/gc/g1/g1CollectorPolicy.cpp
>>>>> 1571       static double const MinScaleDownFactor = 0.2;
>>>>> 1572       static double const MaxScaleUpFactor = 2;
>>>>> 1573       static double const StartScaleDownAt = _gc_overhead_perc;
>>>>> 1574       static double const StartScaleUpAt = _gc_overhead_perc * 1.5;
>>>>> 1575       static double const ScaleUpRange = _gc_overhead_perc * 2.0;
>>>>>
>>>>> I suggest these not be static.
>>>>>
>>>>> It doesn't really matter for the first two.
>>>>>
>>>>> But for the others, there is a hidden cost to making them static, 
>>>>> due to some compilers ensuring thread-safe initialization.  C++11 
>>>>> mandates thread-safe initialization of function-scoped statics.  
>>>>> gcc has implemented that starting circa gcc 4.0 (if I recall 
>>>>> correctly), controlled by a command-line option 
>>>>> (-f[no-]threadsafe-statics).  Visual Studio 2013 also includes this 
>>>>> feature, as part of their incremental rollout of C++11 (and later) 
>>>>> features.  I don't know about other compilers.
>>>>>
>>>>> The cost of making them static is likely at least comparable to 
>>>>> computing them.  And making them static implies that 
>>>>> _gc_overhead_perc is effectively a constant, which appears to be 
>>>>> true today, but who knows what will happen tomorrow.
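Concretely, the non-static form of the quoted locals would be (same
values, recomputed on each call; a sketch against the quoted snippet):

   const double MinScaleDownFactor = 0.2;
   const double MaxScaleUpFactor   = 2.0;
   const double StartScaleDownAt   = _gc_overhead_perc;
   const double StartScaleUpAt     = _gc_overhead_perc * 1.5;
   const double ScaleUpRange       = _gc_overhead_perc * 2.0;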
>>>>>
>>>>> ------------------------------------------------------------------------------ 
>>>>>
>>>>> src/share/vm/gc/g1/g1CollectorPolicy.cpp
>>>>> 1587         scale_factor = MAX2<double>(scale_factor, MinScaleDownFactor);
>>>>> 1590         scale_factor = MIN2<double>(scale_factor, MaxScaleUpFactor);
>>>>>
>>>>> The explicit double template parameter isn't needed here, since the
>>>>> arguments are already both doubles.
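That is, with both arguments already double, the quoted lines reduce to:

   scale_factor = MAX2(scale_factor, MinScaleDownFactor);
   scale_factor = MIN2(scale_factor, MaxScaleUpFactor);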
>>>>>
>>>>> ------------------------------------------------------------------------------ 
>>>>>
>>>>> src/share/vm/gc/g1/g1CollectorPolicy.cpp
>>>>> 1525     threshold = (threshold * ((double)_g1->capacity() / (double)(_g1->max_capacity() / 2)));
>>>>>
>>>>> This might be easier to read if it used "threshold *= ...".
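With the compound assignment, the quoted line becomes:

   threshold *= (double)_g1->capacity() / (double)(_g1->max_capacity() / 2);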
>>>>>
>>>>> ------------------------------------------------------------------------------ 
>>>>>
>>>>> src/share/vm/gc/g1/g1CollectorPolicy.cpp
>>>>> 1526     threshold = MAX2<double>(threshold, 1);
>>>>>
>>>>> The explicit double template parameter wouldn't be needed if "1" was
>>>>> replaced with "1.0".
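That is:

   threshold = MAX2(threshold, 1.0);  // both arguments now double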
>>>>>
>>>>> ------------------------------------------------------------------------------ 
>>>>>
>>>>>
>>>>
>>>
>>
>
