RFR (M): 8136681: Factor out IHOP calculation from G1CollectorPolicy

Mon Nov 16 19:02:15 UTC 2015

On 11/16/2015 05:31 AM, Thomas Schatzl wrote:
> Hi Jon,
>
>    thanks a lot for all these reminders for better documentation. I have
> been working too long on this functionality so that "everything is
> clear" to me :)
>
> New webrevs with hopefully more complete explanations at:
> http://cr.openjdk.java.net/~tschatzl/8136681/webrev.1_to_2/
> (incremental)
> http://cr.openjdk.java.net/~tschatzl/8136681/webrev.2/ (changes)
>
>
> On Fri, 2015-11-13 at 07:58 -0800, Jon Masamitsu wrote:
>> Thomas,
>>
>> This is partial.  If you send out a second webrev based on Mikael's
>> review, I'll finish with that.
>>
>>
>> http://cr.openjdk.java.net/~tschatzl/8136681/webrev/src/share/vm/gc/g1/g1CollectedHeap.hpp.frames.html
>>
>>> 1370 // Returns the number of regions the humongous object of the
>>> given word size
>>> 1371 // covers.
>> "covers" is not quite right since to me it says that the humongous
>> object completely uses the
>> region.  I'd use "requires".
> Fixed.
>
>> http://cr.openjdk.java.net/~tschatzl/8136681/webrev/src/share/vm/gc/g1/g1IHOPControl.hpp.html
>>
>>     49   // Update information about recent time during which allocations happened,
>>     50   // how many allocations happened and an additional safety buffer.
>>
>> // Update information about
>>
>> I DON'T KNOW WHICH OF THESE IS MORE PRECISE.
>> //   Time during which allocations occurred (sum of mutator execution time + GC pause times)
>> OR
>> //   Concurrent marking time (concurrent mark end - concurrent mark start)
>>
>> //   Allocations in bytes during that time
>> //   Safety buffer ???
>>
>> I couldn't figure out what the safety buffer is supposed to be.  It seems to
>> be the young gen size but don't know why.
> I tried to explain in in the text. In short, in G1 the IHOP value is
> based on old gen occupancy only. The problem is that the young gen also
> needs to be allocated somewhere too.
>
> Now you could just say, use the maximum young gen size. However this is
> 60% of the heap by default... so the adaptive IHOP algorithm uses a
> measure of the young gen that is not bounded by G1ReservePercent.
>
> The reason to use the unbounded value is because if the code used the
> bounded one, it would cancel out with G1ReservePercent, because the
> closer we get to G1ReservePercent, the smaller that bounded value would
> get, which would make the current IHOP value rise etc, which would delay
> the initiation of the marking.
>
> That would end up loosing throughput as then the young gen gets smaller
> and smaller (and GC frequency increases), it can take a long time until
> G1 gets close enough to G1ReservePercent so that the other factors
> (allocation rate, marking time) are used.
>
> Basically initial mark will be delayed until young gen reaches its
> minimum size, at which time G1 will continue to use that young gen size
> until marking occurs. Which means that typically G1 will eat into
> G1ReservePercent, which we also do not want.
>
> Additionally it would get G1 more in trouble in regards to pause time,
> giving it less room during that time.
I think I understand the issue of using an unbounded young gen but
what precisely is meant by "measure of the young gen"?  By measure
do you mean you used the size of young gen from recent young-only
collections?

>
> Note that G1 young gen sizing already has some issues with young gen
> sizing if it operates close to G1ReservePercent. If there is time I will
> look at it again later.
>
>>     33 // Manages the decision about the threshold when concurrent marking should start.
>>
>> // IHOP = Initiating heap occupancy in percent of the entire heap
>>
>>     34 class G1IHOPControl : public CHeapObj<mtGC> {
>>     35  protected:
>>     36   double _ihop_percent;
>> // Occupancy in bytes.  This is always less than or equal to the (maximum or current maximum?) heap occupancy
>>     37   size_t _target_occupancy;
>>
>>
>> 	// Most recent complete allocation period (time from first young-only GC to
>> 	// end of marking) in seconds.  Any period which is interrupted by a full GC
>> 	// is ignored.
>>     65    _last_allocation_time_s(0.0),
>> 	// Number of bytes allocated during that period (mutator allocation + promotions)
>>     66    _last_allocated_bytes(0),
>> 	// Length of most recent marking cycle in seconds
>>     67    _last_marking_length_s(0.0) { }
>>
>> 	// Occupany in bytes at which a concurrent marking cycle is started. This count
>> 	// includes the waste in regions occupied by humongous objects and waste at
>> 	// the end of any old region that is not the currently being used for promotions
>> 	// and the waste at the end of survivor regions.
> I skipped that paragraph about which waste e.g. at end of TLABs or
> humongous regions is used, and which not. IHOP calculation needs to
> consider actual free space for allocation.

That's fine.

>
> Unfortunately G1 is in two minds about this, i.e. used() for humongous
> objects does not contain the "waste" at the end of the las region, but
> used() for regular regions does.

So occupancy is not useful and you use free regions + free in the current
old allocation regions?

>
>>     69   size_t get_conc_mark_start_threshold() { return (size_t) (_ihop_percent * _target_occupancy / 100.0); }
>>
>> This might be better named update_marking_length() but can be done in a later cleanup.
>>
> Done.
>
> Thanks a lot for all these reminders for documentation. I have been
> working too long on this functionality so that "everything is clear" to
> me :)
>
> New webrevs with hopefully more complete explanations at:
> http://cr.openjdk.java.net/~tschatzl/8136681/webrev.1_to_2/
> (incremental)
> http://cr.openjdk.java.net/~tschatzl/8136681/webrev.2/ (changes)
>
>>     	// Update marking length (concurrent marking end - concurret marking start)
>>     77   virtual void update_time_to_mixed(double marking_length_s) {
>>
>> Have you given much thought to what affect a to-space exhaustion should
>> have on
>> IHOP?  I understand that it is not the design center for this but I
>> think that to-space
>> exhaustion can confuse the statistics.   Maybe a reset of the statistics
>> and a dropping
>> the IHOP to a small value (current heap occupancy or even 0) until you
>> get 3 successful
>> marking cycles.  Or think about it later.
> I already thought a little about how this should interact with regards
> to the calculation: the idea is that basically the algorithm will notice
> that there is a significant amount of additional allocation, and will
> lower the IHOP threshold automatically. (Looking through the code I
> think I saw some problems here, I will see to fix that)

The increased allocation would be the allocation from compacting
all the live data into regions during the full GC.  That certainly should
make the IHOP drop but that seems rather weakly related to the
actual IHOP needed to avoid promotion failure.   It's hard for me
to see how that is going to scale (i.e., it seems complicated to
use something like the live data size as input to the modeling
of IHOP). I'd start with something really simple but if you're
comfortable with it, that's fine.

>
> If evacuation failure happens in a gc during marking, there are a few
> options, I have not decided on what's best:
>
> - do nothing because the user asked to run an impossible to manage
> workload (combination of live set, allocation rate, and other options).
>
>    - there is already some log output which that information can be
> derived from.
>
> - allow the user to set a maximum IHOP threshold. He could base this
> value on the log messages he gets.
>    - the user can already do that by increasing G1ReservePercent btw
>
> - make sure that marking completes in time
>
>    - let the mutator threads (or during young gcs while marking is
> running) do some marking if we notice that we do not have enough time.
> Not sure if it is worth the effort.
>
>    - fix some bugs in marking :) that prevent that in extraordinary
> conditions.
>
>    - make sure that we always start marking early enough by making sure
> that mixed gc reclaims enough memory. Planning some work here as part of
> generic work on improving gc policy.

I like this one above.

Thanks for the extra explanations.

Jon
>
>    - improve predictions: some work being done here.
>
> - make evacuation failure really fast and hope it works out
>
>    - do not immediately move young gen regions with evacuation failures
> into young gen regions, but make them survivor.
>
>    - there have already been lots of improvements here as you might have
> noticed.
>
> Thanks,
>    Thomas
>
>