RFR: 8244505: G1 pause time ratio calculation does not consider Remark/Cleanup pauses [v2]
Thomas Schatzl
thomas.schatzl at oracle.com
Wed Sep 16 09:12:11 UTC 2020
Hi Ivan,
On 15.09.20 19:29, Ivan Walulya wrote:
>> The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates
>> to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup)
>> recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7
>> Performance testing did not show any significant changes in performance
>
> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
>
> Removed all changes unrelated to the main problem solved by the patch
Thanks.
- I'm worried that the assignment of "update_stats" in
G1Policy::record_collection_pause_end() and G1Policy::record_pause()
would get out of sync. It would be nice if the change either passed it
to G1Policy::record_pause(), or create a new member function that
returns that and is used in both places.
- update_gc_pause_time_ratios() does not need to be public afaict
- I'm okay that now full gcs are considered in the pause time ratio
calculation.
This may cause (with JDK-8238687) that the heap will be kept larger
after the full gc, which okay with me for naturally occurring full gcs.
One concern I have are full gcs caused by System.gc() calls - one of
their purposes is to compact the heap, but now they will have impact
beyond that one. Maybe try to exclude those from the gc time ratio
calculation?
I think the previous (undocumented) exclusion of all full gcs has been
to not overcomplicate the code assuming that most if any full gcs with
G1 should be System.gcs() for heap compaction purposes.
Ivan, what do you think?
- I think the change in G1Analytics::compute_pause_time_ratios() is
double long_interval_ms = (end_time_sec -
oldest_known_gc_end_time_sec()) * 1000.0;
- _long_term_pause_time_ratio = _recent_gc_times_ms->sum() /
long_interval_ms;
+ _long_term_pause_time_ratio = (_recent_gc_times_ms->sum() +
pause_time_ms) / long_interval_ms;
wrong, or even worse than the current calculation. Sorry for not being
able to pinpoint it yesterday evening.
Let's assume that we have the following pauses on the timeline in
G1Analytics, where "X" means pause (the number at the end of an "X"
block is just a running number), "-" means mutator time.
XXX1-----XX2---XXX3---------XXXX4---XXXXXX5--------XX6
^--------------long_interval_ms-------------------^
(optimized for fixed font view)
Currently oldest_known_gc_end_time_sec() is at the end of pause 1, and
_recent_gc_times_ms contains all gc pause times from 1 to 5. We did not
call update_recent_gc_times() yet.
Now the (existing) calculation calculates "long_interval_ms" from the
end of pause 1 to the end of pause 6 (current), and divides by the sum
of pauses 1 to 5 - which is wrong as pause 1 can be different to pause 6
in length. However, there is the the code permeating assumption (not
documented, sorry :)) that pauses are fairly regular. So this somewhat
works out.
With that change, the dividend is too large, i.e. in addition to taking
the wrong long_interval_ms, the code now also contains additional pauses.
Maybe this could be fixed somehow? Unless it's a very small fix (or I'm
wrong about the above), I would prefer to fix this separately.
Thanks,
Thomas
More information about the hotspot-gc-dev
mailing list