RFR: 8244505: G1 pause time ratio calculation does not consider Remark/Cleanup pauses [v2]

Thomas Schatzl thomas.schatzl at oracle.com
Wed Sep 16 09:12:11 UTC 2020


Hi Ivan,

On 15.09.20 19:29, Ivan Walulya wrote:
>> The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates
>> to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup)
>> recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7
>> Performance testing did not show any significant changes in performance
> 
> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
> 
>    Removed all changes unrelated to the main problem solved by the patch

Thanks.

- I'm worried that the assignment of "update_stats" in 
G1Policy::record_collection_pause_end() and G1Policy::record_pause() 
would get out of sync. It would be nice if the change either passed it 
to G1Policy::record_pause(), or create a new member function that 
returns that and is used in both places.

- update_gc_pause_time_ratios() does not need to be public afaict

- I'm okay that now full gcs are considered in the pause time ratio 
calculation.
This may cause (with JDK-8238687) that the heap will be kept larger 
after the full gc, which okay with me for naturally occurring full gcs.

One concern I have are full gcs caused by System.gc() calls - one of 
their purposes is to compact the heap, but now they will have impact 
beyond that one. Maybe try to exclude those from the gc time ratio 
calculation?

I think the previous (undocumented) exclusion of all full gcs has been 
to not overcomplicate the code assuming that most if any full gcs with 
G1 should be System.gcs() for heap compaction purposes.

Ivan, what do you think?

- I think the change in G1Analytics::compute_pause_time_ratios() is

    double long_interval_ms = (end_time_sec - 
oldest_known_gc_end_time_sec()) * 1000.0;
-  _long_term_pause_time_ratio = _recent_gc_times_ms->sum() / 
long_interval_ms;
+  _long_term_pause_time_ratio = (_recent_gc_times_ms->sum() + 
pause_time_ms) / long_interval_ms;

wrong, or even worse than the current calculation. Sorry for not being 
able to pinpoint it yesterday evening.

Let's assume that we have the following pauses on the timeline in 
G1Analytics, where "X" means pause (the number at the end of an "X" 
block is just a running number), "-" means mutator time.


XXX1-----XX2---XXX3---------XXXX4---XXXXXX5--------XX6

     ^--------------long_interval_ms-------------------^

(optimized for fixed font view)

Currently oldest_known_gc_end_time_sec() is at the end of pause 1, and 
_recent_gc_times_ms contains all gc pause times from 1 to 5. We did not 
call update_recent_gc_times() yet.

Now the (existing) calculation calculates "long_interval_ms" from the 
end of pause 1 to the end of pause 6 (current), and divides by the sum 
of pauses 1 to 5 - which is wrong as pause 1 can be different to pause 6 
in length. However, there is the the code permeating assumption (not 
documented, sorry :)) that pauses are fairly regular. So this somewhat 
works out.

With that change, the dividend is too large, i.e. in addition to taking 
the wrong long_interval_ms, the code now also contains additional pauses.

Maybe this could be fixed somehow? Unless it's a very small fix (or I'm 
wrong about the above), I would prefer to fix this separately.

Thanks,
   Thomas



More information about the hotspot-gc-dev mailing list