RFR: 8244505: G1 pause time ratio calculation does not consider Remark/Cleanup
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. ------------- Commit messages: - 8244505: G1 pause time ratio calculation does not consider Remark/Cleanup Changes: https://git.openjdk.java.net/jdk/pull/183/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=183&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8244505 Stats: 53 lines in 4 files changed: 26 ins; 14 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/183.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/183/head:pull/183 PR: https://git.openjdk.java.net/jdk/pull/183
Hi Ivan, On 15.09.20 15:57, Ivan Walulya wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files.
Initial comments: - please remove the trivial cleanups from this change and do them in a separate "trivial cleanups" change either before or after: they really detract from the fix for the actual problem as they are too many, and they, while they are located in the g1Policy file, are completely unrelated to this patch. This includes at least the hunks in - G1Policy::revise_young_list_target_length_if_necessary (why is 110/100 better than 1100/1000?) - G1Policy::record_concurrent_mark_init_end - G1Policy::calc_min_old_cset_length - the first hunk in G1Policy::record_collection_pause_end None of these locations have anything to do with actual pause time ratio calculation. Not that we do not do bundle cleanups elsewhere, but the cleanup to actual related change ratio is quite high in this change. - I would also almost prefer if the pre-existing bug in G1Analytics::compute_pause_time_ratios() were also moved out of this change to highlight it. It does not seem to be connected to that pause time calculation did not consider Remark/Cleanup pause before. (Not completely sure why this has not been noticed in the recent young gen sizing changes). - Please rename G1Policy::update_pause_time_stats() to something more related to gc time ratio, i.e. update_actual_gc_time_ratio() or so to make it a bit more specific. (This is probably not the perfect name for it). Will keep looking.
Thanks, Thomas
On Tue, 15 Sep 2020 13:51:22 GMT, Ivan Walulya <iwalulya@openjdk.org> wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Thanks @tschatzl for the initial comments, I will split the changes. ------------- PR: https://git.openjdk.java.net/jdk/pull/183
On Tue, 15 Sep 2020 16:48:53 GMT, Ivan Walulya <iwalulya@openjdk.org> wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Thanks @tschatzl for the initial comments, I will split the changes.
- I'm worried that the assignment of "update_stats" in G1Policy::record_collection_pause_end() and G1Policy::record_pause() would get out of sync. It would be nice if the change either passed it to G1Policy::record_pause(), or create a new member function that returns that and is used in both places.
Will create a helper function to sync usage of 'update_stats'
- update_gc_pause_time_ratios() does not need to be public afaict
Right, this will be change
- I'm okay that now full gcs are considered in the pause time ratio calculation. This may cause (with JDK-8238687) that the heap will be kept larger after the full gc, which okay with me for naturally occurring full gcs.
One concern I have are full gcs caused by System.gc() calls - one of their purposes is to compact the heap, but now they will have impact beyond that one. Maybe try to exclude those from the gc time ratio calculation?
I think the previous (undocumented) exclusion of all full gcs has been to not overcomplicate the code assuming that most if any full gcs with G1 should be System.gcs() for heap compaction purposes.
Ivan, what do you think?
Right, Full-gcs were excluded from the compute_pause_time_ratios immediately after the full-gc, but not from the successive computations. Bug/Enhancement will be filed.
- I think the change in G1Analytics::compute_pause_time_ratios() is
double long_interval_ms = (end_time_sec - oldest_known_gc_end_time_sec()) * 1000.0; - _long_term_pause_time_ratio = _recent_gc_times_ms->sum() / long_interval_ms; + _long_term_pause_time_ratio = (_recent_gc_times_ms->sum() + pause_time_ms) / long_interval_ms;
wrong, or even worse than the current calculation. Sorry for not being able to pinpoint it yesterday evening.
Let's assume that we have the following pauses on the timeline in G1Analytics, where "X" means pause (the number at the end of an "X" block is just a running number), "-" means mutator time.
XXX1-----XX2---XXX3---------XXXX4---XXXXXX5--------XX6
^--------------long_interval_ms-------------------^
(optimized for fixed font view)
Currently oldest_known_gc_end_time_sec() is at the end of pause 1, and _recent_gc_times_ms contains all gc pause times from 1 to 5. We did not call update_recent_gc_times() yet.
Now the (existing) calculation calculates "long_interval_ms" from the end of pause 1 to the end of pause 6 (current), and divides by the sum of pauses 1 to 5 - which is wrong as pause 1 can be different to pause 6 in length. However, there is the the code permeating assumption (not documented, sorry :)) that pauses are fairly regular. So this somewhat works out.
With that change, the dividend is too large, i.e. in addition to taking the wrong long_interval_ms, the code now also contains additional pauses.
Maybe this could be fixed somehow? Unless it's a very small fix (or I'm wrong about the above), I would prefer to fix this separately.
Right, I will revert to the previous version, and file a bug to fix this separately.
Thanks, Thomas
------------- PR: https://git.openjdk.java.net/jdk/pull/183
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Removed all changes unrelated to the main problem solved by the patch ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/183/files - new: https://git.openjdk.java.net/jdk/pull/183/files/3126890f..2e6179a7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=183&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=183&range=00-01 Stats: 18 lines in 3 files changed: 4 ins; 1 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/183.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/183/head:pull/183 PR: https://git.openjdk.java.net/jdk/pull/183
Hi Ivan, On 15.09.20 19:29, Ivan Walulya wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
Removed all changes unrelated to the main problem solved by the patch
Thanks. - I'm worried that the assignment of "update_stats" in G1Policy::record_collection_pause_end() and G1Policy::record_pause() would get out of sync. It would be nice if the change either passed it to G1Policy::record_pause(), or create a new member function that returns that and is used in both places. - update_gc_pause_time_ratios() does not need to be public afaict - I'm okay that now full gcs are considered in the pause time ratio calculation. This may cause (with JDK-8238687) that the heap will be kept larger after the full gc, which okay with me for naturally occurring full gcs. One concern I have are full gcs caused by System.gc() calls - one of their purposes is to compact the heap, but now they will have impact beyond that one. Maybe try to exclude those from the gc time ratio calculation? I think the previous (undocumented) exclusion of all full gcs has been to not overcomplicate the code assuming that most if any full gcs with G1 should be System.gcs() for heap compaction purposes. Ivan, what do you think? - I think the change in G1Analytics::compute_pause_time_ratios() is double long_interval_ms = (end_time_sec - oldest_known_gc_end_time_sec()) * 1000.0; - _long_term_pause_time_ratio = _recent_gc_times_ms->sum() / long_interval_ms; + _long_term_pause_time_ratio = (_recent_gc_times_ms->sum() + pause_time_ms) / long_interval_ms; wrong, or even worse than the current calculation. Sorry for not being able to pinpoint it yesterday evening. Let's assume that we have the following pauses on the timeline in G1Analytics, where "X" means pause (the number at the end of an "X" block is just a running number), "-" means mutator time. XXX1-----XX2---XXX3---------XXXX4---XXXXXX5--------XX6 ^--------------long_interval_ms-------------------^ (optimized for fixed font view) Currently oldest_known_gc_end_time_sec() is at the end of pause 1, and _recent_gc_times_ms contains all gc pause times from 1 to 5. We did not call update_recent_gc_times() yet. Now the (existing) calculation calculates "long_interval_ms" from the end of pause 1 to the end of pause 6 (current), and divides by the sum of pauses 1 to 5 - which is wrong as pause 1 can be different to pause 6 in length. However, there is the the code permeating assumption (not documented, sorry :)) that pauses are fairly regular. So this somewhat works out. With that change, the dividend is too large, i.e. in addition to taking the wrong long_interval_ms, the code now also contains additional pauses. Maybe this could be fixed somehow? Unless it's a very small fix (or I'm wrong about the above), I would prefer to fix this separately. Thanks, Thomas
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Changes made to address reviews from Thomas. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/183/files - new: https://git.openjdk.java.net/jdk/pull/183/files/2e6179a7..75f87d9a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=183&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=183&range=01-02 Stats: 21 lines in 3 files changed: 13 ins; 2 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/183.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/183/head:pull/183 PR: https://git.openjdk.java.net/jdk/pull/183
Hi, On 16.09.20 15:17, Ivan Walulya wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
Changes made to address reviews from Thomas.
-------------
The comment at the declaration of G1Policy::should_update_gc_stats() is better placed at the implementation as it is imho. You could move it there if you want. Also thanks for catching the issue with "*1000*0", I forgot to mention it earlier. Lgtm. Thanks. Thomas
On Wed, 16 Sep 2020 13:17:26 GMT, Ivan Walulya <iwalulya@openjdk.org> wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
Changes made to address reviews from Thomas.
Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/183
On Wed, 16 Sep 2020 13:43:39 GMT, Thomas Schatzl <tschatzl@openjdk.org> wrote:
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
Changes made to address reviews from Thomas.
Marked as reviewed by tschatzl (Reviewer).
_Mailing list message from [Thomas Schatzl](mailto:thomas.schatzl@oracle.com) on [hotspot-gc-dev](mailto:hotspot-gc-dev@openjdk.java.net):_ Hi,
On 16.09.20 15:17, Ivan Walulya wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Changes made to address reviews from Thomas. -------------
The comment at the declaration of G1Policy::should_update_gc_stats() is better placed at the implementation as it is imho.
You could move it there if you want.
I agree, will move it.
Also thanks for catching the issue with "*1000*0", I forgot to mention it earlier.
Lgtm. Thanks.
Thomas
Thanks Thomas, ------------- PR: https://git.openjdk.java.net/jdk/pull/183
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Minor update to change location of method comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/183/files - new: https://git.openjdk.java.net/jdk/pull/183/files/75f87d9a..831e89c1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=183&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=183&range=02-03 Stats: 4 lines in 2 files changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/183.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/183/head:pull/183 PR: https://git.openjdk.java.net/jdk/pull/183
On Wed, 16 Sep 2020 13:53:42 GMT, Ivan Walulya <iwalulya@openjdk.org> wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
Minor update to change location of method comment
Looks good. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/183
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Instruction reordering the eliminate bug introduced by refactoring ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/183/files - new: https://git.openjdk.java.net/jdk/pull/183/files/831e89c1..e1f963e5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=183&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=183&range=03-04 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/183.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/183/head:pull/183 PR: https://git.openjdk.java.net/jdk/pull/183
On Fri, 18 Sep 2020 06:59:37 GMT, Ivan Walulya <iwalulya@openjdk.org> wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
Instruction reordering the eliminate bug introduced by refactoring
Marked as reviewed by sjohanss (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/183
On Fri, 18 Sep 2020 06:59:37 GMT, Ivan Walulya <iwalulya@openjdk.org> wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
Instruction reordering the eliminate bug introduced by refactoring
Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/183
On Fri, 18 Sep 2020 08:50:01 GMT, Thomas Schatzl <tschatzl@openjdk.org> wrote:
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
Instruction reordering the eliminate bug introduced by refactoring
Marked as reviewed by tschatzl (Reviewer).
Thanks @kstefanj and @tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/183
On Tue, 15 Sep 2020 13:51:22 GMT, Ivan Walulya <iwalulya@openjdk.org> wrote:
The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7 Performance testing did not show any significant changes in performance
This pull request has now been integrated. Changeset: 86a16400 Author: Ivan Walulya <iwalulya@openjdk.org> Committer: Thomas Schatzl <tschatzl@openjdk.org> URL: https://git.openjdk.java.net/jdk/commit/86a16400 Stats: 51 lines in 4 files changed: 11 ins; 37 del; 3 mod 8244505: G1 pause time ratio calculation does not consider Remark/Cleanup pauses Reviewed-by: tschatzl, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/183
participants (4)
-
Ivan Walulya
-
Stefan Johansson
-
Thomas Schatzl
-
Thomas Schatzl