RFR: 8278824: Add an option for G1 to configure chunks per region [v3]

Thomas Schatzl tschatzl at openjdk.java.net
Wed Dec 15 16:10:55 UTC 2021


On Tue, 14 Dec 2021 23:45:37 GMT, William Kemper <wkemper at openjdk.org> wrote:

>> During an upgrade from JDK11 to JDK15 (and JDK17), we experienced an increase in p99.9 garbage collection latency from approximately 10-15ms to approximately 45-50ms. Logs showed the object copy phase consuming most of that time.
>> 
>> Enabling task queue stats in the build and increasing the log level showed that one evacuation thread was doing far more work than the other threads.
>> 
>> 
>> 148905 [2021-12-06T22:12:38.338+0000][debug][gc,phases ] GC(581) Object Copy (ms): Min: 13.3, Avg: 16.6, Max: 67.2, Diff: 53.8, Sum: 546.7, Workers: 33
>> 148906 [2021-12-06T22:12:38.338+0000][trace][gc,phases,task ] GC(581) 14.8 15.4 15.3 16.0 14.7 14.6 16.6 15.3 16.3 15.2 67.2 13.8 14.1 16.5 14.5 14.3 14.1 14.9 15.3 15.8 13.3 15.6 15.0 15.6 14.7 15.6 14.6 14.3 15.1 15.2 14.4 13.5 14.9
>> 
>> 
>> Looking back, we see a pattern in these cycles where the thread doing most of the work scans only 2 or 3 "blocks" (here, just 2):
>> 
>> 148899 [2021-12-06T22:12:38.338+0000][debug][gc,phases ] GC(581) Scanned Blocks: Min: 2, Avg: 1203.9, Max: 1875, Diff: 1873, Sum: 39730, Workers: 33
>> 148900 [2021-12-06T22:12:38.338+0000][trace][gc,phases,task ] GC(581) 949 1838 1397 1548 1450 1875 821 1067 1312 1463 2 1282 1319 38 1177 1199 819 1170 897 1343 1860 1070 1059 1552 1217 1296 1068 1092 1645 1166 1002 1140 1597
>> 
>> 
>> And in the task queue stats, this thread (worker 10) performs roughly an order of magnitude more operations on the task queue:
>> 
>> 149005 [2021-12-06T22:12:38.339+0000][trace][gc,task,stats ] GC(581) 8 51757 51143 9173 121241 9379 0 0
>> 149006 [2021-12-06T22:12:38.339+0000][trace][gc,task,stats ] GC(581) 9 37328 36045 6291 113112 8332 0 0
>> 149007 [2021-12-06T22:12:38.339+0000][trace][gc,task,stats ] GC(581) 10 350079 77644 0 132 0 267666 226166
>> 149008 [2021-12-06T22:12:38.339+0000][trace][gc,task,stats ] GC(581) 11 38889 37932 7100 110650 7940 0 0
>> 149009 [2021-12-06T22:12:38.339+0000][trace][gc,task,stats ] GC(581) 12 53762 53174 9375 110255 8637 0 0
>> 
>> 
>> We traced the origin of this behavior to: https://bugs.openjdk.java.net/browse/JDK-8213108
>> 
>> We tried tuning `ParGCArrayScanChunk`, to no avail. Introducing a flag to override the ergonomically selected number of chunks per region was effective.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove whitespace from empty line

Thanks for reporting this issue - nice find!

As you correctly noted, this is an issue with work distribution during the Object Copy phase. There are known issues with work stealing that we have been working on specifically over the last few weeks; the graph mentioned below shows the current results (fwiw, that prototype also gives good results without this change).
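To make the imbalance in the quoted `GC(581)` logs concrete, here is a small sketch (a hypothetical helper script, not part of the JDK or its tooling) that parses the two per-worker trace lines and picks out the outlier worker:

```python
# Per-worker values copied verbatim from the quoted GC(581) trace lines.
OBJECT_COPY_MS = "14.8 15.4 15.3 16.0 14.7 14.6 16.6 15.3 16.3 15.2 67.2 13.8 14.1 16.5 14.5 14.3 14.1 14.9 15.3 15.8 13.3 15.6 15.0 15.6 14.7 15.6 14.6 14.3 15.1 15.2 14.4 13.5 14.9"
SCANNED_BLOCKS = "949 1838 1397 1548 1450 1875 821 1067 1312 1463 2 1282 1319 38 1177 1199 819 1170 897 1343 1860 1070 1059 1552 1217 1296 1068 1092 1645 1166 1002 1140 1597"


def parse(line):
    """Split a space-separated trace line into a list of floats (one per worker)."""
    return [float(tok) for tok in line.split()]


def summarize(values):
    """Return (min, avg, max, index of the max) for a list of per-worker values."""
    slowest = max(range(len(values)), key=values.__getitem__)
    return min(values), sum(values) / len(values), values[slowest], slowest


copy_ms = parse(OBJECT_COPY_MS)
blocks = parse(SCANNED_BLOCKS)
min_ms, avg_ms, max_ms, slowest = summarize(copy_ms)
# Worker index that claimed the fewest scan blocks.
fewest = min(range(len(blocks)), key=blocks.__getitem__)

print(f"workers={len(copy_ms)} min={min_ms} avg={avg_ms:.1f} max={max_ms} slowest worker={slowest}")
print(f"worker with fewest scanned blocks={fewest} ({blocks[fewest]:.0f})")
```

For the quoted data, the slowest copying worker and the worker with the fewest scanned blocks are the same one, worker 10, which also matches the outlier row in the task queue stats.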

This change improves on that by making the initial distribution of work better, which so far seems to be a good solution for this particular case.
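A toy model of why finer initial chunking helps: the sketch below is purely hypothetical (the card costs, worker count, and greedy "least-loaded worker claims the next chunk" policy are assumptions, and it ignores work stealing entirely), but it shows how a single expensive chunk caps the pause when chunks are coarse.

```python
import heapq


def chunk_costs(card_costs, chunks_per_region):
    """Split one region's per-card costs into equally sized chunks; return each chunk's total cost."""
    if len(card_costs) % chunks_per_region != 0:
        raise ValueError("cards must divide evenly into chunks")
    size = len(card_costs) // chunks_per_region
    return [sum(card_costs[i:i + size]) for i in range(0, len(card_costs), size)]


def makespan(chunk_cost_list, workers):
    """Hand out chunks in address order, each to the currently least-loaded worker; return the max load."""
    loads = [0] * workers  # a list of equal values is already a valid min-heap
    for cost in chunk_cost_list:
        heapq.heappush(loads, heapq.heappop(loads) + cost)
    return max(loads)


# Hypothetical region: 4096 cards costing 1 unit each, except a hot stretch of 256 cards costing 100.
CARDS = [100] * 256 + [1] * (4096 - 256)
WORKERS = 8

for chunks in (16, 64, 256):
    print(chunks, "chunks/region -> max worker load", makespan(chunk_costs(CARDS, chunks), WORKERS))
```

In this model, with 8 workers, 16 chunks leave one worker holding the entire hot stretch (a load of 25600 against an ideal of 3680 per worker), while 256 chunks spread it evenly.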

After reproducing this particular issue and some internal discussion, we think that adding a new flag is something to avoid here. The main reason is that improving the default number of scan chunks seems free of side effects so far: tests do not show a regression, and the additional memory usage (and the effort to manage this scan chunk table memory) seems negligible.
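As a back-of-the-envelope check on the memory argument, the following sketch assumes one byte per chunk in the scan chunk table and a hypothetical 32 GiB heap with 16 MiB regions; the per-chunk size is an illustrative assumption, not a statement about how G1 actually lays out that table.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30


def scan_chunk_table_bytes(heap_bytes, region_bytes, chunks_per_region, bytes_per_chunk=1):
    """Rough footprint of a per-chunk claim table, assuming a fixed number of bytes per chunk."""
    regions = heap_bytes // region_bytes
    return regions * chunks_per_region * bytes_per_chunk


# Hypothetical 32 GiB heap with 16 MiB regions (2048 regions).
HEAP = 32 * GiB
for chunks in (64, 128, 256, 512, 1024):
    size = scan_chunk_table_bytes(HEAP, 16 * MiB, chunks)
    print(f"{chunks:5d} chunks/region -> {size // KiB:5d} KiB ({size / HEAP:.6%} of the heap)")
```

Under these assumptions, even 1024 chunks per region comes to 2 MiB for the whole heap.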

The graph attached [to the CR](https://bugs.openjdk.java.net/secure/attachment/97385/20211215-repo-task-queue-pause-times-with-fix2.png) (GitHub does not allow me to attach it here for some reason) shows pause times for your reproducer on jdk11, jdk17, and jdk17 with values of `G1RemSetScanChunksPerRegion` from 64 to 1024; the selected 256 seems to work very well for this case :). The `improved-steal-multi` series shows a recent prototype of the task queue changes mentioned above, which covers more cases than this one.

So our suggestion is, for 16m regions, to set the default number of chunks per region to 128 or 256 (depending on further testing results) for JDK 18, and then backport that change to 17. When we are ready to post the task queue changes (probably in JDK 19), we might want to revisit these defaults.
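To put the candidate defaults in perspective, this sketch computes how much of a 16m region each claimed chunk covers for the values in the sweep, assuming the usual 512-byte G1 card size (treat that constant as an assumption for your build and flags).

```python
KiB, MiB = 1 << 10, 1 << 20
CARD_BYTES = 512  # assumed card size; 512 bytes is the usual G1 default


def chunk_bytes(region_bytes, chunks_per_region):
    """Bytes of the region covered by one scan chunk."""
    return region_bytes // chunks_per_region


for chunks in (64, 128, 256, 512, 1024):
    size = chunk_bytes(16 * MiB, chunks)
    print(f"{chunks:5d} chunks/region -> {size // KiB:4d} KiB per chunk, {size // CARD_BYTES:4d} cards")
```

With 16m regions, 128 chunks means each claim covers 128 KiB (256 cards) and 256 chunks means 64 KiB (128 cards).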

Would that be an acceptable plan for you?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6840



More information about the hotspot-gc-dev mailing list