Re: Discussion for 8226197: Reducing G1’s CPU cost with simplified write post-barrier and disabling concurrent refinement

Wed Jun 19 01:15:46 UTC 2019

Thanks for both responses!
I will certainly take a look at JDK-8213108, and will rebase our patch on
top of JDK-8213108. Hopefully it will make our patch smaller.

Regarding whether to support G1-throughput mode in the long term, could
we set up a video conference to discuss it?
Below is some reasoning on why we would want to have the throughput mode.

We had been thinking about something like a throughput mode even before
the idea of the simplified write barrier came up.
For example, for some throughput-oriented workloads with very large heaps,
repeatedly moving old-gen objects in mixed collections can be costly.
We found that setting "-XX:InitiatingHeapOccupancyPercent=100
-XX:-G1UseAdaptiveIHOP" to disable concurrent and mixed collections is
quite helpful in that case.
The simplified write barrier is a different approach to improving
throughput, and users would still have the option to keep concurrent and
mixed collections enabled.
These different approaches could surely be presented in one JEP for the
throughput mode.

> One argument against this is that if what one cares about is
> throughput, then we already have a throughput-oriented collector
> (ParallelGC), and one should just use that. G1-throughput mode isn't
> useful for a pure throughput use-case unless it can beat ParallelGC.
> There might be some cases where that happens, but it's not clear how
> common that is.
> One knock against ParallelGC is that the latency can be *really* bad.
> Even if G1-throughput mode doesn't beat ParallelGC for throughput,
> there may be mostly throughput-oriented applications that are somewhat
> sensitive to latency, and such could benefit from G1-throughput mode.
> But we're not sure how common such applications really are.

It will save us a lot of maintenance work if we only need to support one
garbage collector.
On the other hand, G1 is supposedly an all-around collector that can be
tuned for either pause time or throughput. It will be good to make G1
perform well when it is tuned for throughput.

Most of our workloads that are highly tuned for CMS trigger
young-gen collections frequently, but rarely any concurrent collections.
Arguably ParallelGC might also work for these workloads, but as you
mentioned, the tail latency could be really bad.
For these workloads, the concurrent collections in CMS are more like a
safety net that collects any garbage infrequently spilled into the old-gen.

For these workloads, migrating them to G1 in JDK 11 does not show much
reduction in latency, but does show a significant increase in CPU usage,
which directly translates to a big drop in queries-per-second (because
the CPU quota is the same).
The simplified write-barrier approach will certainly help these cases by
reducing CPU usage. And since the old-gen is lightly used, scanning all
cards for the used part of the old-gen during a pause would not be
prohibitively expensive.
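
To make the trade-off concrete, here is a toy model of the idea. It is
only a sketch and not HotSpot code: the class name, the 512-byte card
size, and the addresses are assumptions chosen for illustration. The
barrier just dirties a card, and all the work of finding dirty cards is
deferred to a scan over the used part of the heap during the pause.

```python
# Toy model only, not HotSpot code. Names and sizes are illustrative assumptions.
CARD_SIZE = 512          # bytes covered by one card (assumed for this sketch)
CLEAN, DIRTY = 0, 1


class ToyCardTable:
    def __init__(self, heap_bytes):
        # One byte per card, all clean at start.
        self.cards = bytearray((heap_bytes + CARD_SIZE - 1) // CARD_SIZE)

    def post_write_barrier(self, field_addr):
        # Simplified post-barrier: unconditionally dirty the card covering
        # the updated field. No filtering and no queueing for refinement.
        self.cards[field_addr // CARD_SIZE] = DIRTY

    def process_card_table(self, used_bytes):
        # Pause-time scan: walk only the cards covering the used part of
        # the heap, collect the dirty ones, and reset them to clean.
        used_cards = (used_bytes + CARD_SIZE - 1) // CARD_SIZE
        dirty = []
        for index in range(used_cards):
            if self.cards[index] == DIRTY:
                dirty.append(index)
                self.cards[index] = CLEAN
        return dirty


table = ToyCardTable(heap_bytes=1 << 20)
for addr in (0, 8, 513, 4096):       # four stores touching three cards
    table.post_write_barrier(addr)
print(table.process_card_table(used_bytes=8192))
```

The scan cost in this model grows with the number of used cards, not with
the number of stores, which is why a lightly used old-gen keeps the
pause-time scan cheap.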

> So far, we've ended up not pursuing this course and instead focusing
> our efforts on narrowing the space between G1 and ParallelGC, mostly
> by improving G1's throughput performance and by better ergonomics and
> tuning guidance. Thomas may have some data on what progress has been
> made, and there are lots of good ideas left to pursue. (JDK-8220465
> would help on the ParallelGC side; unfortunately, nobody from Oracle
> has had time to devote to it.) The idea is to narrow the application
> space between G1 and Parallel where G1-throughput mode naturally lives.

Without changing how the write barrier works, I'm skeptical that we can
recover most of the increase in CPU usage for those workloads that mostly
trigger young-gen collections.
As mentioned above, the issue is actually more about CPU usage than the
typical definition of throughput (i.e. wall time to finish a fixed amount
of work), due to how containerized environments and load balancing work.
More CPU usage typically means the machine will receive less work.
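
As a rough illustration of that last point, the arithmetic can be
sketched as below. The function name and all numbers are hypothetical
and are not measurements from our workloads: with a fixed CPU quota, the
sustainable query rate is roughly the quota divided by the CPU cost per
query, so any extra CPU spent per query lowers the rate proportionally.

```python
def max_qps(cpu_quota_cores, cpu_ms_per_query):
    # With a fixed quota, each core supplies 1000 CPU-milliseconds per
    # second; dividing by the per-query cost gives the sustainable rate.
    return cpu_quota_cores * 1000 / cpu_ms_per_query


# Hypothetical numbers: same 8-core quota, per-query CPU cost rises
# from 10 ms to 12.5 ms.
before = max_qps(cpu_quota_cores=8, cpu_ms_per_query=10)
after = max_qps(cpu_quota_cores=8, cpu_ms_per_query=12.5)
print(before, after)
```

In this model, wall time per query does not need to change at all for the
sustainable rate to drop; only the CPU cost per query matters.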

-Man

On Mon, Jun 17, 2019 at 5:52 PM Kim Barrett <kim.barrett at oracle.com> wrote:

> > On Jun 14, 2019, at 9:41 PM, Man Cao <manc at google.com> wrote:
> >
> > Hi all,
> >
> > I'd like to discuss the feasibility of supporting a new mode of G1 that
> > uses a simplified write post-barrier. The idea is basically trading off
> > some pause time with CPU time, more details are in:
> > https://bugs.openjdk.java.net/browse/JDK-8226197
> >
> > A prototype implementation is here:
> > https://cr.openjdk.java.net/~manc/8226197/webrev.00/
> >
> > At a high level, other than the maintenance issue of supporting two
> > different types of write barrier, is there anything inherently wrong about
> > this approach? I have run fastdebug build with various GC verification
> > options turned on to stress test the prototype, and so far I have not found
> > any failures due to the prototype.
> >
> > For the patch itself, besides the changes to files related to the write
> > barrier, the majority of the change is in g1RemSet.cpp, where it needs to
> > scan all dirty cards for regions not in the collection set. This phase
> > (called process_card_table()) replaces the update_rem_set() phase during
> > evacuation pause, and is similar to
> > ClearNoncleanCardWrapper::do_MemRegion() for CMS.
> > There are certainly many things can be improved for the patch, e.g.,
> > G1Analytics should take into account and adapt to the time spent in
> > process_card_table(), and process_card_table() could be further optimized
> > to speed up the scanning. We'd like to discuss about this approach in
> > general before further improving it.
> > In addition, we will collect more performance results with real
> > production workload in a week or two.
> >
> > -Man
>
> As Thomas said, we (the Oracle GC team) have been considering
> something like this (off and on) for some time; it has come up for
> discussion several times. So far, we haven't pursued the idea to
> completion / integration, and there are a number of reasons for this.
>
> Fundamentally the idea is to improve G1's throughput performance at
> the (potential) expense of its latency behavior.   Let's call this
> G1-throughput mode below.
>
> One argument against this is that if what one cares about is
> throughput, then we already have a throughput-oriented collector
> (ParallelGC), and one should just use that. G1-throughput mode isn't
> useful for a pure throughput use-case unless it can beat ParallelGC.
> There might be some cases where that happens, but it's not clear how
> common that is.
>
> One knock against ParallelGC is that the latency can be *really* bad.
> Even if G1-throughput mode doesn't beat ParallelGC for throughput,
> there may be mostly throughput-oriented applications that are somewhat
> sensitive to latency, and such could benefit from G1-throughput mode.
> But we're not sure how common such applications really are.
>
> The cost of adding G1-throughput mode should not be discounted. "other
> than the maintenance issue of supporting two types of write barrier"
> kind of trivializes that cost. From a testing point of view, it's
> pretty close to being a whole new collector, likely requiring running
> a large number of tests in G1-throughput mode. We think the additional
> testing cost and potential bug tail are a substantial downside.
>
> So far, we've ended up not pursuing this course and instead focusing
> our efforts on narrowing the space between G1 and ParallelGC, mostly
> by improving G1's throughput performance and by better ergonomics and
> tuning guidance. Thomas may have some data on what progress has been
> made, and there are lots of good ideas left to pursue. (JDK-8220465
> would help on the ParallelGC side; unfortunately, nobody from Oracle
> has had time to devote to it.) The idea is to narrow the application
> space between G1 and Parallel where G1-throughput mode naturally lives.
>
> (I've only given the proposed changeset a cursory skim so far. I'm
> waiting for discussion on the high-level question of whether this is a
> direction that will be pursued.)
>
>