JEP 248: Make G1 the Default Garbage Collector
Vitaly Davidovich
vitalyd at gmail.com
Tue Jun 2 01:09:30 UTC 2015
I think Jeremy alluded to it: make sure there's very little promotion by
tuning CMS so that objects don't get aged into the old gen. I'm assuming
the unspoken aspect is also: don't allocate like a maniac, which is good
for other reasons as well of course.
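(For the archives, the knobs involved are roughly these -- illustrative
values only, not a tuning recommendation:

  java -XX:+UseConcMarkSweepGC \
       -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=15 \
       -XX:+PrintTenuringDistribution ...

i.e. give the survivor spaces enough room and raise the tenuring threshold
so short-lived objects die in the young gen instead of being promoted, and
watch the tenuring distribution output to confirm it's actually happening.)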
sent from my phone
On Jun 1, 2015 9:01 PM, "Erik Österlund" <erik.osterlund at lnu.se> wrote:
> Hi Jeremy,
>
> Are you suggesting making Google’s CMS the new default instead?
> The target for this is long-running server applications where
> fragmentation issues become increasingly awkward over time. The literature
> suggests fragmentation overhead can be as bad as allocations costing 1/2
> log(n) times as much memory, where n is the ratio of the largest to the
> smallest allocatable object size. In short… ouch! This can make the
> JVM run out of memory and crash, which is suboptimal.
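> (As a rough back-of-the-envelope illustration, and purely my own arithmetic
> rather than a quote from the literature: taking log base 2 and a
> largest-to-smallest object size ratio of n = 1024, that bound gives a factor
> of 0.5 * log2(1024) = 5, i.e. up to 5x the live data in the worst case.)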
> So I’m curious - what’s the Google solution to fragmentation using CMS?
> Let me guess… buy more memory? :p
>
> Cheers,
> /Erik
>
> From: Jeremy Manson <jeremymanson at google.com>
> Date: Tuesday 2 June 2015 01:04
> To: charlie hunt <charlie.hunt at oracle.com>
> Cc: Erik Österlund <erik.osterlund at lnu.se>,
> "hotspot-dev at openjdk.java.net Source Developers"
> <hotspot-dev at openjdk.java.net>
> Subject: Re: JEP 248: Make G1 the Default Garbage Collector
>
> This is a very interesting conversation, and I'd like to throw in Google's
> not-quite-relevant observations, too. We're not quite relevant because
> we'll just change the default to whatever we want, and because we ignore
> the parallel GC, for the most part. Much of it has been said before in
> this thread, but it's worth reiterating.
>
> - As it happens, we're considering changing the default, too - to the CMS
> collector!
>
> - We've been asking people to try G1 approximately annually for the last
> five or so years.
>
> - Although G1 performance has gotten better over time, we still find that
> the additional CPU costs outweigh any benefits, and that CMS typically ends
> up ahead.
>
> - With most of our more important workloads, our admins tend to tune
> applications carefully so that the OG doesn't fill up too much, and CMS
> cycles are not triggered often. This is deeply important in server-style
> applications - you want anything generated and used by a single request to
> go away with a YG collection.
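> (Not our exact settings, but to make the shape of that tuning concrete --
> the sizes here are invented for illustration:
>
>   java -XX:+UseConcMarkSweepGC -Xms8g -Xmx8g -Xmn3g \
>        -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=4 \
>        -XX:CMSInitiatingOccupancyFraction=75 \
>        -XX:+UseCMSInitiatingOccupancyOnly \
>        -XX:+PrintGCDetails -XX:+PrintTenuringDistribution
>
> i.e. a young gen big enough that request-scoped data dies there, and a CMS
> cycle that only kicks in at a known old gen occupancy.)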
>
> - Part of the reason our CMS performs better is that we've made changes to
> CMS to improve its performance. We see roughly half as much CPU
> utilization, and the long tail pause times have been cut dramatically. It
> would be nice if we could upstream the changes (especially for us, because
> they break with every new release, and we have to fix them!), but we've
> never been able to find anyone at Oracle who has the bandwidth to do the
> reviews. For example, the response to this:
>
> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2015-May/013426.html
>
> was not hugely enthusiastic, and that's not even one of the more complex
> changes.
>
> Jeremy
>
> On Mon, Jun 1, 2015 at 1:51 PM, charlie hunt <charlie.hunt at oracle.com> wrote:
> Hi Erik,
>
> HotSpot does some of this ergonomics today for both the GC and the JIT
> compiler, based on the amount of RAM the JVM sees (whether it is less than
> 2 GB) and the OS it is running on. These decisions are based on what is
> called a “server class machine”. A “server class machine”, as of JDK 6u18,
> is defined as a system that has 2 GB or more of RAM and two or more
> hardware threads. There are other cases for a given hardware platform, and
> if it is a 32-bit JVM, the collector (and JIT compiler) ergonomically
> selected may also differ from other configurations.
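> (For example, to see what ergonomics actually picked on a given machine,
> something like the following works -- the exact output varies by release
> and platform:
>
>   java -XX:+PrintCommandLineFlags -version
>   java -XX:+PrintFlagsFinal -version | grep -i gc
>
> the first line prints the ergonomically selected flags, e.g.
> -XX:+UseParallelGC.)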
>
> AFAIK, the JEP is proposing to change the default GC to G1 in those
> configurations where the default GC is currently Parallel GC.
>
> The challenge with what you are describing is that the best GC cannot
> always be ergonomically selected by the JVM without some input from the
> user, i.e. the GC doesn’t know whether GC pauses greater than 200 ms are
> acceptable, regardless of Java heap size, number of hardware threads, etc.
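> (Which is exactly the kind of input a user can give explicitly, e.g.
>
>   java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 ...
>
> if 200 ms -- G1's default pause time goal -- is not acceptable.)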
>
> thanks,
>
> charlie
>
> > On Jun 1, 2015, at 2:53 PM, Erik Österlund <erik.osterlund at lnu.se> wrote:
> >
> > Hi all,
> >
> > Does there have to be a single default one-size-fits-all GC algorithm for
> > users to rely on? Or could we allow multiple algorithms and explicitly
> > document that unless a GC is picked, the runtime is free to pick whatever
> > it believes is better? This could have multiple benefits.
> >
> > 1. This could make a similar change easier in the future, as everyone
> > will already be aware that if they really rely on the properties of a
> > specific GC algorithm, then they should choose that GC explicitly and not
> > rely on defaults not changing; there are no guarantees that defaults will
> > not change.
> >
> > 2. Obviously there has been a long discussion in this thread about which
> > GC is better in which context, and it seems like right now one size does
> > not fit all. The user who relied on the defaults might not be so aware of
> > these specifics. Therefore we might do them a big favour by attempting to
> > make a guess for them that works out-of-the-box, which is pretty neat.
> >
> > 3. This approach allows deploying G1 not everywhere, but where we guess it
> > performs pretty well. This means it will run in fewer JVM contexts and
> > hence pose less risk than deploying it to be used for all contexts, making
> > the transition smoother.
> >
> > One idea could be to first determine valid GC variants given the supplied
> > flags (GC-specific flags imply use of that GC), and then among the valid
> > GCs left, “guess” which algorithm is better based on the other general
> > parameters, such as e.g. heap size (and maybe target latency)? Could we,
> > for instance, pick ParallelGC for small heaps, G1 for larger heaps, and
> > CMS for ridiculously large heaps or cases where extremely low latency is
> > wanted?
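> > (Purely as a hypothetical sketch of the kind of heuristic I mean -- nothing
> > like this exists in HotSpot today, and the thresholds below are invented:
> >
> >   class GcChooser {
> >       enum Gc { PARALLEL, G1, CMS }
> >
> >       // Invented thresholds, illustrative only.
> >       static Gc choose(long heapBytes, long pauseGoalMillis) {
> >           final long GB = 1L << 30;
> >           if (heapBytes <= 4 * GB) return Gc.PARALLEL;   // small heap: throughput first
> >           if (pauseGoalMillis <= 10 || heapBytes >= 64 * GB)
> >               return Gc.CMS;                             // huge heap or very low latency
> >           return Gc.G1;                                  // otherwise: bounded pauses
> >       }
> >   }
> >
> > where heapBytes would come from -Xmx and the pause goal from some latency
> > hint supplied by the user.)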
> >
> > My reasoning is based on two assumptions: 1) changing the defaults would
> > target the users that don't know what's best for them, 2) one size does
> > not fit all. If these assumptions are wrong, then this is a bad idea.
> >
> > Thanks,
> > /Erik
> >
> >
> >
> > On 01/06/15 20:53, charlie hunt <charlie.hunt at oracle.com> wrote:
> >
> >> Hi Jenny,
> >>
> >> A couple questions and comments below.
> >>
> >> thanks,
> >>
> >> charlie
> >>
> >>> On Jun 1, 2015, at 1:28 PM, Yu Zhang <yu.zhang at oracle.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have done some performance comparisons of g1/cms/parallelgc internally
> >>> at Oracle. I would like to post my observations here to get some feedback,
> >>> as I have limited benchmarks and hardware. These are out-of-the-box
> >>> performance numbers.
> >>>
> >>> Memory footprint/startup:
> >>> g1 has a bigger memory footprint and longer startup time. The overhead
> >>> comes from more gc threads and the internal data structures that keep
> >>> track of the remembered sets.
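> >>> (I believe the knobs that most directly affect that overhead are the gc
> >>> thread counts and the region size, e.g.
> >>>
> >>>   -XX:ParallelGCThreads=<n> -XX:ConcGCThreads=<n> -XX:G1HeapRegionSize=8m
> >>>
> >>> -- placeholder values, not what was used in these runs.)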
> >>
> >> This is the memory footprint of the JVM itself when using the same size
> >> Java heap, right?
> >>
> >> I don't recall if this has been your observation? One observation I have
> >> had with G1 is that it tends to be able to operate within tolerable
> >> throughput and latency with a smaller Java heap than Parallel GC. I
> >> have seen cases where G1 may not use the entire Java heap because it was
> >> able to keep enough free regions available yet still meet pause time
> >> goals. But Parallel GC always uses the entire Java heap, and once its
> >> occupancy reaches capacity, it will GC. So there are cases where, between
> >> the JVM's footprint overhead and the amount of Java heap required, G1 may
> >> actually require less memory.
> >>
> >>>
> >>> g1 vs parallelgc:
> >>> If the workload involves young gc only, g1 could be slightly slower.
> >>> Also g1 can consume more cpu, which might slow down the benchmark if the
> >>> SUT is cpu saturated.
> >>>
> >>> If there are promotions from young to old gen that lead to full gc with
> >>> parallelgc, then for smaller heaps a parallel full gc can finish within
> >>> some range of pause time and still outperforms g1. But for bigger heaps,
> >>> g1 mixed gc can clean the heap with pause times that are a fraction of
> >>> the parallel full gc time, improving both throughput and response time.
> >>> Extreme cases are big data workloads (for example ycsb) with a 100g heap.
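> >>> (The comparison here is simply the same workload with everything equal
> >>> except the collector, along the lines of
> >>>
> >>>   java -XX:+UseParallelGC -Xms100g -Xmx100g -XX:+PrintGCDetails ...
> >>>   java -XX:+UseG1GC -Xms100g -Xmx100g -XX:+PrintGCDetails ...
> >>>
> >>> -- illustrative command lines, not the exact benchmark setup.)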
> >>
> >> I think what you are saying here is that if one can tune Parallel GC to
> >> avoid a lengthy collection of the old generation, or the live occupancy of
> >> old gen is small enough that the time to collect it can be tolerated, then
> >> Parallel GC will offer a better experience.
> >>
> >> However, if the live data in old generation at the time of its collection
> >> is large enough such that the time it takes to collect it exceeds a
> >> tolerable pause time, then G1 will offer a better experience.
> >>
> >> Would you also say that G1 offers a better experience in the presence of
> >> (wide) swings in object allocation rates, since there would likely be a
> >> larger number of promotions during the allocation spikes? In other
> >> words, G1 may offer more predictable pauses.
> >>
> >>>
> >>> g1 vs cms:
> >>> I will focus on response time type of workloads.
> >>> Ben mentioned
> >>>
> >>> "Having said that, there is definitely a decent-sized class of systems
> >>> (not just in finance) that cannot really tolerate any more than about
> >>> 10-15ms of STW. So, what usually happens is that they live with the
> >>> young collections, use CMS and tune out the CMFs as best they can (by
> >>> clustering, rolling restart, etc, etc). I don't see any possibility of
> >>> G1 becoming a viable solution for those systems any time soon."
> >>>
> >>> Can you give more details, like what is the live data set size, how big
> >>> is the heap, etc? I did some cache tests (Oracle Coherence) to compare
> >>> cms vs g1. g1 is better than cms when there is fragmentation. If you
> >>> tune cms well to have little fragmentation, then g1 is behind cms. But
> >>> in those cases they have to tune CMS very well, so changing the default
> >>> to g1 won't impact them.
> >>>
> >>> For big data kind of workloads (ycsb, spark in memory computing), g1 is
> >>> much better than cms.
> >>>
> >>> Thanks,
> >>> Jenny
> >>>
> >>> On 6/1/2015 10:06 AM, Ben Evans wrote:
> >>>> Hi Vitaly,
> >>>>
> >>>>>> Instead, G1 is now being talked of as a replacement for the default
> >>>>>> collector. If that's the case, then I think we need to acknowledge
> >>>>>> it,
> >>>>>> and have a conversation about where G1 is actually supposed to be
> >>>>>> used. Are we saying we want a "reasonably high throughput with
> >>>>>> reduced
> >>>>>> STW, but not low pause time" collector? If we are, that's fine, but
> >>>>>> that's not where we started.
> >>>>> That's a fair point, and one I'd be interested in hearing an answer to
> >>>>> as well. FWIW, the only GC I know of that's actually used in low latency
> >>>>> systems is Azul's C4, so I'm not even sure Oracle is trying to target the
> >>>>> same use cases. So when we talk about "low latency" GCs, we should
> >>>>> probably also be clear on what "low" actually means.
> >>>> Well, when I started playing with them, "low latency" meant a
> >>>> sub-10-ms transaction time with 100ms STW as acceptable, if not ideal.
> >>>>
> >>>> These days, the same sort of system needs a sub 500us transaction
> >>>> time, and ideally no GC pause at all. But that leads to Zing, or
> >>>> non-JVM solutions, and I think takes us too far into a specialised use
> >>>> case.
> >>>>
> >>>> Having said that, there is definitely a decent-sized class of systems
> >>>> (not just in finance) that cannot really tolerate any more than about
> >>>> 10-15ms of STW. So, what usually happens is that they live with the
> >>>> young collections, use CMS and tune out the CMFs as best they can (by
> >>>> clustering, rolling restart, etc, etc). I don't see any possibility of
> >>>> G1 becoming a viable solution for those systems any time soon.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Ben
> >>>
> >>
> >
>
>
>