JEP 248: Make G1 the Default Garbage Collector

Tue Jun 2 01:00:38 UTC 2015

Hi Jeremy,

Are you suggesting making Google’s CMS the new default instead?
The target for this is long running server applications where fragmentation issues become increasingly awkward over time. The literature suggests that, in the worst case, fragmentation can inflate memory use by a factor of about 1/2 log(n), where n is the ratio between the largest and smallest allocatable object sizes. In short… ouch! This can make the JVM run out of memory and crash, which is suboptimal.
So I’m curious - what’s the Google solution to fragmentation using CMS? Let me guess… buy more memory? :p
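
To put a rough number on the 1/2 log(n) figure mentioned above, here is a worked example. It assumes the logarithm is base 2 (the message does not state the base) and uses hypothetical object sizes of 16 B and 16 MiB, which are not taken from the thread:

```latex
\[
n = \frac{s_{\max}}{s_{\min}} = \frac{16\,\mathrm{MiB}}{16\,\mathrm{B}} = \frac{2^{24}}{2^{4}} = 2^{20},
\qquad
\tfrac{1}{2}\log_2 n = \tfrac{1}{2}\cdot 20 = 10 .
\]
```

Under those assumptions, a non-compacting heap could in the worst case need on the order of ten times the live data size.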

Cheers,
/Erik

From: Jeremy Manson <jeremymanson at google.com>
Date: Tuesday 2 June 2015 01:04
To: charlie hunt <charlie.hunt at oracle.com>
Cc: Erik Österlund <erik.osterlund at lnu.se>, "hotspot-dev at openjdk.java.net Source Developers" <hotspot-dev at openjdk.java.net>
Subject: Re: JEP 248: Make G1 the Default Garbage Collector

This is a very interesting conversation, and I'd like to throw in Google's not-quite-relevant observations, too.   We're not quite relevant because we'll just change the default to whatever we want, and because we ignore the parallel GC, for the most part.  Much of it has been said before in this thread, but it's worth reiterating.

- As it happens, we're considering changing the default, too - to the CMS collector!

- We've been asking people to try G1 approximately annually for the last five or so years.

- Although G1 performance has gotten better over time, we still find that the additional CPU costs outweigh any benefits, and that CMS typically ends up ahead.

- With most of our more important workloads, our admins tend to tune applications carefully so that the OG doesn't fill up too much, and CMS cycles are not triggered often.  This is deeply important in server-style applications - you want anything generated and used by a single request to go away with a YG collection.

- Part of the reason our CMS performs better is that we've made changes to CMS to improve its performance.  We see roughly half as much CPU utilization, and the long tail pause times have been cut dramatically.  It would be nice if we could upstream the changes (especially for us, because they break with every new release, and we have to fix them!), but we've never been able to find anyone at Oracle who has the bandwidth to do the reviews.  For example, the response to this:

http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2015-May/013426.html

was not hugely enthusiastic, and that's not even one of the more complex changes.

Jeremy

On Mon, Jun 1, 2015 at 1:51 PM, charlie hunt <charlie.hunt at oracle.com> wrote:
Hi Erik,

HotSpot already applies some ergonomics of this kind today for both the GC and the JIT compiler, depending on how much RAM the JVM sees (for example, less than 2 GB) and the OS it is running on. These decisions are based on what is called a “server class machine”.  As of JDK 6u18, a “server class machine” is defined as a system that has 2 GB or more of RAM and two or more hardware threads. There are other cases for a given hardware platform, and if it is a 32-bit JVM, the collector (and JIT compiler) ergonomically selected may also differ from other configurations.
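
As a rough illustration of the "server class machine" rule described above, here is a minimal Java sketch. It encodes only the two conditions stated in this message (2 GB or more of RAM, two or more hardware threads); the class and method names are hypothetical, and HotSpot's real ergonomics include platform-specific cases that this does not model.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper; not HotSpot's actual implementation.
final class ServerClassCheck {
    private static final long TWO_GB = 2L * 1024 * 1024 * 1024;

    // True when both conditions from the message hold:
    // at least 2 GB of RAM and at least two hardware threads.
    static boolean isServerClass(long physicalRamBytes, int hardwareThreads) {
        return physicalRamBytes >= TWO_GB && hardwareThreads >= 2;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The physical memory size is only exposed through the
        // com.sun.management extension, which may not be present on every JVM.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            long ram = ((com.sun.management.OperatingSystemMXBean) os)
                    .getTotalPhysicalMemorySize();
            System.out.println("server class (by this sketch): "
                    + isServerClass(ram, threads));
        } else {
            System.out.println("physical memory size not available on this JVM");
        }
    }
}
```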

AFAIK, the JEP is proposing to change the default GC in configurations where the default GC is Parallel GC to using G1 as the default.
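
Since the default depends on the configuration, it can be useful to check which collector a given JVM actually selected. A minimal sketch using the standard `java.lang.management` API follows; the class name is hypothetical, and the bean names it prints are implementation-specific, so they should not be relied on as a stable identifier.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the collectors active in the running JVM.
final class ActiveCollectors {
    // Returns the names reported by the JVM's garbage collector MXBeans.
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(bean.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        // Run with and without an explicit GC flag to compare the output.
        System.out.println(names());
    }
}
```

Running this once with no GC flag and once with an explicit one such as `-XX:+UseG1GC` or `-XX:+UseParallelGC` shows what the ergonomics picked on that machine.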

The challenge with what you are describing is that the best GC cannot always be ergonomically selected by the JVM without some input from the user, i.e. GC doesn’t know if any GC pauses greater than 200 ms are acceptable regardless of Java heap size, number of hardware threads, etc.
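
To make the discussion concrete, the selection idea from Erik's message quoted below (honor an explicit choice, otherwise guess from heap size and target latency) could be sketched roughly as follows. This is only an illustration: the thresholds, the class name, and the method are all hypothetical, and the optional pause target stands in for the user input that the JVM cannot infer on its own.

```java
import java.util.OptionalLong;

// Hypothetical heuristic; thresholds are made up for illustration only.
final class GcGuess {
    enum Collector { PARALLEL, G1, CMS }

    static final long SMALL_HEAP_BYTES = 4L << 30;    // assumed: 4 GiB
    static final long HUGE_HEAP_BYTES = 256L << 30;   // assumed: 256 GiB
    static final long LOW_LATENCY_MS = 20;            // assumed pause target

    // explicit may be null when the user passed no GC-specific flag.
    static Collector choose(Collector explicit, long heapBytes, OptionalLong maxPauseMs) {
        if (explicit != null) {
            return explicit;                 // an explicit choice always wins
        }
        if (maxPauseMs.isPresent() && maxPauseMs.getAsLong() <= LOW_LATENCY_MS) {
            return Collector.CMS;            // extremely low latency wanted
        }
        if (heapBytes >= HUGE_HEAP_BYTES) {
            return Collector.CMS;            // very large heaps
        }
        if (heapBytes <= SMALL_HEAP_BYTES) {
            return Collector.PARALLEL;       // small heaps
        }
        return Collector.G1;                 // everything in between
    }

    public static void main(String[] args) {
        System.out.println(choose(null, 32L << 30, OptionalLong.empty()));
    }
}
```

Without a pause target the sketch can only fall back on heap size, which is exactly the limitation described above.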

thanks,

charlie

> On Jun 1, 2015, at 2:53 PM, Erik Österlund <erik.osterlund at lnu.se> wrote:
>
> Hi all,
>
> Does there have to be a single default one-size-fits-all GC algorithm for
> users to rely on? Or could we allow multiple algorithms and explicitly
> document that unless a GC is picked, the runtime is free to pick whatever
> it believes is better? This could have multiple benefits.
>
> 1. This could make such a similar change easier in the future as everyone
> will already be aware that if they really rely on the properties of a
> specific GC algorithm, then they should choose that GC explicitly and not
> rely on defaults not changing; there are no guarantees that defaults will
> not change.
>
> 2. Obviously there has been a long discussion in this thread which GC is
> better in which context, and it seems like right now one size does not fit
> all. The user that relied on the defaults might not be so aware of these
> specifics. Therefore we might do them a big favour of attempting to make a
> guess for them to work out-of-the-box, which is pretty neat.
>
> 3. This approach allows deploying G1 not everywhere, but where we guess it
> performs pretty well. This means it will run in fewer JVM contexts and
> hence pose less risk than deploying it to be used for all contexts, making
> the transition smoother.
>
> One idea could be to first determine valid GC variants given the supplied
> flags (GC-specific flags imply use of that GC), and then among the valid
> GCs left, "guess" which algorithm is better based on the other general
> parameters, such as e.g. heap size (and maybe target latency)? Could for
> instance pick ParallelGC for small heaps, G1 for larger heaps and CMS for
> ridiculously large heaps or cases when extremely low latency is wanted?
>
> My reasoning is based on two assumptions: 1) changing the defaults would
> target the users that don't know what's best for them, 2) one size does
> not fit all. If these assumptions are wrong, then this is a bad idea.
>
> Thanks,
> /Erik
>
>
>
> On 01/06/15 20:53, charlie hunt <charlie.hunt at oracle.com> wrote:
>
>> Hi Jenny,
>>
>> A couple questions and comments below.
>>
>> thanks,
>>
>> charlie
>>
>>> On Jun 1, 2015, at 1:28 PM, Yu Zhang <yu.zhang at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> I have done some performance comparisons of g1/cms/parallelgc internally at
>>> Oracle.  I would like to post my observations here to get some feedback,
>>> as I have limited benchmarks and hardware.  These are out-of-the-box
>>> performance numbers.
>>>
>>> Memory footprint/startup:
>>> g1 has a bigger memory footprint and a longer startup time. The overhead
>>> comes from more gc threads and from the internal data structures that keep
>>> track of remembered sets.
>>
>> This is the memory footprint of the JVM itself when using the same size
>> Java heap, right?
>>
>> I don't recall if it has been your observation?  One observation I have
>> had with G1 is that it tends to be able to operate within tolerable
>> throughput and latency with a smaller Java heap than with Parallel GC.  I
>> have seen cases where G1 may not use the entire Java heap because it was
>> able to keep enough free regions available yet still meet pause time
>> goals. But Parallel GC always uses the entire Java heap, and once its
>> occupancy reaches capacity, it will GC. So there are cases where, between
>> the JVM's footprint overhead and the amount of Java heap required, G1 may
>> actually require less memory.
>>
>>>
>>> g1 vs parallelgc:
>>> If the workload involves young gc only, g1 could be slightly slower.
>>> Also, g1 can consume more cpu, which might slow down the benchmark if the
>>> SUT is cpu saturated.
>>>
>>> If promotions from young to old gen lead to full gcs with parallelgc,
>>> then for smaller heaps the parallel full gc can finish within a reasonable
>>> pause time, and parallelgc still outperforms g1.  But for bigger heaps, g1
>>> mixed gcs can clean the heap with pause times that are a fraction of the
>>> parallel full gc time, improving both throughput and response time.
>>> Extreme cases are big data workloads (for example ycsb) with 100g heaps.
>>
>> I think what you are saying here is that it looks like if one can tune
>> Parallel GC such that you can avoid a lengthy collection of old
>> generation, or the live occupancy of old gen is small enough that the
>> time to collect is small enough to be tolerated, then Parallel GC will
>> offer a better experience.
>>
>> However, if the live data in old generation at the time of its collection
>> is large enough such that the time it takes to collect it exceeds a
>> tolerable pause time, then G1 will offer a better experience.
>>
>> Would you also say that G1 offers a better experience in the presence of
>> (wide) swings in object allocation rates, since there would likely be a
>> larger number of promotions during the allocation spikes?  In other
>> words, G1 may offer more predictable pauses.
>>
>>>
>>> g1 vs cms:
>>> I will focus on response time type of workloads.
>>> Ben mentioned
>>>
>>> "Having said that, there is definitely a decent-sized class of systems
>>> (not just in finance) that cannot really tolerate any more than about
>>> 10-15ms of STW. So, what usually happens is that they live with the
>>> young collections, use CMS and tune out the CMFs as best they can (by
>>> clustering, rolling restart, etc, etc). I don't see any possibility of
>>> G1 becoming a viable solution for those systems any time soon."
>>>
>>> Can you give more details, like what the live data set size is, how big
>>> the heap is, etc?  I did some cache tests (Oracle Coherence) to compare
>>> cms vs g1. g1 is better than cms when there is fragmentation. If you
>>> tune cms well to have little fragmentation, then g1 is behind cms.  But
>>> in those cases they have to tune CMS very well anyway, so changing the
>>> default to g1 won't impact them.
>>>
>>> For big data kind of workloads (ycsb, spark in memory computing), g1 is
>>> much better than cms.
>>>
>>> Thanks,
>>> Jenny
>>>
>>> On 6/1/2015 10:06 AM, Ben Evans wrote:
>>>> Hi Vitaly,
>>>>
>>>>>> Instead, G1 is now being talked of as a replacement for the default
>>>>>> collector. If that's the case, then I think we need to acknowledge it,
>>>>>> and have a conversation about where G1 is actually supposed to be
>>>>>> used. Are we saying we want a "reasonably high throughput with reduced
>>>>>> STW, but not low pause time" collector? If we are, that's fine, but
>>>>>> that's not where we started.
>>>>> That's a fair point, and one I'd be interested in hearing an answer to as
>>>>> well.  FWIW, the only GC I know of that's actually used in low latency
>>>>> systems is Azul's C4, so I'm not even sure Oracle is trying to target the
>>>>> same use cases.  So when we talk about "low latency" GCs, we should probably
>>>>> also be clear on what "low" actually means.
>>>> Well, when I started playing with them, "low latency" meant a
>>>> sub-10-ms transaction time with 100ms STW as acceptable, if not ideal.
>>>>
>>>> These days, the same sort of system needs a sub 500us transaction
>>>> time, and ideally no GC pause at all. But that leads to Zing, or
>>>> non-JVM solutions, and I think takes us too far into a specialised use
>>>> case.
>>>>
>>>> Having said that, there is definitely a decent-sized class of systems
>>>> (not just in finance) that cannot really tolerate any more than about
>>>> 10-15ms of STW. So, what usually happens is that they live with the
>>>> young collections, use CMS and tune out the CMFs as best they can (by
>>>> clustering, rolling restart, etc, etc). I don't see any possibility of
>>>> G1 becoming a viable solution for those systems any time soon.
>>>>
>>>> Thanks,
>>>>
>>>> Ben
>>>
>>
>