JEP 248: Make G1 the Default Garbage Collector
Kirk Pepperdine
kirk at kodewerk.com
Tue Jun 2 12:21:25 UTC 2015
On Jun 1, 2015, at 9:16 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>
>> I suppose it is worth mentioning that the population of apps that don’t
>> stress GC is pretty small compared to those that do. ;-)
>
>
> Sadly, that's true :).
I’m not sure I agree with this. Unfortunately, the negative effects of GC aren’t well recognized and are therefore not reported. I rarely see an app where GC isn’t stressed to some degree.
Regards,
Kirk
>
> On Mon, Jun 1, 2015 at 3:12 PM, charlie hunt <charlie.hunt at oracle.com>
> wrote:
>
>> Yep, that’s right.
>>
>> I suppose it is worth mentioning that the population of apps that don’t
>> stress GC is pretty small compared to those that do. ;-)
>>
>> On Jun 1, 2015, at 2:01 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>
>> Also, G1 has heavier write barriers than Parallel GC, so some existing
>> workloads that don't stress the GC (e.g. code written purposely to avoid
>> GC during uptime by, say, object pooling) and that wouldn't have tweaked
>> the default may experience some degradation.
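The object-pooling pattern referred to above (reusing objects so that steady-state code allocates little or nothing, keeping any collector mostly idle) can be sketched roughly as follows. This is an illustrative sketch only; `BufferPool` and its sizes are made-up names, not code from any workload discussed in this thread:

```java
import java.util.ArrayDeque;

// Minimal object pool: pre-allocate buffers once at startup, then reuse
// them, so the steady-state path creates no garbage for the GC to collect.
final class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;

    BufferPool(int count, int bufferSize) {
        this.bufferSize = bufferSize;
        for (int i = 0; i < count; i++) {
            free.push(new byte[bufferSize]); // all allocation happens up front
        }
    }

    byte[] acquire() {
        byte[] b = free.poll();
        // Fall back to allocation only if the pool is exhausted.
        return (b != null) ? b : new byte[bufferSize];
    }

    void release(byte[] b) {
        free.push(b); // return the buffer for reuse; no garbage created
    }
}
```

Because such code rarely triggers a collection, the collector's per-write overhead (the barriers mentioned above) becomes the dominant GC-related cost, which is why a barrier-heavier default can regress these workloads.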
>>
>> On Mon, Jun 1, 2015 at 2:53 PM, charlie hunt <charlie.hunt at oracle.com>
>> wrote:
>>
>>> Hi Jenny,
>>>
>>> A couple questions and comments below.
>>>
>>> thanks,
>>>
>>> charlie
>>>
>>>> On Jun 1, 2015, at 1:28 PM, Yu Zhang <yu.zhang at oracle.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have done some performance comparisons of G1/CMS/ParallelGC internally
>>>> at Oracle. I would like to post my observations here to get some
>>>> feedback, as I have limited benchmarks and hardware. These are
>>>> out-of-the-box performance numbers.
>>>>
>>>> Memory footprint/startup:
>>>> G1 has a bigger memory footprint and a longer startup time. The overhead
>>>> comes from more GC threads and from the internal data structures that
>>>> track the remembered set.
>>>
>>> This is the memory footprint of the JVM itself when using the same size
>>> Java heap, right?
>>>
>>> I don’t recall if this has been your observation as well. One observation
>>> I have had with G1 is that it tends to be able to operate within tolerable
>>> throughput and latency bounds with a smaller Java heap than Parallel GC. I
>>> have seen cases where G1 did not use the entire Java heap because it was
>>> able to keep enough free regions available while still meeting its pause
>>> time goals. But Parallel GC always uses the entire Java heap, and once its
>>> occupancy reaches capacity, it GCs. So there are cases where, between the
>>> JVM’s footprint overhead and the amount of Java heap required, G1 may
>>> actually require less memory overall.
>>>
>>>>
>>>> g1 vs parallelgc:
>>>> If the workload involves young GCs only, G1 could be slightly slower.
>>>> G1 can also consume more CPU, which might slow down the benchmark if the
>>>> SUT is CPU-saturated.
>>>>
>>>> If promotions from young to old gen lead to full GCs with Parallel GC,
>>>> then for smaller heaps a parallel full GC can finish within some
>>>> tolerable range of pause times and still outperforms G1. But for bigger
>>>> heaps, G1's mixed GCs can clean the heap with pause times that are a
>>>> fraction of a parallel full GC's, improving both throughput and response
>>>> time. Extreme cases are big data workloads (for example YCSB) with a
>>>> 100 GB heap.
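For reference, the out-of-the-box comparison described above comes down to flag sets along these lines. The option names are real HotSpot flags; the heap size just matches the 100 GB example in the text:

```
# Illustrative JVM flag sets for the comparison above:
-Xms100g -Xmx100g -XX:+UseParallelGC   # Parallel GC: full GCs grow with heap size
-Xms100g -Xmx100g -XX:+UseG1GC         # G1: incremental mixed GCs replace most full GCs
```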
>>>
>>> I think what you are saying here is that if one can tune Parallel GC to
>>> avoid a lengthy old generation collection, or if the live occupancy of
>>> old gen is small enough that its collection time can be tolerated, then
>>> Parallel GC will offer the better experience.
>>>
>>> However, if the live data in the old generation at the time of its
>>> collection is large enough that the time it takes to collect exceeds a
>>> tolerable pause time, then G1 will offer the better experience.
>>>
>>> Would you also say that G1 offers a better experience in the presence of
>>> (wide) swings in object allocation rates, since there would likely be a
>>> larger number of promotions during the allocation spikes? In other words,
>>> G1 may offer more predictable pauses.
>>>
>>>>
>>>> g1 vs cms:
>>>> I will focus on response time type of workloads.
>>>> Ben mentioned
>>>>
>>>> "Having said that, there is definitely a decent-sized class of systems
>>>> (not just in finance) that cannot really tolerate any more than about
>>>> 10-15ms of STW. So, what usually happens is that they live with the
>>>> young collections, use CMS and tune out the CMFs as best they can (by
>>>> clustering, rolling restart, etc, etc). I don't see any possibility of
>>>> G1 becoming a viable solution for those systems any time soon."
>>>>
>>>> Can you give more details, like the live data set size, how big the heap
>>>> is, etc.? I did some cache tests (Oracle Coherence) comparing CMS vs.
>>>> G1. G1 is better than CMS when there is fragmentation. If you tune CMS
>>>> well enough to have little fragmentation, then G1 is behind CMS. But in
>>>> those cases CMS has to be tuned very well, so changing the default to G1
>>>> won't impact them.
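The "tune CMS very well" case contrasted above typically means pinning down the initiation threshold and young-gen sizing by hand, whereas the G1 setup stays close to a single flag. A sketch of such flag sets (the option names are real HotSpot flags; the specific values are only illustrative, not from the tests in this thread):

```
# Hand-tuned CMS to reduce fragmentation-driven concurrent mode failures:
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly -Xmn2g -XX:SurvivorRatio=8

# G1 alternative: pick the collector and state a pause goal
-XX:+UseG1GC -XX:MaxGCPauseMillis=50
```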
>>>>
>>>> For big data kinds of workloads (YCSB, Spark in-memory computing), G1 is
>>>> much better than CMS.
>>>>
>>>> Thanks,
>>>> Jenny
>>>>
>>>> On 6/1/2015 10:06 AM, Ben Evans wrote:
>>>>> Hi Vitaly,
>>>>>
>>>>>>> Instead, G1 is now being talked of as a replacement for the default
>>>>>>> collector. If that's the case, then I think we need to acknowledge it,
>>>>>>> and have a conversation about where G1 is actually supposed to be
>>>>>>> used. Are we saying we want a "reasonably high throughput with reduced
>>>>>>> STW, but not low pause time" collector? If we are, that's fine, but
>>>>>>> that's not where we started.
>>>>>> That's a fair point, and one I'd be interested in hearing an answer to
>>>>>> as well. FWIW, the only GC I know of that's actually used in
>>>>>> low-latency systems is Azul's C4, so I'm not even sure Oracle is trying
>>>>>> to target the same use cases. So when we talk about "low latency" GCs,
>>>>>> we should probably also be clear on what "low" actually means.
>>>>> Well, when I started playing with them, "low latency" meant a
>>>>> sub-10-ms transaction time with 100ms STW as acceptable, if not ideal.
>>>>>
>>>>> These days, the same sort of system needs a sub-500µs transaction
>>>>> time, and ideally no GC pause at all. But that leads to Zing, or
>>>>> non-JVM solutions, and I think takes us too far into a specialised use
>>>>> case.
>>>>>
>>>>> Having said that, there is definitely a decent-sized class of systems
>>>>> (not just in finance) that cannot really tolerate any more than about
>>>>> 10-15ms of STW. So, what usually happens is that they live with the
>>>>> young collections, use CMS and tune out the CMFs as best they can (by
>>>>> clustering, rolling restart, etc, etc). I don't see any possibility of
>>>>> G1 becoming a viable solution for those systems any time soon.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ben
>>>>
>>>
>>>
>>
>>
More information about the hotspot-dev mailing list