G1GC Full GCs

Todd Lipcon todd at cloudera.com
Wed Jul 7 17:18:29 PDT 2010


On Wed, Jul 7, 2010 at 5:09 PM, Y. S. Ramakrishna <
y.s.ramakrishna at oracle.com> wrote:

>
> I plotted the heap used _after_ each gc (scavenge or full) etc.,
> attached; and if you stand away from the plot, tilt your head to the
> right and squint at the plot, you'll see what looks, at least to me,
> like a slow leak. (The tell-tale slowly-rising lower envelope of your
> carrier-wave, if you will pardon a telecom term.) Leaks can of course
> exacerbate fragmentation in non-moving collectors such as CMS, but also
> possibly in regionalized lazily evacuating collectors such as G1.
>

Hi Ramki,

Looking at the graph you attached, it appears that the low-water mark
stabilizes somewhere between 4.5G and 5G. The configuration I'm running
allocates 40% of the heap to the memstores and 20% of the heap to the LRU
cache; for an 8G heap, that's 4.8GB combined. So, for this application, it's
expected that, as it runs, it will accumulate more and more data until it
reaches this threshold. The data is, of course, not *permanent*, but it's
reasonably long-lived, so it makes sense to me that it should go into the
old generation.

If you like, I can tune those percentages down to 20/20 instead of 20/40,
and I think we'll see the same pattern, just stabilized around 3.2GB. That
will probably delay the full GCs, but we'd still eventually hit them. It's
also lower than we can realistically go in production - customers won't like
"throwing away" 60% of the allocated heap to GC headroom!
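
For concreteness, the arithmetic is just this (a back-of-the-envelope sketch;
the fractions mirror the 40%/20% settings above, and the class and variable
names are purely illustrative, not our actual config keys):

    // Back-of-the-envelope sketch of the heap budget described above.
    // The fractions mirror the 40% memstore / 20% block cache settings;
    // names are illustrative, not actual HBase configuration keys.
    public class HeapBudgetSketch {
        public static void main(String[] args) {
            long heapBytes = 8L * 1024 * 1024 * 1024;    // -Xmx8g
            double memstoreFraction = 0.40;              // global memstore limit
            double blockCacheFraction = 0.20;            // LRU block cache limit

            long longLivedFloor =
                (long) (heapBytes * (memstoreFraction + blockCacheFraction));

            // 0.60 * 8G = 4.8G, matching the ~4.5-5G low-water mark in the
            // plot of heap used after each GC.
            System.out.printf("expected long-lived floor: %.1f GB%n",
                    longLivedFloor / (1024.0 * 1024 * 1024));
        }
    }

So a lower envelope that rises and then flattens out near 4.8G is what I'd
expect from steady state here, rather than from a leak.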


>
> Perhaps jmap -histo:live or +PrintClassHistogram[AfterFullGC]
> will help you get to the bottom of your leak. Once the leak is plugged
> perhaps we could come back to the G1 tuning effort? (We have some
> guesses as to what might be happening and the best G1 minds are
> chewing on the info you provided so far, for which thanks!)
>

I can try running with those options and see what turns up, but I've already
spent some time looking at heap dumps and haven't found any leaks, so I'm
pretty sure that's not the issue.
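
For reference, this is roughly what I'd run (assuming a HotSpot JVM and that
the histogram flag is available in this build; <pid> is the region server
process):

    # Live-object histogram of the running region server (forces a full GC):
    jmap -histo:live <pid>

    # Or have the JVM print a class histogram after each full GC:
    java ... -XX:+PrintClassHistogramAfterFullGC ...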

-Todd

>
> On 07/07/10 11:56, Todd Lipcon wrote:
>
>> On Wed, Jul 7, 2010 at 11:28 AM, Y. S. Ramakrishna <
>> y.s.ramakrishna at oracle.com> wrote:
>>
>>
>>
>>    On 07/07/10 08:45, Todd Lipcon wrote:
>>    ...
>>
>>
>>        Overnight I saw one "concurrent mode failure".
>>
>>    ...
>>
>>        2010-07-07T07:56:27.786-0700: 28490.203: [GC 28490.203: [ParNew
>>        (promotion failed): 59008K->59008K(59008K), 0.0179250 secs]
>>        28490.221: [CMS2010-07-07T07:56:27.901-0700: 28490.317:
>>        [CMS-concurrent-preclean: 0.556/0.947 secs]
>>        [Times: user=5.76 sys=0.26, real=0.95 secs]
>>        (concurrent mode failure): 6359176K->4206871K(8323072K),
>>        17.4366220 secs] 6417373K->4206871K(8382080K),
>>        [CMS Perm : 18609K->18565K(31048K)], 17.4546890 secs]
>>        [Times: user=11.17 sys=0.09, real=17.45 secs]
>>
>>        I've interpreted pauses like this as being caused by
>>        fragmentation, since the young gen is 64M, and the old gen here
>>        has about 2G free. If there's something I'm not understanding
>>        about CMS, and I can tune it more smartly to avoid these longer
>>        pauses, I'm happy to try.
>>
>>
>>    Yes, the old gen must be fragmented. I'll look at the data you have
>>    made available (for CMS). The CMS log you uploaded does not have the
>>    suffix leading into the concurrent mode failure you display above
>>    (it stops less than 2500 s into the run). If you could include
>>    the entire log leading into the concurrent mode failures, it would
>>    be a great help.
>>
>> Just uploaded the full log from the entire 11-hour run, all the way up
>> through the 218-second GC pause which caused the server to get kicked out
>> of the cluster (since it stopped heartbeating to the master).
>>
>> http://cloudera-todd.s3.amazonaws.com/cms-full-gc-log.txt.gz
>>
>>
>>    Do you have large arrays in your
>>    application?
>>
>> The primary heap consumers in the application are:
>> - RPC buffers - in this case I'm configured for 40 RPC handlers, each of
>> which is usually handling a byte[] around 2-3MB for a "put". These buffers
>> then get passed along into the memstore:
>> - Memstore - this is allocated 40% of the heap, and it's made up of some
>> hundreds of separate ConcurrentSkipListMaps. The values of the map are small
>> objects which contain offsets into the byte[]s passed in above. So,
>> typically this is about 2GB of heap, corresponding to around a million of
>> the offset containers, and maybe 100 thousand of the actual byte arrays.
>>
>> These memstores are always being "flushed" to disk (basically we take one
>> of the maps and write it out, then drop references to the map to let GC free
>> up memory).
>>
>> - LRU block cache - this is a large ConcurrentHashMap<String,CachedBlock>,
>> where a CachedBlock is basically a wrapper for a ByteBuffer. These
>> ByteBuffers represent around 64KB each. Typically this is allocated 20% of
>> the heap, so on the order of 20,000 entries in the map here.
>>
>> Eviction is done by manually accounting heap usage, and when it gets too
>> high, we remove blocks from the cache.
>>
>> So to answer your question simply: there shouldn't be any byte arrays
>> floating around larger than 2MB, though there are a fair number at that size
>> and a fair number at 64KB. Can I use jmap or another program to do any
>> useful analysis?
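
If it helps, here is a very rough sketch of the shape of those structures
(heavily simplified, with made-up names; the real code is in the
LruBlockCache, MemStore and KeyValue sources linked further down):

    import java.nio.ByteBuffer;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.atomic.AtomicLong;

    // Small object holding an offset into one of the 2-3MB RPC byte[]s.
    class KeyValueRef implements Comparable<KeyValueRef> {
        final byte[] backing;
        final int offset;
        final int length;

        KeyValueRef(byte[] backing, int offset, int length) {
            this.backing = backing;
            this.offset = offset;
            this.length = length;
        }

        // Lexicographic comparison of the referenced bytes (simplified).
        public int compareTo(KeyValueRef o) {
            int n = Math.min(length, o.length);
            for (int i = 0; i < n; i++) {
                int d = (backing[offset + i] & 0xff)
                      - (o.backing[o.offset + i] & 0xff);
                if (d != 0) return d;
            }
            return length - o.length;
        }
    }

    // Wrapper for a ~64KB ByteBuffer held by the block cache.
    class CachedBlockSketch {
        final ByteBuffer buf;

        CachedBlockSketch(ByteBuffer buf) { this.buf = buf; }

        long heapSize() { return buf.capacity(); }
    }

    class HeapShapeSketch {
        // One of some hundreds of per-region memstores; the values are the
        // small offset holders, which keep the big byte[]s alive until the
        // memstore is flushed and the map reference is dropped.
        final ConcurrentSkipListMap<KeyValueRef, KeyValueRef> memstore =
            new ConcurrentSkipListMap<KeyValueRef, KeyValueRef>();

        // LRU block cache: on the order of 20,000 entries of ~64KB each
        // when given 20% of an 8G heap.
        final ConcurrentHashMap<String, CachedBlockSketch> blockCache =
            new ConcurrentHashMap<String, CachedBlockSketch>();
        final AtomicLong cacheBytes = new AtomicLong();

        void addToMemstore(KeyValueRef kv) {
            memstore.put(kv, kv);
        }

        // Eviction is by manual accounting of heap usage, as described above.
        void cacheBlock(String name, CachedBlockSketch block, long maxBytes) {
            if (blockCache.putIfAbsent(name, block) == null) {
                cacheBytes.addAndGet(block.heapSize());
            }
            while (cacheBytes.get() > maxBytes) {
                // The real cache evicts in LRU order; any victim will do here.
                Iterator<Map.Entry<String, CachedBlockSketch>> it =
                    blockCache.entrySet().iterator();
                if (!it.hasNext()) break;
                Map.Entry<String, CachedBlockSketch> victim = it.next();
                if (blockCache.remove(victim.getKey(), victim.getValue())) {
                    cacheBytes.addAndGet(-victim.getValue().heapSize());
                }
            }
        }
    }

The point being that nearly all of the retained heap is either a 2-3MB byte[]
reachable through many of these small offset objects, or a ~64KB ByteBuffer
held by the cache map.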
>>
>>    The shape of the promotion graph for CMS is somewhat
>>    jagged, _perhaps_ indicating the presence of such large arrays. Yes,
>>    +PrintTenuringDistribution would shed a bit more light.
>>
>> I'll restart the test with this option on and collect some more logs for
>> you guys.
>>
>>    As regards fragmentation, it can be
>>    tricky to tune against, but we can try once we understand a bit
>>    more about the object sizes and demographics.
>>
>>    I am guessing you don't have an easily shared test case that we
>>    could use to reproduce both the CMS fragmentation and the G1 full GC
>>    issues locally, for quickest progress on this?
>>
>> Well, the project itself is open source, but to really get serious load
>> going into it you need beefy machines/disks. I'm running my tests on a
>> 5-node cluster of dual quad-core Nehalems, 24G RAM, 12 disks each. I can
>> try to set up a mocked workload (e.g. skip actual disk IO) from the same
>> codebase, but it would be a fair bit of work and I don't think I can get
>> to it this month (I'm leaving for vacation next week).
>>
>> If it's useful to look at the source, here are some pointers to the
>> relevant RAM consumers:
>>
>> Cache:
>>
>> http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
>>
>> MemStore:
>>
>> http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
>>
>> Wrapper class held by memstore:
>>
>> http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
>>
>> The class used by RPC to receive "Put" requests:
>>
>> http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/Put.java
>>
>> Thanks again for all the help, it's much appreciated.
>> -Todd
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>>
>


-- 
Todd Lipcon
Software Engineer, Cloudera