G1GC Full GCs

Todd Lipcon todd at cloudera.com
Wed Jul 7 11:56:46 PDT 2010


On Wed, Jul 7, 2010 at 11:28 AM, Y. S. Ramakrishna <y.s.ramakrishna at oracle.com> wrote:

>
>
> On 07/07/10 08:45, Todd Lipcon wrote:
> ...
>
>
>> Overnight I saw one "concurrent mode failure".
>>
> ...
>
>> 2010-07-07T07:56:27.786-0700: 28490.203: [GC 28490.203: [ParNew (promotion
>> failed): 59008K->59008K(59008K), 0.0179250 secs]28490.221:
>> [CMS2010-07-07T07:56:27.901-0700: 28490.317: [CMS-concurrent-preclean:
>> 0.556/0.947 secs] [Times:
>>  user=5.76 sys=0.26, real=0.95 secs]  (concurrent mode failure):
>> 6359176K->4206871K(8323072K), 17.4366220 secs] 6417373K->4206871K(8382080K),
>> [CMS Perm : 18609K->18565K(31048K)], 17.4546890 secs] [Times: user=11.17
>> sys=0.09, real=17.45 secs]
>>
>> I've interpreted pauses like this as being caused by fragmentation, since
>> the young gen is 64M, and the old gen here has about 2G free. If there's
>> something I'm not understanding about CMS, and I can tune it more smartly to
>> avoid these longer pauses, I'm happy to try.
>>
>
> Yes, the old gen must be fragmented. I'll look at the data you have
> made available (for CMS). The CMS log you uploaded does not have the
> suffix leading into the concurrent mode failure you display above
> (it stops less than 2500 s into the run). If you could include
> the entire log leading into the concurrent mode failures, it would
> be a great help.


Just uploaded the full log from the entire 11-hour run, all the way up
through the 218-second GC pause that caused the server to get kicked out of
the cluster (since it stopped heartbeating to the master).

http://cloudera-todd.s3.amazonaws.com/cms-full-gc-log.txt.gz


> Do you have large arrays in your application?


The primary heap consumers in the application are:
- RPC buffers - in this case I'm configured for 40 RPC handlers, each of
which is usually handling a byte[] around 2-3MB for a "put". These buffers
then get passed along into the memstore:
- Memstore - this is allocated 40% of the heap, and it's made up of some
hundreds of separate ConcurrentSkipListMaps. The values of the map are small
objects which contain offsets into the byte[]s passed in above. So,
typically this is about 2GB of heap, corresponding to around a million
offset containers and maybe 100 thousand of the actual byte arrays.

These memstores are always being "flushed" to disk (basically we take one of
the maps and write it out, then drop references to the map to let GC free up
memory).

- LRU block cache - this is a large ConcurrentHashMap<String,CachedBlock>,
where a CachedBlock is basically a wrapper for a ByteBuffer. These
ByteBuffers represent around 64KB each. Typically this is allocated 20% of
the heap, so on the order of 20,000 entries in the map here.

Eviction is done by manually accounting for heap usage, and when it gets too
high, we remove blocks from the cache.

So to answer your question simply: there shouldn't be any byte arrays
floating around larger than 2MB, though there are a fair number at that size
and a fair number at 64KB. Can I use jmap or another program to do any
useful analysis?
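
In case a concrete picture helps, here's a very rough sketch of the shapes of
those structures. The class and field names below are just illustrative
stand-ins I made up for this mail, not the actual HBase code (the real classes
are linked further down):

import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

public class HeapShapeSketch {

    // Memstore side: lots of small "offset containers", each pointing into a
    // shared byte[] that arrived via an RPC put (the 2-3MB buffers above).
    static final class OffsetIntoBuffer implements Comparable<OffsetIntoBuffer> {
        final byte[] backing;  // shared with other entries; lives until the memstore flushes
        final int offset;
        final int length;

        OffsetIntoBuffer(byte[] backing, int offset, int length) {
            this.backing = backing;
            this.offset = offset;
            this.length = length;
        }

        public int compareTo(OffsetIntoBuffer other) {
            // The real ordering is by key bytes; comparing offsets keeps the sketch short.
            if (this.offset < other.offset) return -1;
            if (this.offset > other.offset) return 1;
            return 0;
        }
    }

    // One of the "some hundreds" of skip lists that together hold ~40% of the heap.
    final ConcurrentSkipListMap<OffsetIntoBuffer, OffsetIntoBuffer> memstore =
            new ConcurrentSkipListMap<OffsetIntoBuffer, OffsetIntoBuffer>();

    // Block cache side: ~64KB ByteBuffers behind a ConcurrentHashMap, with heap
    // usage accounted manually and blocks dropped once we're over budget.
    static final class CachedBlock {
        final ByteBuffer buf;
        CachedBlock(ByteBuffer buf) { this.buf = buf; }
        long heapSize() { return buf.capacity(); }
    }

    final ConcurrentHashMap<String, CachedBlock> blockCache =
            new ConcurrentHashMap<String, CachedBlock>();
    final AtomicLong cachedBytes = new AtomicLong();
    final long maxCachedBytes = 20000L * 64 * 1024;  // roughly 20,000 blocks of ~64KB

    void cacheBlock(String name, CachedBlock block) {
        if (blockCache.putIfAbsent(name, block) == null) {
            if (cachedBytes.addAndGet(block.heapSize()) > maxCachedBytes) {
                evictSomething();
            }
        }
    }

    void evictSomething() {
        // The real cache evicts by LRU priority; removing arbitrary entries is
        // enough here to show the manual size accounting.
        for (Map.Entry<String, CachedBlock> e : blockCache.entrySet()) {
            if (blockCache.remove(e.getKey(), e.getValue())) {
                cachedBytes.addAndGet(-e.getValue().heapSize());
            }
            if (cachedBytes.get() <= maxCachedBytes) {
                break;
            }
        }
    }
}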


> The shape of the promotion graph for CMS is somewhat
> jagged, indicating _perhaps_ that. Yes, +PrintTenuringDistribution
> would shed a bit more light.


I'll restart the test with this option on and collect some more logs for you
guys.
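
For reference, I'm planning to relaunch with something like the flags below.
Other than -XX:+PrintTenuringDistribution itself, this is just my guess at a
useful logging setup (existing heap/CMS settings unchanged), so let me know if
you'd like anything else turned on:

  java ... existing CMS/heap flags ... \
       -Xloggc:gc.log \
       -XX:+PrintGCDetails \
       -XX:+PrintGCDateStamps \
       -XX:+PrintTenuringDistribution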


> As regards fragmentation, it can be
> tricky to tune against, but we can try once we understand a bit
> more about the object sizes and demographics.
>
> I don't suppose you have an easily shared test case with which we
> can reproduce both the CMS fragmentation and the G1 full gc
> issues locally, for quickest progress on this?
>

Well, the project itself is open source, but to really get serious load
going into it you need beefy machines/disks. I'm running my tests on a
5-node cluster of dual quad core Nehalems, 24G RAM, 12 disks each. I can try
to set up a mocked workload (e.g. skip actual disk IO) from the same codebase,
but it would be a fair bit of work and I don't think I can get to it this
month (I'm leaving for vacation next week).

If it's useful to look at the source, here are some pointers to the relevant
RAM consumers:

Cache:
http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java

MemStore:
http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java

Wrapper class held by memstore:
http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java

The class used by RPC to receive "Put" requests:
http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/Put.java

Thanks again for all the help, it's much appreciated.
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera