Bug in G1GC: it performs Full GC when code cache is full, resulting in overkill
Darius D.
darius.ski at gmail.com
Thu May 30 15:32:55 PDT 2013
Hi,
Monica, thanks a lot for your additional insight about H-Objects. As I've
already mentioned in this thread, your great JavaOne presentation about
G1GC was key in solving our problem.
You are right that it is all about "fitting" the H-Objects in the old gen. In
our case, no matter how high we set the heap size (within the confines of what
server memory allowed us), we were still getting H-Obj allocation failures.
Actually, even after drastically cutting the H-Object count we were still
getting some Full GCs; only after we increased the region size step by step to
16M did they cease. It was a subtle fragmentation issue, made worse by the
rather small default region size.
The following happened:
1) A big web request came in; 1-1.5GB of various allocations were made to
generate that JSON string, triggering young GCs along the way.
2) After a young GC the heap was a bit fragmented, because there was temporary
but still "live" data, and all of that data ended up in fresh survivor/old
regions. The "health" of the heap at this point really depends on how long ago
the last mixed GC ran.
3) Into such a "sprayed" heap we start to allocate our big object. I have no
idea how the set of regions is chosen for a humongous allocation, but I think
we were generating a total of ~30 humongous objects ("generating" as in
resizing a StringBuffer somewhere deep in the web framework until the 30M
result fits; see the sketch after this list), and that was too much for G1GC
to cope with.
4) Reducing the allocation rate is unfortunately not enough; the few humongous
allocations that still slip through are really dangerous - they are
immediately allocated in the old gen, fragmenting the heap further.
5) It then becomes a race between big web requests and the next mixed GC. We
could reliably reproduce Full GCs in testing by generating several well-timed
big requests :)
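For illustration, here is a minimal, hypothetical sketch of step 3 (the class
name and sizes are made up; assume it is run with something like
java -XX:+UseG1GC -XX:G1HeapRegionSize=8m -Xmx1g HumongousDemo). Every time the
builder doubles its backing char[] past half the region size, the new array is
a humongous allocation:

    // Hypothetical demo, not our production code.
    public class HumongousDemo {
        public static void main(String[] args) {
            // With 8M regions, any char[] of 4M or more (about 2 million chars)
            // is humongous and is allocated directly in the old generation.
            StringBuilder json = new StringBuilder();
            for (int i = 0; i < 3_000_000; i++) {
                // Each capacity doubling past the half-region threshold creates a
                // new, larger humongous array while the old one is still reachable
                // during the copy, so one big result touches several humongous
                // regions in a row.
                json.append("{\"id\":").append(i).append("},");
            }
            System.out.println("built " + json.length() + " chars");
        }
    }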
Raising the region size from the default (which brings up a question of my
own: why does G1GC aim for thousands of regions, and what are the drawbacks of
regions larger than the "default"?) is really about reducing fragmentation, by
keeping those large but temporary objects where they belong - in the nursery,
where G1GC can collect them efficiently.
So we have three important to-dos and tunables for avoiding H-Object-caused
Full GCs (a sketch of the resulting flag combination follows this list):
1) Code changes - any profiler that can record every allocation stack trace
will help; set it to record each allocation above half the heap region size
and limit those allocations as much as possible.
2) IHOP should be tuned to allow mixed GCs to run frequently enough, even if
your app is behaving perfectly (stable old gen + temporaries in the nursery,
with little promotion going on).
3) The G1 region size can be increased to reduce the heap fragmentation caused
by H-Objects, provided they are temporary (larger regions turn them back into
ordinary objects allocated in the young gen).
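For concreteness, the kind of flag combination these three points add up to
might look like the following (illustrative values only, not a recommendation;
verify against your own GC logs):

    -XX:+UseG1GC
    -XX:G1HeapRegionSize=16m
    -XX:InitiatingHeapOccupancyPercent=35
    -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy

A lower-than-default IHOP starts concurrent marking (and hence the following
mixed GCs) earlier; how low it needs to go depends on how fast the old gen
fills up between cycles.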
Darius.
On Thu, May 30, 2013 at 11:55 PM, Monica Beckwith <
monica.beckwith at oracle.com> wrote:
> +1 to enabling PrintAdaptiveSizePolicy.
> Darius,
> As you have already mentioned - any object whose size is greater than or
> equal to half a region is called a "humongous" object (H-obj). The max region
> size for G1 is 32M. So yes, even if you set your region size to the max, your
> 32M object will still be considered humongous (a small sketch for checking
> the threshold at runtime follows the list below).
> Now, there are a couple of things that we should be aware of with respect
> to humongous regions (H-region)/objects -
>
> 1. The H-obj allocation will happen directly into the old generation
>    1. There will be a check for the marking threshold (IHOP), and a
>       concurrent cycle will be initiated if necessary
> 2. The H-regions are not included in an evacuation pause, since it's
>    just going to increase the copying expense
>    1. But if the H-obj(s) are dead, they get freed at the end of the
>       multi-phased concurrent marking cycle
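> For illustration, a minimal sketch (hypothetical class name) that prints the
> humongous threshold the running JVM is actually using; run it with
> -XX:+UseG1GC so that G1HeapRegionSize has been sized ergonomically:
>
>     import com.sun.management.HotSpotDiagnosticMXBean;
>     import java.lang.management.ManagementFactory;
>
>     public class HumongousThreshold {
>         public static void main(String[] args) {
>             HotSpotDiagnosticMXBean hs =
>                 ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
>             // G1HeapRegionSize is reported in bytes; any object of at least
>             // half that size is humongous.
>             long regionSize =
>                 Long.parseLong(hs.getVMOption("G1HeapRegionSize").getValue());
>             System.out.println("region size = " + (regionSize >> 20) + "M, "
>                     + "humongous threshold = " + (regionSize >> 21) + "M");
>         }
>     }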
>
> So, I think, if you have to work with H-objs and increasing the region
> size doesn't help (as in your case), then maybe you should try limiting
> your nursery so as to allow more space for the old generation, so that
> your H-objs can be sustained until they die. And if they are part of your
> live data set, then it's all the more necessary to be able to fit them in
> your old gen.
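> (As a purely illustrative example of that kind of tuning, not a
> recommendation: something like -Xms26g -Xmx26g -Xmn2g with -XX:+UseG1GC pins
> the nursery small and leaves the rest to the old gen, at the cost of G1 no
> longer resizing the young gen to meet its pause-time goal.)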
>
> -Monica
>
>
> On 5/30/2013 1:54 PM, Martin Makundi wrote:
>
> Hi!
>
>
> I'd strongly suggest that Martin add
> -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was
> what we needed to solve the mystery of Full GCs with gigabytes of heap
> free.
>
> Thanks, will add that tomorrow.
>
>
> Actually, with some minor googling around I've found:
> https://forums.oracle.com/forums/thread.jspa?messageID=10869877
>
> I suspect it could be the same story as ours: a "humongous allocation
> request failed" message is bad for JVM health - a Full GC will occur immediately.
>
> Remember, any allocation that is larger than half of the G1GC region size
> will get allocated as a "humongous" object on the heap, one that is not
> handled like ordinary region-sized objects. In our case we were failing to
> allocate 32 megabytes with over 50% of the heap free!
>
> Any solution to such a problem, or is it a bug in G1GC? Is there a way to
> log what code is performing the memory allocation, if that happens to
> be the case?
>
> Martin
>
> Best regards,
>
> Darius.
>
>
>
> On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson <john.cuthbertson at oracle.com> wrote:
>
> Hi Martin,
>
> I'm going to fill in a bit more detail on Charlie's replies....
>
> On 5/29/2013 8:49 AM, Martin Makundi wrote:
>
> Hi!
>
>
> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance, rather than choosing an option because of something you read somewhere. In short, always measure, and reason about whether what you've observed as an improvement or regression makes sense. And also run multiple times to get a sense of noise versus real improvement or regression.
>
> Thanks. That's one of the reasons we never changed our options. Once
> we found something that works very well, we know that it's always n!
> work to test changes, and the system was running very nicely indeed
> before the previous tweak ;)
>
>
> -XX:+UseFastAccessorMethods (the default is disabled)
>
> Fast sounds good; the description of it is "Use optimized versions of
> Get<Primitive>Field", which also sounds good. I see no harm in this.
>
> These would be JNI operations.
>
> A quick look at the HotSpot source suggests UseFastAccessorMethods
> is mostly confined to interpreter operations.
>
> Thanks for the info. It doesn't say much to me, but it does not seem to harm
> anything. Will try switching it off at some point.
>
>
> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes?
> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?)
>
> Key point is that you should use -XX:+UseNUMA only when you are
> deploying a JVM that spans NUMA nodes.
>
> Thanks for the info. It doesn't say much to me, but it does not seem to harm
> anything. Will try switching it off at some point.
>
> The fast accessor methods flag creates specialized (i.e. short and
> optimized) interpreter entry points for accessor methods (those that
> just return the value in one of the object's fields). In most
> applications the bulk of the execution time is spent executing JIT
> compiled code; only a few percent is typically spent in Hotspot's
> interpreter. The JIT compiler will always try to inline accessor methods
> into their caller. So, unless your application is spending a ton of time
> interpreting, this flag should make no difference.
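> For what it's worth, an accessor in this sense is just something like the
> following (hypothetical class):
>
>     class Point {
>         private final int x;
>         Point(int x) { this.x = x; }
>         // A pure accessor: a single field load and return. The JIT inlines
>         // calls like this, so the interpreter-only fast entry rarely matters.
>         int getX() { return x; }
>     }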
>
>
> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC)
>
> Again, it's not documented thoroughly where it applies and where it doesn't;
> the JVM gave no warning/error about it, so we assumed it was valid.
>
> There's always the HotSpot source code ;-)
>
> It's also quite well documented in various slide ware on the internet.
> It's also quite well documented in the Java Performance book. :-)
>
> Uh.. does it say somewhere "Do not use
> -XX:CMSInitiatingOccupancyFraction with G1GC"? ;) I know performance
> tuning is your bread and butter, but it is not ours... it's more like we are
> just driving the car and you are the mechanic... different
> perspectives.. just trying to fill'er'up to go.. leaded or unleaded...
> ;)
>
> The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP
> for short). Actually both G1 and CMS accept and observe IHOP.
> CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still
> accepts the old flag name - but it is CMS only and doesn't affect G1.
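> For example (illustrative values): with CMS you would write
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
> -XX:+UseCMSInitiatingOccupancyOnly, while the corresponding G1 knob is
> -XX:InitiatingHeapOccupancyPercent, whose default is 45.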
>
>
> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?)
>
> Yes, that's what we figured: we don't want it to sit idle and end
> up in a situation where it is required to do a Full GC. This switch
> was specifically chosen in a situation where we had a memory leak and tried
> to fight it aggressively before we found the root cause. Maybe
> we should try without this switch now and see what effect it has.
>
> Having GC logs to see what headroom you have between the initiation of a G1
> concurrent cycle and the available regions / heap space would be most appropriate.
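> (For reference, the usual flags for that are along the lines of
> -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps, plus the
> -XX:+PrintAdaptiveSizePolicy already suggested earlier in this thread.)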
>
> Hmm.. I don't thoroughly understand the logs either, but here is a snapshot:
>
> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs]
> [Parallel Time: 288.8 ms]
> [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9
> Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3]
> [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0
> Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6]
> [SATB Filtering (ms): 0.0 0.1 0.0 0.0
> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1]
> [Update RS (ms): 31.9 37.3 35.1 33.3
> Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5]
> [Processed Buffers : 102 106 119 104
> Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17]
> [Scan RS (ms): 0.0 0.0 0.1 0.0
> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1]
> [Object Copy (ms): 228.2 229.1 229.5 227.3
> Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2]
> [Termination (ms): 0.0 0.0 0.0 0.0
> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
> [Termination Attempts : 4 1 11 4
> Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10]
> [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5
> Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0]
> [GC Worker (ms): 282.9 282.8 282.8 282.6
> Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3]
> [GC Worker Other (ms): 5.9 6.0 6.0 6.2
> Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3]
> [Complete CSet Marking: 0.0 ms]
> [Clear CT: 0.1 ms]
> [Other: 3.7 ms]
> [Choose CSet: 0.0 ms]
> [Ref Proc: 2.8 ms]
> [Ref Enq: 0.1 ms]
> [Free CSet: 0.3 ms]
> [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap:
> 15790M(26624M)->15741M(26624M)]
> [Times: user=1.14 sys=0.00, real=0.29 secs]
> Heap after GC invocations=575 (full 157):
> garbage-first heap total 27262976K, used 16119181K
> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000)
> region size 8192K, 36 young (294912K), 36 survivors (294912K)
> compacting perm gen total 524288K, used 164479K [0x00000007e0000000,
> 0x0000000800000000, 0x0000000800000000)
> the space 524288K, 31% used [0x00000007e0000000,
> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000)
> No shared spaces configured.
> }
> {Heap before GC invocations=575 (full 157):
> garbage-first heap total 27262976K, used 16119181K
> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000)
> region size 8192K, 37 young (303104K), 36 survivors (294912K)
> compacting perm gen total 524288K, used 164479K [0x00000007e0000000,
> 0x0000000800000000, 0x0000000800000000)
> the space 524288K, 31% used [0x00000007e0000000,
> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000)
> No shared spaces configured.
> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC
> 15742M->14497M(26624M), 56.7731320 secs]
>
> That's the third Full GC today, after the change to 26G and the change away
> from OccupancyPercent=0. Tomorrow we will be trying again with
> OccupancyPercent=0.
>
> What did you set the IHOP value to?
>
>
> -noclassgc (This is rarely needed, and I haven't seen an app that required it for quite some time)
>
> JVM 1.6 stopped the world for a couple of minutes several times per day
> while unloading classes, so we used -noclassgc to disable that. We do not
> know if this is still necessary on the latest 1.7 to avoid the class-unloading
> pause, but we continued to use this switch and found no harm in it.
> We can't afford to test that in production ;)
>
> I haven't seen a case where unloading classes causes a several-minute pause.
> Are you sure your system is not swapping? And do you have GC logs you
> can share that illustrate the behavior and show that -noclassgc fixed it?
>
> We deleted the swap partition a long time ago; we simply do not risk swapping at all.
>
> We had this class-unloading problem several times per day about half a
> year ago, and fixed it with -noclassgc; that was a no-brainer - a single
> parameter that made the difference.
>
> It is also discussed here (they do not discuss noclassgc though, we
> figured that out somehow): http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages
>
> G1 only performs class unloading during a full GC. But if you're not
> running out of perm space or compiled code cache - you can leave this flag.
>
>
> -XX:+UseGCOverheadLimit
> -XX:ReservedCodeCacheSize=48m, which is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space.
>
> For our Sun JVM on 64-bit Linux, 48m is the maximum; the JVM won't start with a higher value.
>
> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints; it may also suggest you don't have enough swap space defined and that you are experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, and I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xmx30g.
>
> It is also documented that 48m is the maximum: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86:
> 48m"
>
>
>
> That's the default max code cache size. When the JIT compiler compiles a
> Java method it places the generated code into the code cache. When
> there's no more room in the code cache, a warning is issued and JIT
> compilation is stopped. You can set it higher. IIRC there was a time in
> the past when the size was limited in order to use short branches in
> compiled code. I don't think we've had that restriction for a while.
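> (For example, on a machine with memory to spare, something like
> -XX:ReservedCodeCacheSize=128m - an illustrative value, not a recommendation -
> gives the JIT considerably more headroom than the 48m default discussed above.)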
>
> HTHs
>
> JohnC
>
> --
> Monica Beckwith | Principal Member of Technical Staff
> VOIP: +15124011274
> Oracle Java Performance
>
>