From yu.zhang at oracle.com  Thu Jan  2 08:24:59 2014
From: yu.zhang at oracle.com (YU ZHANG)
Date: Thu, 02 Jan 2014 08:24:59 -0800
Subject: G1 GC clean up time is too long
In-Reply-To:
References: <52B5037C.8010704@servergy.com>
Message-ID: <52C592DB.9060007@oracle.com>

Yao,

Thanks for the feedback. Glad to know that the tuning helps. We are working
on improving G1 performance. If the current build does not meet your
requirements, I hope the future builds (JDK 8), with more improvements, work
for your workload.

Thanks,
Jenny

On 12/31/2013 5:27 PM, yao wrote:
> Hi Folks,
>
> Sorry for reporting the GC performance results late; we are in the code
> freeze period for the holiday season and cannot do any production-related
> deployment.
>
> First, I'd like to say thank you to Jenny, Monica and Thomas. Your
> suggestions are really helpful and help us to understand G1 GC behavior.
> We did NOT observe any full GCs after adjusting the suggested parameters.
> That is really awesome; we tried these new parameters on Dec 26 and full
> GCs have disappeared since then (at least until I am writing this email,
> at 3:37pm EST, Dec 30).
>
> G1 parameters:
> -XX:MaxGCPauseMillis=100
> -XX:G1HeapRegionSize=32m
> -XX:InitiatingHeapOccupancyPercent=65
> -XX:G1ReservePercent=20
> -XX:G1HeapWastePercent=5
> -XX:G1MixedGCLiveThresholdPercent=75
>
> We've reduced MaxGCPauseMillis to 100 since our real-time system is
> focused on low pause times; if the system cannot respond within 50
> milliseconds, the response is useless to the client. However, the current
> 99th-percentile read latency is still slightly higher than on the CMS
> machines, but they are pretty close (14 ms vs 12 ms). One thing we can do
> now is to increase the heap size on the G1 machines; for now, the G1 heap
> is only 90 percent of the CMS machines' heap. This is because we observed
> our server process being killed by the OOM killer on the G1 machines, so
> we decided to decrease the heap size there. Since G1ReservePercent was
> increased, we think it should be safe to increase the G1 heap to the same
> size as on the CMS machines. We believe it would give the G1 machines
> better performance because 40 percent of the heap is used for the block
> cache.
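For reference, the options Shengzhe lists above would sit on a launcher command
line roughly as in the sketch below. The heap size, logging options, classpath
and main class are placeholders rather than values taken from this thread, and
-XX:+UnlockExperimentalVMOptions is included only because
G1MixedGCLiveThresholdPercent is an experimental option on at least some 7u
builds:

    # sketch only: sizes, log options, classpath and main class are placeholders
    java -Xms66g -Xmx66g \
         -XX:+UseG1GC \
         -XX:MaxGCPauseMillis=100 \
         -XX:G1HeapRegionSize=32m \
         -XX:InitiatingHeapOccupancyPercent=65 \
         -XX:G1ReservePercent=20 \
         -XX:+UnlockExperimentalVMOptions \
         -XX:G1HeapWastePercent=5 \
         -XX:G1MixedGCLiveThresholdPercent=75 \
         -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
         -XX:+PrintAdaptiveSizePolicy -XX:+PrintTenuringDistribution \
         -cp app.jar com.example.Main

The PrintAdaptiveSizePolicy and PrintTenuringDistribution flags are only a
guess at how the "[G1Ergonomics ...]" and tenuring-distribution lines in the
logs below were produced.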
> > Thanks > -Shengzhe > > G1 Logs > > 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) > Desired survivor size 234881024 bytes, new threshold 14 (max 15) > - age 1: 16447904 bytes, 16447904 total > - age 2: 30614384 bytes, 47062288 total > - age 3: 16122104 bytes, 63184392 total > - age 4: 16542280 bytes, 79726672 total > - age 5: 14249520 bytes, 93976192 total > - age 6: 15187728 bytes, 109163920 total > - age 7: 15073808 bytes, 124237728 total > - age 8: 17903552 bytes, 142141280 total > - age 9: 17031280 bytes, 159172560 total > - age 10: 16854792 bytes, 176027352 total > - age 11: 19192480 bytes, 195219832 total > - age 12: 20491176 bytes, 215711008 total > - age 13: 16367528 bytes, 232078536 total > - age 14: 15536120 bytes, 247614656 total > 308692.158: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: > 61.48 ms, target pause time: 100.00 ms] > 308692.158: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 91 regions, survivors: 14 regions, predicted young region > time: 27.76 ms] > 308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted > pause time: 66.28 ms, target pause time: 100.00 ms] > 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent > cycle initiation, reason: occupancy higher than threshold, occupancy: > 52143587328 bytes, allocation request: 0 bytes, threshold: 46172576125 > bytes (65.00 %), source: end of GC] > , 0.0749020 secs] > [Parallel Time: 53.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308692158.6 , Avg: > 308692159.0 , Max: 308692159.4 , > Diff: 0.8] > [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: > 2.4, Sum: 81.9] > [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, > Sum: 209.0] > [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, > Sum: 405] > [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3] > [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, > Sum: 510.7] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: > 1.2] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, > Sum: 0.5] > [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: > 0.8, Sum: 947.5] > [GC Worker End (ms): Min: 308692211.6 , Avg: > 308692211.7 , Max: 308692211.7 , > Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.8 ms] > [Other: 11.1 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.4 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 1.1 ms] > [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M > Heap: 51.7G(66.2G)->48.9G(66.2G)] > [Times: user=1.07 sys=0.01, real=0.08 secs] > 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent > cycle, reason: concurrent cycle initiation requested] > 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) (initial-mark) > Desired survivor size 268435456 bytes, new threshold 15 (max 15) > - age 1: 17798336 bytes, 17798336 total > - age 2: 15275456 bytes, 33073792 total > - age 3: 27940176 bytes, 61013968 total > - age 4: 15716648 bytes, 76730616 total > - age 5: 16474656 bytes, 93205272 total > - age 6: 14249232 bytes, 107454504 total > - age 7: 15187536 bytes, 122642040 total > - age 8: 15073808 bytes, 137715848 total > - age 9: 17362752 bytes, 155078600 total > - age 10: 17031280 bytes, 172109880 total > - age 11: 16854792 bytes, 188964672 total > - age 12: 19124800 bytes, 208089472 total > - age 13: 20491176 bytes, 228580648 total > 
- age 14: 16367528 bytes, 244948176 total > 308697.313: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: > 62.13 ms, target pause time: 100.00 ms] > 308697.313: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 113 regions, survivors: 13 regions, predicted young region > time: 27.99 ms] > 308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 113 regions, survivors: 13 regions, old: 0 regions, predicted > pause time: 65.86 ms, target pause time: 100.00 ms] > , 0.0724890 secs] > [Parallel Time: 51.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308697313.3 , Avg: > 308697313.7 , Max: 308697314.0 , > Diff: 0.6] > [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: > 12.3, Sum: 101.8] > [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum: > 166.9] > [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, > Sum: 396] > [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: 152.3] > [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, > Sum: 487.0] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: > 1.0] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, > Sum: 0.6] > [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: > 0.6, Sum: 909.5] > [GC Worker End (ms): Min: 308697364.2 , Avg: > 308697364.2 , Max: 308697364.3 , > Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.9 ms] > [Other: 10.8 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.8 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 0.9 ms] > [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M > Heap: 52.5G(66.2G)->49.0G(66.2G)] > [Times: user=1.01 sys=0.00, real=0.07 secs] > 2013-12-30T08:25:31.954-0500: 308697.385: [GC > concurrent-root-region-scan-start] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC > concurrent-root-region-scan-end, 0.0131710 secs] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] > 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, > 4.5984140 secs] > 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark > 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 > secs], 0.0659540 secs] > [Times: user=0.87 sys=0.00, real=0.06 secs] > 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G), > 0.5487830 secs] > [Times: user=9.66 sys=0.06, real=0.54 secs] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end, > 0.0000480 secs] > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140102/ab765b84/attachment.html From ryebrye at gmail.com Thu Jan 2 09:57:26 2014 From: ryebrye at gmail.com (Ryan Gardner) Date: Thu, 2 Jan 2014 12:57:26 -0500 Subject: G1 GC clean up time is too long In-Reply-To: References: <52B5037C.8010704@servergy.com> Message-ID: I've also fought with cleanup times being long with a large heap and G1. In my case, I was suspicious that the RSet coarsening was increasing the time for GC Cleanups. 
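One quick way to test that suspicion on an existing log is to pull the
coarsening counts and the cleanup pauses out together. This is only a sketch:
it assumes the RSet summary output discussed below (-XX:+G1SummarizeRSetStats)
is already being written, and that the log file is named gc.log:

    grep -e "coarsenings" -e "GC cleanup" gc.log

Shengzhe runs essentially the same filter later in this thread, and the cleanup
times there do climb as the coarsening count climbs.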
If you have a way to test different settings in a non-production environment,
you could consider experimenting with:

-XX:+UnlockExperimentalVMOptions
-XX:G1RSetRegionEntries=4096

and different values for G1RSetRegionEntries - 4096 was a sweet spot for me,
but your application may behave differently.

You can turn on:

-XX:+UnlockDiagnosticVMOptions
-XX:+G1SummarizeRSetStats
-XX:G1SummarizeRSetStatsPeriod=20

to get it to print what it is doing and get some more insight into those times.

The specific number of RSetRegionEntries I set (4096) was, in theory, supposed
to be close to what G1 was already choosing based on my region size (also 32m)
and number of regions - but it did not seem to be.

Also, if you have more memory available, I have found G1 to take the extra
memory and not increase pause times much. As you increase the total heap size,
the size of your smallest possible collection will also increase, since it is
set to a percentage of the total heap... In my case I was tuning an application
that was a cache, so it had tons of heap space but wasn't churning it over
much...

I ended up going as low as:

-XX:G1NewSizePercent=1

to let G1 feel free to use as few regions as possible to achieve smaller pause
times.

I've been running in production on 1.7u40 for several months now with 92GB
heaps and a worst-case cleanup pause time of around 370ms - prior to tuning the
RSet region entries, the cleanup phase was getting worse and worse over time
and in testing would sometimes be over 1 second.

I meant to dive into the OpenJDK code to look at where the default
RSetRegionEntries value is calculated, but didn't get around to it.

Hope that helps,

Ryan Gardner

On Dec 31, 2013 8:29 PM, "yao" wrote:

> Hi Folks,
>
> Sorry for reporting GC performance result late, we are in the code freeze
> period for the holiday season and cannot do any production related
> deployment.
>
> First, I'd like to say thank you to Jenny, Monica and Thomas. Your
> suggestions are really helpful and help us to understand G1 GC behavior. We
> did NOT observe any full GCs after adjusting suggested parameters. That is
> really awesome, we tried these new parameters on Dec 26 and full GC
> disappeared since then (at least until I am writing this email, at 3:37pm
> EST, Dec 30).
>
> G1 parameters:
>
> -XX:MaxGCPauseMillis=100 -XX:G1HeapRegionSize=32m
> -XX:InitiatingHeapOccupancyPercent=65 -XX:G1ReservePercent=20
> -XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75
>
> We've reduced MaxGCPauseMillis to 100 since our real-time system is focus
> on low pause, if system cannot give response in 50 milliseconds, it's
> totally useless for the client. However, current read latency 99 percentile
> is still slightly higher than CMS machines but they are pretty close (14
> millis vs 12 millis). One thing we can do now is to increase heap size for
> G1 machines, for now, the heap size for G1 is only 90 percent of those CMS
> machines. This is because we observed our server process is killed by OOM
> killer on G1 machines and we decided to decrease heap size on G1 machines.
> Since G1ReservePercent was increased, we think it should be safe to
> increase G1 heap to be same as CMS machine. We believe it could make G1
> machine give us better performance because 40 percent of heap will be used
> for block cache.
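On the question of where the default comes from: Jenny's reply further down in
the thread gives the formula (a base of 256, scaled with the log of the region
size in MB). Reading that as region_size_log_mb = log2(region size in bytes) - 20,
which is an assumption about the formula rather than a reading of the HotSpot
source, a 32m region size works out to:

    region_size_log_mb          = log2(32m) - 20 = 25 - 20 = 5
    default G1RSetRegionEntries = 256 * (region_size_log_mb + 1) = 256 * 6 = 1536

If that reading is right, an explicitly set 4096 actually sits well above the
default rather than below it.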
> > Thanks > -Shengzhe > > G1 Logs > > 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) > Desired survivor size 234881024 bytes, new threshold 14 (max 15) > - age 1: 16447904 bytes, 16447904 total > - age 2: 30614384 bytes, 47062288 total > - age 3: 16122104 bytes, 63184392 total > - age 4: 16542280 bytes, 79726672 total > - age 5: 14249520 bytes, 93976192 total > - age 6: 15187728 bytes, 109163920 total > - age 7: 15073808 bytes, 124237728 total > - age 8: 17903552 bytes, 142141280 total > - age 9: 17031280 bytes, 159172560 total > - age 10: 16854792 bytes, 176027352 total > - age 11: 19192480 bytes, 195219832 total > - age 12: 20491176 bytes, 215711008 total > - age 13: 16367528 bytes, 232078536 total > - age 14: 15536120 bytes, 247614656 total > 308692.158: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: 61.48 > ms, target pause time: 100.00 ms] > 308692.158: [G1Ergonomics (CSet Construction) add young regions to CSet, > eden: 91 regions, survivors: 14 regions, predicted young region time: 27.76 > ms] > 308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: > 91 regions, survivors: 14 regions, old: 0 regions, predicted pause time: > 66.28 ms, target pause time: 100.00 ms] > 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent cycle > initiation, reason: occupancy higher than threshold, occupancy: 52143587328 > bytes, allocation request: 0 bytes, threshold: 46172576125 bytes (65.00 %), > source: end of GC] > , 0.0749020 secs] > [Parallel Time: 53.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0, Max: > 308692159.4, Diff: 0.8] > [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: 2.4, > Sum: 81.9] > [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, Sum: > 209.0] > [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, Sum: > 405] > [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3] > [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, Sum: > 510.7] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.2] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: > 0.5] > [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: 0.8, > Sum: 947.5] > [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7, Max: > 308692211.7, Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.8 ms] > [Other: 11.1 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.4 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 1.1 ms] > [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M Heap: > 51.7G(66.2G)->48.9G(66.2G)] > [Times: user=1.07 sys=0.01, real=0.08 secs] > 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, > reason: concurrent cycle initiation requested] > 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) (initial-mark) > Desired survivor size 268435456 bytes, new threshold 15 (max 15) > - age 1: 17798336 bytes, 17798336 total > - age 2: 15275456 bytes, 33073792 total > - age 3: 27940176 bytes, 61013968 total > - age 4: 15716648 bytes, 76730616 total > - age 5: 16474656 bytes, 93205272 total > - age 6: 14249232 bytes, 107454504 total > - age 7: 15187536 bytes, 122642040 total > - age 8: 15073808 bytes, 137715848 total > - age 9: 17362752 bytes, 155078600 total > - age 10: 17031280 bytes, 172109880 total > - age 11: 16854792 bytes, 188964672 total > - age 12: 19124800 bytes, 208089472 total > - age 13: 20491176 bytes, 228580648 total > - age 14: 
16367528 bytes, 244948176 total > 308697.313: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: 62.13 > ms, target pause time: 100.00 ms] > 308697.313: [G1Ergonomics (CSet Construction) add young regions to CSet, > eden: 113 regions, survivors: 13 regions, predicted young region time: > 27.99 ms] > 308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: > 113 regions, survivors: 13 regions, old: 0 regions, predicted pause time: > 65.86 ms, target pause time: 100.00 ms] > , 0.0724890 secs] > [Parallel Time: 51.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7, Max: > 308697314.0, Diff: 0.6] > [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: 12.3, > Sum: 101.8] > [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum: > 166.9] > [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, Sum: > 396] > [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: 152.3] > [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, Sum: > 487.0] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.0] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: > 0.6] > [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: 0.6, > Sum: 909.5] > [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2, Max: > 308697364.3, Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.9 ms] > [Other: 10.8 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.8 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 0.9 ms] > [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M Heap: > 52.5G(66.2G)->49.0G(66.2G)] > [Times: user=1.01 sys=0.00, real=0.07 secs] > 2013-12-30T08:25:31.954-0500: 308697.385: [GC > concurrent-root-region-scan-start] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC > concurrent-root-region-scan-end, 0.0131710 secs] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] > 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, > 4.5984140 secs] > 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark > 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 secs], > 0.0659540 secs] > [Times: user=0.87 sys=0.00, real=0.06 secs] > 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G), > 0.5487830 secs] > [Times: user=9.66 sys=0.06, real=0.54 secs] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end, > 0.0000480 secs] > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140102/d574df84/attachment-0001.html From yu.zhang at oracle.com Thu Jan 2 10:49:58 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Thu, 02 Jan 2014 10:49:58 -0800 Subject: G1 GC clean up time is too long In-Reply-To: References: <52B5037C.8010704@servergy.com> Message-ID: <52C5B4D6.8010908@oracle.com> Ryan, Please see my comments in line. Thanks, Jenny On 1/2/2014 9:57 AM, Ryan Gardner wrote: > > I've also fought with cleanup times being long with a large heap and > G1. In my case, I was suspicious that the RSet coarsening was > increasing the time for GC Cleanups. 
> > If you have a way to test different settings in a non-production > environment, you could consider experimenting with: > > > -XX:+UnlockExperimentalVMOptions > > -XX:G1RSetRegionEntries=4096 > > and different values for the RSetRegionEntries - 4096 was a sweet spot > for me, but your application may behave differently. > > You can turn on: > > -XX:+UnlockDiagnosticVMOptions > > -XX:+G1SummarizeRSetStats > > -XX:G1SummarizeRSetStatsPeriod=20 > > to get it to spit out what it is doing to get some more insight into > those times. > > > The specific number of RSetRegionEntries I set (4096) was, in theory, > supposed to be close to what it was setting based on my region size > (also 32m) and number of regions- but it did not seem to be. > If G1RSetRegionEntries not set, it is decided by G1RSetRegionEntriesBase*(region_size_log_mb+1). G1SetRegionEntriesBase is a constant(256). region_size_log_mb is related to heap region size(region_size_mb-20). If you have 92G heap, and 32m regions size, I guess the default value is bigger than 4096? Assuming my guess was right, you decide to reduce the entries as not seeing 'coarsenings' in the G1SummarizeRSetStats output? Did you see the cards for old or young regions increase as the clean up time increase? Also in your log, when clean up time increase, is it update RS or scan RS? > > Also, if you have more memory available, I have found G1 to take the > extra memory and not increase pause times much. As you increase the > total heap size, the size of your smallest possible collection will > also increase since it sets it to a percentage of total heap... In my > case I was tuning an applicaiton that was a cache, so it had tons heap > space but wasn't churning it over much... > > I ended up going as low as: > > -XX:G1NewSizePercent=1 > > to let G1 feel free to use as few regions as possible to achieve > smaller pause times. > G1NewSizePercent(default 5) allows G1 to allocate this percent of heap as young gen size. Lowering it should results smaller young gen. So the young gc pause is smaller. > > I've been running in production on 1.7u40 for several months now with > 92GB heaps and a worst-case cleanup pause time of around 370ms - prior > to tuning the rset region entries, the cleanup phase was getting worse > and worse over time and in testing would sometimes be over 1 second. > > I meant to dive into the OpenJDK code to look at where the default > RSetRegionEntries are calculated, but didn't get around to it. > > > Hope that helps, > > Ryan Gardner > > > On Dec 31, 2013 8:29 PM, "yao" > wrote: > > Hi Folks, > > Sorry for reporting GC performance result late, we are in the code > freeze period for the holiday season and cannot do any production > related deployment. > > First, I'd like to say thank you to Jenny, Monica and Thomas. Your > suggestions are really helpful and help us to understand G1 GC > behavior. We did NOT observe any full GCs after adjusting > suggested parameters. That is really awesome, we tried these new > parameters on Dec 26 and full GC disappeared since then (at least > until I am writing this email, at 3:37pm EST, Dec 30). > > G1 parameters: > *-XX:MaxGCPauseMillis=100 > *-XX:G1HeapRegionSize=32m > *-XX:InitiatingHeapOccupancyPercent=65 > *-XX:G1ReservePercent=20 > *-XX:G1HeapWastePercent=5 > -XX:G1MixedGCLiveThresholdPercent=75 > > * > We've reduced**MaxGCPauseMillis to 100 since our real-time system > is focus on low pause, if system cannot give response in 50 > milliseconds, it's totally useless for the client. 
However, > current read latency 99 percentile is still slightly higher than > CMS machines but they are pretty close (14 millis vs 12 millis). > One thing we can do now is to increase heap size for G1 machines, > for now, the heap size for G1 is only 90 percent of those CMS > machines. This is because we observed our server process is killed > by OOM killer on G1 machines and we decided to decrease heap size > on G1 machines. Since G1ReservePercent was increased, we think it > should be safe to increase G1 heap to be same as CMS machine. We > believe it could make G1 machine give us better performance > because 40 percent of heap will be used for block cache. > > Thanks > -Shengzhe > > G1 Logs > > 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) > Desired survivor size 234881024 bytes, new threshold 14 (max 15) > - age 1: 16447904 bytes, 16447904 total > - age 2: 30614384 bytes, 47062288 total > - age 3: 16122104 bytes, 63184392 total > - age 4: 16542280 bytes, 79726672 total > - age 5: 14249520 bytes, 93976192 total > - age 6: 15187728 bytes, 109163920 total > - age 7: 15073808 bytes, 124237728 total > - age 8: 17903552 bytes, 142141280 total > - age 9: 17031280 bytes, 159172560 total > - age 10: 16854792 bytes, 176027352 total > - age 11: 19192480 bytes, 195219832 total > - age 12: 20491176 bytes, 215711008 total > - age 13: 16367528 bytes, 232078536 total > - age 14: 15536120 bytes, 247614656 total > 308692.158: [G1Ergonomics (CSet Construction) start choosing > CSet, _pending_cards: 32768, predicted base time: 38.52 ms, > remaining time: 61.48 ms, target pause time: 100.00 ms] > 308692.158: [G1Ergonomics (CSet Construction) add young regions > to CSet, eden: 91 regions, survivors: 14 regions, predicted young > region time: 27.76 ms] > 308692.158: [G1Ergonomics (CSet Construction) finish choosing > CSet, eden: 91 regions, survivors: 14 regions, old: 0 regions, > predicted pause time: 66.28 ms, target pause time: 100.00 ms] > 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent > cycle initiation, reason: occupancy higher than threshold, > occupancy: 52143587328 bytes, allocation request: 0 bytes, > threshold: 46172576125 bytes (65.00 %), source: end of GC] > , 0.0749020 secs] > [Parallel Time: 53.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308692158.6 , > Avg: 308692159.0 , Max: 308692159.4 > , Diff: 0.8] > [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: > 2.4, Sum: 81.9] > [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, > Sum: 209.0] > [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: > 16, Sum: 405] > [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: > 144.3] > [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: > 0.2, Sum: 510.7] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, > Sum: 1.2] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: > 0.1, Sum: 0.5] > [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, > Diff: 0.8, Sum: 947.5] > [GC Worker End (ms): Min: 308692211.6 , > Avg: 308692211.7 , Max: 308692211.7 > , Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.8 ms] > [Other: 11.1 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.4 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 1.1 ms] > [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: > 448.0M->416.0M Heap: 51.7G(66.2G)->48.9G(66.2G)] > [Times: user=1.07 sys=0.01, real=0.08 secs] > 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent > cycle, reason: concurrent cycle initiation requested] > 2013-12-30T08:25:31.881-0500: 
308697.312: [GC pause (young) > (initial-mark) > Desired survivor size 268435456 bytes, new threshold 15 (max 15) > - age 1: 17798336 bytes, 17798336 total > - age 2: 15275456 bytes, 33073792 total > - age 3: 27940176 bytes, 61013968 total > - age 4: 15716648 bytes, 76730616 total > - age 5: 16474656 bytes, 93205272 total > - age 6: 14249232 bytes, 107454504 total > - age 7: 15187536 bytes, 122642040 total > - age 8: 15073808 bytes, 137715848 total > - age 9: 17362752 bytes, 155078600 total > - age 10: 17031280 bytes, 172109880 total > - age 11: 16854792 bytes, 188964672 total > - age 12: 19124800 bytes, 208089472 total > - age 13: 20491176 bytes, 228580648 total > - age 14: 16367528 bytes, 244948176 total > 308697.313: [G1Ergonomics (CSet Construction) start choosing > CSet, _pending_cards: 31028, predicted base time: 37.87 ms, > remaining time: 62.13 ms, target pause time: 100.00 ms] > 308697.313: [G1Ergonomics (CSet Construction) add young regions > to CSet, eden: 113 regions, survivors: 13 regions, predicted young > region time: 27.99 ms] > 308697.313: [G1Ergonomics (CSet Construction) finish choosing > CSet, eden: 113 regions, survivors: 13 regions, old: 0 regions, > predicted pause time: 65.86 ms, target pause time: 100.00 ms] > , 0.0724890 secs] > [Parallel Time: 51.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308697313.3 , > Avg: 308697313.7 , Max: 308697314.0 > , Diff: 0.6] > [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, > Diff: 12.3, Sum: 101.8] > [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, > Sum: 166.9] > [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, > Sum: 396] > [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, > Sum: 152.3] > [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: > 5.2, Sum: 487.0] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, > Sum: 1.0] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: > 0.1, Sum: 0.6] > [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, > Diff: 0.6, Sum: 909.5] > [GC Worker End (ms): Min: 308697364.2 , > Avg: 308697364.2 , Max: 308697364.3 > , Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.9 ms] > [Other: 10.8 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.8 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 0.9 ms] > [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: > 416.0M->448.0M Heap: 52.5G(66.2G)->49.0G(66.2G)] > [Times: user=1.01 sys=0.00, real=0.07 secs] > 2013-12-30T08:25:31.954-0500: 308697.385: [GC > concurrent-root-region-scan-start] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC > concurrent-root-region-scan-end, 0.0131710 secs] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] > 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, > 4.5984140 secs] > 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark > 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 > secs], 0.0659540 secs] > [Times: user=0.87 sys=0.00, real=0.06 secs] > 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup > 52G->52G(66G), 0.5487830 secs] > [Times: user=9.66 sys=0.06, real=0.54 secs] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC > concurrent-cleanup-start] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC > concurrent-cleanup-end, 0.0000480 secs] > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > 
hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140102/763edb09/attachment.html From yaoshengzhe at gmail.com Thu Jan 2 14:35:10 2014 From: yaoshengzhe at gmail.com (yao) Date: Thu, 2 Jan 2014 14:35:10 -0800 Subject: G1 GC clean up time is too long In-Reply-To: <52C5B4D6.8010908@oracle.com> References: <52B5037C.8010704@servergy.com> <52C5B4D6.8010908@oracle.com> Message-ID: Hi Ryan, I've enabled gc logging options you mentioned and it looks like rset coarsenings is a problem for large gc clean up time. I will take your suggestions and try different G1RSetRegionEntries values. Thank you very much. Happy New Year -Shengzhe *Typical RSet Log* Concurrent RS processed 184839720 cards Of 960997 completed buffers: 930426 ( 96.8%) by conc RS threads. 30571 ( 3.2%) by mutator threads. Conc RS threads times(s) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 \ 0.00 Total heap region rem set sizes = 5256086K. Max = 8640K. Static structures = 347K, free_lists = 7420K. 1166427614 occupied cards represented. Max size region = 296:(O)[0x00007fc7a6000000,0x00007fc7a8000000,0x00007fc7a8000000], size = 8641K, occupied = 1797K. Did 25790 coarsenings. Output of *$ cat gc-hbase-1388692019.log | grep "coarsenings\|\(GC cleanup\)"* Did 0 coarsenings. Did 0 coarsenings. Did 0 coarsenings. Did 0 coarsenings. Did 0 coarsenings. Did 0 coarsenings. Did 72 coarsenings. Did 224 coarsenings. 2014-01-02T15:12:03.031-0500: 1452.619: [GC cleanup 44G->43G(66G), 0.0376940 secs] Did 1015 coarsenings. Did 1476 coarsenings. Did 2210 coarsenings. 2014-01-02T15:25:37.483-0500: 2267.070: [GC cleanup 43G->42G(66G), 0.0539190 secs] Did 4123 coarsenings. Did 4817 coarsenings. Did 5362 coarsenings. 2014-01-02T15:40:19.499-0500: 3149.087: [GC cleanup 44G->42G(66G), 0.0661880 secs] Did 6316 coarsenings. Did 6842 coarsenings. Did 7213 coarsenings. 2014-01-02T15:54:42.812-0500: 4012.400: [GC cleanup 43G->42G(66G), 0.0888960 secs] Did 7458 coarsenings. Did 7739 coarsenings. Did 8214 coarsenings. 2014-01-02T16:09:04.009-0500: 4873.597: [GC cleanup 44G->43G(66G), 0.1171540 secs] Did 8958 coarsenings. Did 8973 coarsenings. Did 9056 coarsenings. Did 9543 coarsenings. 2014-01-02T16:23:51.359-0500: 5760.947: [GC cleanup 44G->43G(66G), 0.1526980 secs] Did 9561 coarsenings. Did 9873 coarsenings. Did 10209 coarsenings. 2014-01-02T16:39:04.462-0500: 6674.050: [GC cleanup 44G->43G(66G), 0.1923330 secs] Did 10599 coarsenings. Did 10849 coarsenings. Did 11178 coarsenings. 2014-01-02T16:46:57.445-0500: 7147.033: [GC cleanup 44G->44G(66G), 0.2353640 secs] Did 11746 coarsenings. Did 12701 coarsenings. 2014-01-02T16:53:17.536-0500: 7527.124: [GC cleanup 44G->44G(66G), 0.3489450 secs] Did 13272 coarsenings. Did 14682 coarsenings. 2014-01-02T16:58:00.726-0500: 7810.314: [GC cleanup 44G->44G(66G), 0.4271240 secs] Did 16630 coarsenings. 2014-01-02T17:01:37.077-0500: 8026.664: [GC cleanup 44G->44G(66G), 0.5089060 secs] Did 17612 coarsenings. Did 21654 coarsenings. 2014-01-02T17:06:02.566-0500: 8292.154: [GC cleanup 44G->44G(66G), 0.5531680 secs] Did 23774 coarsenings. Did 24074 coarsenings. 2014-01-02T17:11:24.795-0500: 8614.383: [GC cleanup 44G->44G(66G), 0.5290600 secs] Did 24768 coarsenings. 2014-01-02T17:17:23.219-0500: 8972.807: [GC cleanup 44G->44G(66G), 0.5382620 secs] Did 25790 coarsenings. Did 27047 coarsenings. 
2014-01-02T17:23:00.551-0500: 9310.139: [GC cleanup 45G->44G(66G), 0.5107910 secs] Did 28558 coarsenings. 2014-01-02T17:28:22.157-0500: 9631.745: [GC cleanup 45G->44G(66G), 0.4902690 secs] Did 29272 coarsenings. Did 29335 coarsenings. On Thu, Jan 2, 2014 at 10:49 AM, YU ZHANG wrote: > Ryan, > > Please see my comments in line. > > Thanks, > Jenny > > On 1/2/2014 9:57 AM, Ryan Gardner wrote: > > I've also fought with cleanup times being long with a large heap and G1. > In my case, I was suspicious that the RSet coarsening was increasing the > time for GC Cleanups. > > If you have a way to test different settings in a non-production > environment, you could consider experimenting with: > > > -XX:+UnlockExperimentalVMOptions > > -XX:G1RSetRegionEntries=4096 > > and different values for the RSetRegionEntries - 4096 was a sweet spot for > me, but your application may behave differently. > > You can turn on: > > -XX:+UnlockDiagnosticVMOptions > > -XX:+G1SummarizeRSetStats > > -XX:G1SummarizeRSetStatsPeriod=20 > > to get it to spit out what it is doing to get some more insight into those > times. > > > The specific number of RSetRegionEntries I set (4096) was, in theory, > supposed to be close to what it was setting based on my region size (also > 32m) and number of regions- but it did not seem to be. > > If G1RSetRegionEntries not set, it is decided by > G1RSetRegionEntriesBase*(region_size_log_mb+1). > G1SetRegionEntriesBase is a constant(256). region_size_log_mb is related > to heap region size(region_size_mb-20). > > If you have 92G heap, and 32m regions size, I guess the default value is > bigger than 4096? > Assuming my guess was right, you decide to reduce the entries as not > seeing 'coarsenings' in the G1SummarizeRSetStats output? Did you see the > cards for old or young regions increase as the clean up time increase? > Also in your log, when clean up time increase, is it update RS or scan RS? > > Also, if you have more memory available, I have found G1 to take the > extra memory and not increase pause times much. As you increase the total > heap size, the size of your smallest possible collection will also increase > since it sets it to a percentage of total heap... In my case I was tuning > an applicaiton that was a cache, so it had tons heap space but wasn't > churning it over much... > > I ended up going as low as: > > -XX:G1NewSizePercent=1 > > to let G1 feel free to use as few regions as possible to achieve smaller > pause times. > > G1NewSizePercent(default 5) allows G1 to allocate this percent of heap as > young gen size. Lowering it should results smaller young gen. So the > young gc pause is smaller. > > I've been running in production on 1.7u40 for several months now with > 92GB heaps and a worst-case cleanup pause time of around 370ms - prior to > tuning the rset region entries, the cleanup phase was getting worse and > worse over time and in testing would sometimes be over 1 second. > > I meant to dive into the OpenJDK code to look at where the default > RSetRegionEntries are calculated, but didn't get around to it. > > > Hope that helps, > > Ryan Gardner > > > On Dec 31, 2013 8:29 PM, "yao" wrote: > >> Hi Folks, >> >> Sorry for reporting GC performance result late, we are in the code >> freeze period for the holiday season and cannot do any production related >> deployment. >> >> First, I'd like to say thank you to Jenny, Monica and Thomas. Your >> suggestions are really helpful and help us to understand G1 GC behavior. 
We >> did NOT observe any full GCs after adjusting suggested parameters. That is >> really awesome, we tried these new parameters on Dec 26 and full GC >> disappeared since then (at least until I am writing this email, at 3:37pm >> EST, Dec 30). >> >> G1 parameters: >> >> *-XX:MaxGCPauseMillis=100 *-XX:G1HeapRegionSize=32m >> >> *-XX:InitiatingHeapOccupancyPercent=65 *-XX:G1ReservePercent=20 >> >> >> >> *-XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75 * >> We've reduced MaxGCPauseMillis to 100 since our real-time system is >> focus on low pause, if system cannot give response in 50 milliseconds, it's >> totally useless for the client. However, current read latency 99 percentile >> is still slightly higher than CMS machines but they are pretty close (14 >> millis vs 12 millis). One thing we can do now is to increase heap size for >> G1 machines, for now, the heap size for G1 is only 90 percent of those CMS >> machines. This is because we observed our server process is killed by OOM >> killer on G1 machines and we decided to decrease heap size on G1 machines. >> Since G1ReservePercent was increased, we think it should be safe to >> increase G1 heap to be same as CMS machine. We believe it could make G1 >> machine give us better performance because 40 percent of heap will be used >> for block cache. >> >> Thanks >> -Shengzhe >> >> G1 Logs >> >> 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) >> Desired survivor size 234881024 bytes, new threshold 14 (max 15) >> - age 1: 16447904 bytes, 16447904 total >> - age 2: 30614384 bytes, 47062288 total >> - age 3: 16122104 bytes, 63184392 total >> - age 4: 16542280 bytes, 79726672 total >> - age 5: 14249520 bytes, 93976192 total >> - age 6: 15187728 bytes, 109163920 total >> - age 7: 15073808 bytes, 124237728 total >> - age 8: 17903552 bytes, 142141280 total >> - age 9: 17031280 bytes, 159172560 total >> - age 10: 16854792 bytes, 176027352 total >> - age 11: 19192480 bytes, 195219832 total >> - age 12: 20491176 bytes, 215711008 total >> - age 13: 16367528 bytes, 232078536 total >> - age 14: 15536120 bytes, 247614656 total >> 308692.158: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: 61.48 >> ms, target pause time: 100.00 ms] >> 308692.158: [G1Ergonomics (CSet Construction) add young regions to CSet, >> eden: 91 regions, survivors: 14 regions, predicted young region time: 27.76 >> ms] >> 308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause >> time: 66.28 ms, target pause time: 100.00 ms] >> 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent cycle >> initiation, reason: occupancy higher than threshold, occupancy: 52143587328 >> bytes, allocation request: 0 bytes, threshold: 46172576125 bytes (65.00 %), >> source: end of GC] >> , 0.0749020 secs] >> [Parallel Time: 53.9 ms, GC Workers: 18] >> [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0, Max: >> 308692159.4, Diff: 0.8] >> [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: 2.4, >> Sum: 81.9] >> [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, Sum: >> 209.0] >> [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, Sum: >> 405] >> [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3] >> [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, Sum: >> 510.7] >> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: >> 1.2] >> [GC 
Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >> Sum: 0.5] >> [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: 0.8, >> Sum: 947.5] >> [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7, Max: >> 308692211.7, Diff: 0.1] >> [Code Root Fixup: 0.0 ms] >> [Clear CT: 9.8 ms] >> [Other: 11.1 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 2.4 ms] >> [Ref Enq: 0.4 ms] >> [Free CSet: 1.1 ms] >> [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M Heap: >> 51.7G(66.2G)->48.9G(66.2G)] >> [Times: user=1.07 sys=0.01, real=0.08 secs] >> 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, >> reason: concurrent cycle initiation requested] >> 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) (initial-mark) >> Desired survivor size 268435456 bytes, new threshold 15 (max 15) >> - age 1: 17798336 bytes, 17798336 total >> - age 2: 15275456 bytes, 33073792 total >> - age 3: 27940176 bytes, 61013968 total >> - age 4: 15716648 bytes, 76730616 total >> - age 5: 16474656 bytes, 93205272 total >> - age 6: 14249232 bytes, 107454504 total >> - age 7: 15187536 bytes, 122642040 total >> - age 8: 15073808 bytes, 137715848 total >> - age 9: 17362752 bytes, 155078600 total >> - age 10: 17031280 bytes, 172109880 total >> - age 11: 16854792 bytes, 188964672 total >> - age 12: 19124800 bytes, 208089472 total >> - age 13: 20491176 bytes, 228580648 total >> - age 14: 16367528 bytes, 244948176 total >> 308697.313: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: 62.13 >> ms, target pause time: 100.00 ms] >> 308697.313: [G1Ergonomics (CSet Construction) add young regions to CSet, >> eden: 113 regions, survivors: 13 regions, predicted young region time: >> 27.99 ms] >> 308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 113 regions, survivors: 13 regions, old: 0 regions, predicted pause >> time: 65.86 ms, target pause time: 100.00 ms] >> , 0.0724890 secs] >> [Parallel Time: 51.9 ms, GC Workers: 18] >> [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7, Max: >> 308697314.0, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: 12.3, >> Sum: 101.8] >> [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum: >> 166.9] >> [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, Sum: >> 396] >> [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: 152.3] >> [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, Sum: >> 487.0] >> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: >> 1.0] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >> Sum: 0.6] >> [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: 0.6, >> Sum: 909.5] >> [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2, Max: >> 308697364.3, Diff: 0.1] >> [Code Root Fixup: 0.0 ms] >> [Clear CT: 9.9 ms] >> [Other: 10.8 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 2.8 ms] >> [Ref Enq: 0.4 ms] >> [Free CSet: 0.9 ms] >> [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M Heap: >> 52.5G(66.2G)->49.0G(66.2G)] >> [Times: user=1.01 sys=0.00, real=0.07 secs] >> 2013-12-30T08:25:31.954-0500: 308697.385: [GC >> concurrent-root-region-scan-start] >> 2013-12-30T08:25:31.967-0500: 308697.398: [GC >> concurrent-root-region-scan-end, 0.0131710 secs] >> 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] >> 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, >> 4.5984140 secs] >> 
2013-12-30T08:25:36.570-0500: 308702.002: [GC remark >> 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 secs], >> 0.0659540 secs] >> [Times: user=0.87 sys=0.00, real=0.06 secs] >> 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G), >> 0.5487830 secs] >> [Times: user=9.66 sys=0.06, real=0.54 secs] >> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start] >> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end, >> 0.0000480 secs] >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140102/5d821f95/attachment.html From ryebrye at gmail.com Thu Jan 2 15:13:54 2014 From: ryebrye at gmail.com (Ryan Gardner) Date: Thu, 2 Jan 2014 18:13:54 -0500 Subject: G1 GC clean up time is too long In-Reply-To: References: <52B5037C.8010704@servergy.com> <52C5B4D6.8010908@oracle.com> Message-ID: Be sure to try different values... 4196 for me was half the size of the default value yet yielded far fewer coarsenings (which made no sense to me at the time) I'm going to try to dig up my logs from my tuning earlier to reply to the previous email - it was a few months ago so the specifics aren't fresh in my mind. I seem to remember that there was a slight tradeoff for rset scanning. I tried 1024, 2048, 4096, 8192 and the sweet spot for me was 4096. Let me know what you find. I'm curious to see if your results match mine. Ryan On Jan 2, 2014 5:35 PM, "yao" wrote: > Hi Ryan, > > I've enabled gc logging options you mentioned and it looks like rset > coarsenings is a problem for large gc clean up time. I will take your > suggestions and try different G1RSetRegionEntries values. Thank you very > much. > > Happy New Year > -Shengzhe > > *Typical RSet Log* > Concurrent RS processed 184839720 cards > Of 960997 completed buffers: > 930426 ( 96.8%) by conc RS threads. > 30571 ( 3.2%) by mutator threads. > Conc RS threads times(s) > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 \ > 0.00 > Total heap region rem set sizes = 5256086K. Max = 8640K. > Static structures = 347K, free_lists = 7420K. > 1166427614 occupied cards represented. > Max size region = > 296:(O)[0x00007fc7a6000000,0x00007fc7a8000000,0x00007fc7a8000000], size = > 8641K, occupied = 1797K. > Did 25790 coarsenings. > > Output of *$ cat gc-hbase-1388692019.log | grep "coarsenings\|\(GC > cleanup\)"* > Did 0 coarsenings. > Did 0 coarsenings. > Did 0 coarsenings. > Did 0 coarsenings. > Did 0 coarsenings. > Did 0 coarsenings. > Did 72 coarsenings. > Did 224 coarsenings. > 2014-01-02T15:12:03.031-0500: 1452.619: [GC cleanup 44G->43G(66G), > 0.0376940 secs] > Did 1015 coarsenings. > Did 1476 coarsenings. > Did 2210 coarsenings. > 2014-01-02T15:25:37.483-0500: 2267.070: [GC cleanup 43G->42G(66G), > 0.0539190 secs] > Did 4123 coarsenings. > Did 4817 coarsenings. > Did 5362 coarsenings. 
> 2014-01-02T15:40:19.499-0500: 3149.087: [GC cleanup 44G->42G(66G), > 0.0661880 secs] > Did 6316 coarsenings. > Did 6842 coarsenings. > Did 7213 coarsenings. > 2014-01-02T15:54:42.812-0500: 4012.400: [GC cleanup 43G->42G(66G), > 0.0888960 secs] > Did 7458 coarsenings. > Did 7739 coarsenings. > Did 8214 coarsenings. > 2014-01-02T16:09:04.009-0500: 4873.597: [GC cleanup 44G->43G(66G), > 0.1171540 secs] > Did 8958 coarsenings. > Did 8973 coarsenings. > Did 9056 coarsenings. > Did 9543 coarsenings. > 2014-01-02T16:23:51.359-0500: 5760.947: [GC cleanup 44G->43G(66G), > 0.1526980 secs] > Did 9561 coarsenings. > Did 9873 coarsenings. > Did 10209 coarsenings. > 2014-01-02T16:39:04.462-0500: 6674.050: [GC cleanup 44G->43G(66G), > 0.1923330 secs] > Did 10599 coarsenings. > Did 10849 coarsenings. > Did 11178 coarsenings. > 2014-01-02T16:46:57.445-0500: 7147.033: [GC cleanup 44G->44G(66G), > 0.2353640 secs] > Did 11746 coarsenings. > Did 12701 coarsenings. > 2014-01-02T16:53:17.536-0500: 7527.124: [GC cleanup 44G->44G(66G), > 0.3489450 secs] > Did 13272 coarsenings. > Did 14682 coarsenings. > 2014-01-02T16:58:00.726-0500: 7810.314: [GC cleanup 44G->44G(66G), > 0.4271240 secs] > Did 16630 coarsenings. > 2014-01-02T17:01:37.077-0500: 8026.664: [GC cleanup 44G->44G(66G), > 0.5089060 secs] > Did 17612 coarsenings. > Did 21654 coarsenings. > 2014-01-02T17:06:02.566-0500: 8292.154: [GC cleanup 44G->44G(66G), > 0.5531680 secs] > Did 23774 coarsenings. > Did 24074 coarsenings. > 2014-01-02T17:11:24.795-0500: 8614.383: [GC cleanup 44G->44G(66G), > 0.5290600 secs] > Did 24768 coarsenings. > 2014-01-02T17:17:23.219-0500: 8972.807: [GC cleanup 44G->44G(66G), > 0.5382620 secs] > Did 25790 coarsenings. > Did 27047 coarsenings. > 2014-01-02T17:23:00.551-0500: 9310.139: [GC cleanup 45G->44G(66G), > 0.5107910 secs] > Did 28558 coarsenings. > 2014-01-02T17:28:22.157-0500: 9631.745: [GC cleanup 45G->44G(66G), > 0.4902690 secs] > Did 29272 coarsenings. > Did 29335 coarsenings. > > > On Thu, Jan 2, 2014 at 10:49 AM, YU ZHANG wrote: > >> Ryan, >> >> Please see my comments in line. >> >> Thanks, >> Jenny >> >> On 1/2/2014 9:57 AM, Ryan Gardner wrote: >> >> I've also fought with cleanup times being long with a large heap and >> G1. In my case, I was suspicious that the RSet coarsening was increasing >> the time for GC Cleanups. >> >> If you have a way to test different settings in a non-production >> environment, you could consider experimenting with: >> >> >> -XX:+UnlockExperimentalVMOptions >> >> -XX:G1RSetRegionEntries=4096 >> >> and different values for the RSetRegionEntries - 4096 was a sweet spot >> for me, but your application may behave differently. >> >> You can turn on: >> >> -XX:+UnlockDiagnosticVMOptions >> >> -XX:+G1SummarizeRSetStats >> >> -XX:G1SummarizeRSetStatsPeriod=20 >> >> to get it to spit out what it is doing to get some more insight into >> those times. >> >> >> The specific number of RSetRegionEntries I set (4096) was, in theory, >> supposed to be close to what it was setting based on my region size (also >> 32m) and number of regions- but it did not seem to be. >> >> If G1RSetRegionEntries not set, it is decided by >> G1RSetRegionEntriesBase*(region_size_log_mb+1). >> G1SetRegionEntriesBase is a constant(256). region_size_log_mb is related >> to heap region size(region_size_mb-20). >> >> If you have 92G heap, and 32m regions size, I guess the default value is >> bigger than 4096? 
>> Assuming my guess was right, you decide to reduce the entries as not >> seeing 'coarsenings' in the G1SummarizeRSetStats output? Did you see the >> cards for old or young regions increase as the clean up time increase? >> Also in your log, when clean up time increase, is it update RS or scan RS? >> >> Also, if you have more memory available, I have found G1 to take the >> extra memory and not increase pause times much. As you increase the total >> heap size, the size of your smallest possible collection will also increase >> since it sets it to a percentage of total heap... In my case I was tuning >> an applicaiton that was a cache, so it had tons heap space but wasn't >> churning it over much... >> >> I ended up going as low as: >> >> -XX:G1NewSizePercent=1 >> >> to let G1 feel free to use as few regions as possible to achieve smaller >> pause times. >> >> G1NewSizePercent(default 5) allows G1 to allocate this percent of heap as >> young gen size. Lowering it should results smaller young gen. So the >> young gc pause is smaller. >> >> I've been running in production on 1.7u40 for several months now with >> 92GB heaps and a worst-case cleanup pause time of around 370ms - prior to >> tuning the rset region entries, the cleanup phase was getting worse and >> worse over time and in testing would sometimes be over 1 second. >> >> I meant to dive into the OpenJDK code to look at where the default >> RSetRegionEntries are calculated, but didn't get around to it. >> >> >> Hope that helps, >> >> Ryan Gardner >> >> >> On Dec 31, 2013 8:29 PM, "yao" wrote: >> >>> Hi Folks, >>> >>> Sorry for reporting GC performance result late, we are in the code >>> freeze period for the holiday season and cannot do any production related >>> deployment. >>> >>> First, I'd like to say thank you to Jenny, Monica and Thomas. Your >>> suggestions are really helpful and help us to understand G1 GC behavior. We >>> did NOT observe any full GCs after adjusting suggested parameters. That is >>> really awesome, we tried these new parameters on Dec 26 and full GC >>> disappeared since then (at least until I am writing this email, at 3:37pm >>> EST, Dec 30). >>> >>> G1 parameters: >>> >>> *-XX:MaxGCPauseMillis=100 *-XX:G1HeapRegionSize=32m >>> >>> *-XX:InitiatingHeapOccupancyPercent=65 *-XX:G1ReservePercent=20 >>> >>> >>> >>> *-XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75 * >>> We've reduced MaxGCPauseMillis to 100 since our real-time system is >>> focus on low pause, if system cannot give response in 50 milliseconds, it's >>> totally useless for the client. However, current read latency 99 percentile >>> is still slightly higher than CMS machines but they are pretty close (14 >>> millis vs 12 millis). One thing we can do now is to increase heap size for >>> G1 machines, for now, the heap size for G1 is only 90 percent of those CMS >>> machines. This is because we observed our server process is killed by OOM >>> killer on G1 machines and we decided to decrease heap size on G1 machines. >>> Since G1ReservePercent was increased, we think it should be safe to >>> increase G1 heap to be same as CMS machine. We believe it could make G1 >>> machine give us better performance because 40 percent of heap will be used >>> for block cache. 
>>> >>> Thanks >>> -Shengzhe >>> >>> G1 Logs >>> >>> 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) >>> Desired survivor size 234881024 bytes, new threshold 14 (max 15) >>> - age 1: 16447904 bytes, 16447904 total >>> - age 2: 30614384 bytes, 47062288 total >>> - age 3: 16122104 bytes, 63184392 total >>> - age 4: 16542280 bytes, 79726672 total >>> - age 5: 14249520 bytes, 93976192 total >>> - age 6: 15187728 bytes, 109163920 total >>> - age 7: 15073808 bytes, 124237728 total >>> - age 8: 17903552 bytes, 142141280 total >>> - age 9: 17031280 bytes, 159172560 total >>> - age 10: 16854792 bytes, 176027352 total >>> - age 11: 19192480 bytes, 195219832 total >>> - age 12: 20491176 bytes, 215711008 total >>> - age 13: 16367528 bytes, 232078536 total >>> - age 14: 15536120 bytes, 247614656 total >>> 308692.158: [G1Ergonomics (CSet Construction) start choosing CSet, >>> _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: 61.48 >>> ms, target pause time: 100.00 ms] >>> 308692.158: [G1Ergonomics (CSet Construction) add young regions to >>> CSet, eden: 91 regions, survivors: 14 regions, predicted young region time: >>> 27.76 ms] >>> 308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet, >>> eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause >>> time: 66.28 ms, target pause time: 100.00 ms] >>> 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent cycle >>> initiation, reason: occupancy higher than threshold, occupancy: 52143587328 >>> bytes, allocation request: 0 bytes, threshold: 46172576125 bytes (65.00 %), >>> source: end of GC] >>> , 0.0749020 secs] >>> [Parallel Time: 53.9 ms, GC Workers: 18] >>> [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0, Max: >>> 308692159.4, Diff: 0.8] >>> [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: 2.4, >>> Sum: 81.9] >>> [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, Sum: >>> 209.0] >>> [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, Sum: >>> 405] >>> [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3] >>> [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, >>> Sum: 510.7] >>> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: >>> 1.2] >>> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >>> Sum: 0.5] >>> [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: 0.8, >>> Sum: 947.5] >>> [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7, Max: >>> 308692211.7, Diff: 0.1] >>> [Code Root Fixup: 0.0 ms] >>> [Clear CT: 9.8 ms] >>> [Other: 11.1 ms] >>> [Choose CSet: 0.0 ms] >>> [Ref Proc: 2.4 ms] >>> [Ref Enq: 0.4 ms] >>> [Free CSet: 1.1 ms] >>> [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M >>> Heap: 51.7G(66.2G)->48.9G(66.2G)] >>> [Times: user=1.07 sys=0.01, real=0.08 secs] >>> 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent >>> cycle, reason: concurrent cycle initiation requested] >>> 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) >>> (initial-mark) >>> Desired survivor size 268435456 bytes, new threshold 15 (max 15) >>> - age 1: 17798336 bytes, 17798336 total >>> - age 2: 15275456 bytes, 33073792 total >>> - age 3: 27940176 bytes, 61013968 total >>> - age 4: 15716648 bytes, 76730616 total >>> - age 5: 16474656 bytes, 93205272 total >>> - age 6: 14249232 bytes, 107454504 total >>> - age 7: 15187536 bytes, 122642040 total >>> - age 8: 15073808 bytes, 137715848 total >>> - age 9: 17362752 bytes, 155078600 total >>> - age 10: 17031280 
bytes, 172109880 total >>> - age 11: 16854792 bytes, 188964672 total >>> - age 12: 19124800 bytes, 208089472 total >>> - age 13: 20491176 bytes, 228580648 total >>> - age 14: 16367528 bytes, 244948176 total >>> 308697.313: [G1Ergonomics (CSet Construction) start choosing CSet, >>> _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: 62.13 >>> ms, target pause time: 100.00 ms] >>> 308697.313: [G1Ergonomics (CSet Construction) add young regions to >>> CSet, eden: 113 regions, survivors: 13 regions, predicted young region >>> time: 27.99 ms] >>> 308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet, >>> eden: 113 regions, survivors: 13 regions, old: 0 regions, predicted pause >>> time: 65.86 ms, target pause time: 100.00 ms] >>> , 0.0724890 secs] >>> [Parallel Time: 51.9 ms, GC Workers: 18] >>> [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7, Max: >>> 308697314.0, Diff: 0.6] >>> [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: >>> 12.3, Sum: 101.8] >>> [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum: >>> 166.9] >>> [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, Sum: >>> 396] >>> [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: >>> 152.3] >>> [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, >>> Sum: 487.0] >>> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: >>> 1.0] >>> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >>> Sum: 0.6] >>> [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: 0.6, >>> Sum: 909.5] >>> [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2, Max: >>> 308697364.3, Diff: 0.1] >>> [Code Root Fixup: 0.0 ms] >>> [Clear CT: 9.9 ms] >>> [Other: 10.8 ms] >>> [Choose CSet: 0.0 ms] >>> [Ref Proc: 2.8 ms] >>> [Ref Enq: 0.4 ms] >>> [Free CSet: 0.9 ms] >>> [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M >>> Heap: 52.5G(66.2G)->49.0G(66.2G)] >>> [Times: user=1.01 sys=0.00, real=0.07 secs] >>> 2013-12-30T08:25:31.954-0500: 308697.385: [GC >>> concurrent-root-region-scan-start] >>> 2013-12-30T08:25:31.967-0500: 308697.398: [GC >>> concurrent-root-region-scan-end, 0.0131710 secs] >>> 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] >>> 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, >>> 4.5984140 secs] >>> 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark >>> 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 secs], >>> 0.0659540 secs] >>> [Times: user=0.87 sys=0.00, real=0.06 secs] >>> 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G), >>> 0.5487830 secs] >>> [Times: user=9.66 sys=0.06, real=0.54 secs] >>> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start] >>> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end, >>> 0.0000480 secs] >>> >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >> >> _______________________________________________ >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140102/d97f0482/attachment-0001.html From ysr1729 at gmail.com Fri Jan 3 01:12:02 2014 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 3 Jan 2014 01:12:02 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? Message-ID: I haven't narrowed it down sufficiently yet, but has anyone noticed if G1 causes a higher perm gen footprint or, worse, a perm gen leak perhaps? I do realize that G1 does not today (as of 7u40 at least) collect the perm gen concurrently, rather deferring its collection to a stop-world full gc. However, it has just come to my attention that despite full stop-world gc's (on account of the perm gen getting full), G1 still uses more perm gen space (in some instacnes substantially more) than ParallelOldGC even after the full stop-world gc's, in some of our experiments. (PS: Also noticed that the default gc logging for G1 does not print the perm gen usage at full gc, unlike other collectors; looks like an oversight in logging perhaps one that has been fixed recently; i was on 7u40 i think.) While I need to collect more data using non-ParallelOld, non-G1 collectors (escpeially CMS) to see how things look and to get closer to the root cause, I wondered if anyone else had come across a similar issue and to check if this is a known issue. I'll post more details after gathering more data, but in case anyone has experienced this, please do share. thank you in advance, and Happy New Year! -- ramki -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/cffda9b9/attachment.html From wolfgang.pedot at finkzeit.at Fri Jan 3 07:33:14 2014 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Fri, 03 Jan 2014 16:33:14 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: References: Message-ID: <52C6D83A.8070309@finkzeit.at> Hi, I am using G1 on 7u45 for an application-server which has a "healthy" permGen churn because it generates a lot of short-lived dynamic classes (JavaScript). Currently permGen is sized at a little over 1GB and depending on usage there can be up to 2 full GCs per day (usually only 1). I have not noticed an increased permGen usage with G1 (increased size just before switching to G1) but I have noticed something odd about the permGen-usage after a collect. The class-count will always fall back to the same level which is currently 65k but the permGen usage after collect can either be ~0.8GB or ~0.55GB. There are always 3 collects resulting in 0.8GB followed by one scoring 0.55GB so there seems to be some kind of "rythm" going on. The full GCs are always triggered by permGen getting full and the loaded class count goes significantly higher after a 0.55GB collect (165k vs 125k) so I guess some classes just get unloaded later... I can not tell if this behaviour is due to G1 or some other factor in this application but I do know that I have no leak because the after-collect values are fairly stable over weeks. So I have not experienced this but am sharing anyway ;) happy new year Wolfgang Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: > I haven't narrowed it down sufficiently yet, but has anyone noticed if > G1 causes a higher perm gen footprint or, worse, a perm gen leak perhaps? > I do realize that G1 does not today (as of 7u40 at least) collect the > perm gen concurrently, rather deferring its collection to a stop-world full > gc. 
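Since, as Ramki notes above, the G1 log on these builds does not report perm gen occupancy at full GC, one low-effort way to watch it from outside the GC log is jstat; a sketch, with <pid> standing in for the JVM's process id:

    # PC/PU = perm gen capacity/used in KB, FGC = full GC count; sample every 10 seconds
    jstat -gcold <pid> 10000

Sampling PC and PU across a few full GCs gives comparable before/after perm gen numbers for G1 and ParallelOldGC without changing any GC flags.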
However, it has just come to my attention that despite full > stop-world gc's (on account of the perm gen getting full), G1 still uses > more perm gen > space (in some instacnes substantially more) than ParallelOldGC even > after the full stop-world gc's, in some of our experiments. (PS: Also > noticed > that the default gc logging for G1 does not print the perm gen usage at > full gc, unlike other collectors; looks like an oversight in logging > perhaps one > that has been fixed recently; i was on 7u40 i think.) > > While I need to collect more data using non-ParallelOld, non-G1 > collectors (escpeially CMS) to see how things look and to get closer to > the root > cause, I wondered if anyone else had come across a similar issue and to > check if this is a known issue. > > I'll post more details after gathering more data, but in case anyone has > experienced this, please do share. > > thank you in advance, and Happy New Year! > -- ramki > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From jocf83 at gmail.com Fri Jan 3 07:47:05 2014 From: jocf83 at gmail.com (Jose Otavio Carlomagno Filho) Date: Fri, 3 Jan 2014 13:47:05 -0200 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C6D83A.8070309@finkzeit.at> References: <52C6D83A.8070309@finkzeit.at> Message-ID: We recently switched to G1 in our application and started experiencing this type of behaviour too. Turns out G1 was not causing the problem, it was only exposing it to us. Our application would generate a large number of proxy classes and that would cause the Perm Gen to fill up until a full GC was performed by G1. When using ParallelOldGC, this would not happen because full GCs would be executed much more frequently (when the old gen was full), which prevented the perm gen from filling up. You can find more info about our problem and our analysis here: http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe I recommend you use a profiling too to investigate the root cause of your Perm Gen getting filled up. There's a chance it is a leak, but as I said, in our case, it was our own application's fault and G1 exposed the problem to us. Regards, Jose On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot wrote: > Hi, > > I am using G1 on 7u45 for an application-server which has a "healthy" > permGen churn because it generates a lot of short-lived dynamic classes > (JavaScript). Currently permGen is sized at a little over 1GB and > depending on usage there can be up to 2 full GCs per day (usually only > 1). I have not noticed an increased permGen usage with G1 (increased > size just before switching to G1) but I have noticed something odd about > the permGen-usage after a collect. The class-count will always fall back > to the same level which is currently 65k but the permGen usage after > collect can either be ~0.8GB or ~0.55GB. There are always 3 collects > resulting in 0.8GB followed by one scoring 0.55GB so there seems to be > some kind of "rythm" going on. The full GCs are always triggered by > permGen getting full and the loaded class count goes significantly > higher after a 0.55GB collect (165k vs 125k) so I guess some classes > just get unloaded later... 
> > I can not tell if this behaviour is due to G1 or some other factor in > this application but I do know that I have no leak because the > after-collect values are fairly stable over weeks. > > So I have not experienced this but am sharing anyway ;) > > happy new year > Wolfgang > > Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: > > I haven't narrowed it down sufficiently yet, but has anyone noticed if > > G1 causes a higher perm gen footprint or, worse, a perm gen leak perhaps? > > I do realize that G1 does not today (as of 7u40 at least) collect the > > perm gen concurrently, rather deferring its collection to a stop-world > full > > gc. However, it has just come to my attention that despite full > > stop-world gc's (on account of the perm gen getting full), G1 still uses > > more perm gen > > space (in some instacnes substantially more) than ParallelOldGC even > > after the full stop-world gc's, in some of our experiments. (PS: Also > > noticed > > that the default gc logging for G1 does not print the perm gen usage at > > full gc, unlike other collectors; looks like an oversight in logging > > perhaps one > > that has been fixed recently; i was on 7u40 i think.) > > > > While I need to collect more data using non-ParallelOld, non-G1 > > collectors (escpeially CMS) to see how things look and to get closer to > > the root > > cause, I wondered if anyone else had come across a similar issue and to > > check if this is a known issue. > > > > I'll post more details after gathering more data, but in case anyone has > > experienced this, please do share. > > > > thank you in advance, and Happy New Year! > > -- ramki > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/4cbf11aa/attachment.html From yu.zhang at oracle.com Fri Jan 3 10:05:26 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Fri, 03 Jan 2014 10:05:26 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: References: <52C6D83A.8070309@finkzeit.at> Message-ID: <52C6FBE6.6040904@oracle.com> Very interesting post. Like someone mentioned in the comments, with -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS can clean classes in PermGen with minor GC. But G1 can only unload class during full gc. Full GC in G1 is slow as it is single threaded. Thanks, Jenny On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: > We recently switched to G1 in our application and started experiencing > this type of behaviour too. Turns out G1 was not causing the problem, > it was only exposing it to us. > > Our application would generate a large number of proxy classes and > that would cause the Perm Gen to fill up until a full GC was performed > by G1. When using ParallelOldGC, this would not happen because full > GCs would be executed much more frequently (when the old gen was > full), which prevented the perm gen from filling up. 
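For anyone reproducing that comparison on a class-churning workload like the ones described here, the CMS side is typically run with something along these lines (a sketch only; heap sizes and main class are placeholders):

    # let CMS cycles unload classes instead of waiting for a stop-world full GC
    java -Xms64g -Xmx64g -XX:+UseConcMarkSweepGC \
         -XX:+CMSClassUnloadingEnabled \
         -XX:+PrintGCDetails -XX:+PrintGCTimeStamps MyServer

As Ramki clarifies further down in the thread, the class unloading happens as part of CMS's concurrent (major) cycles, not at minor GCs.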
> > You can find more info about our problem and our analysis here: > http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe > > I recommend you use a profiling too to investigate the root cause of > your Perm Gen getting filled up. There's a chance it is a leak, but as > I said, in our case, it was our own application's fault and G1 exposed > the problem to us. > > Regards, > Jose > > > On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot > > wrote: > > Hi, > > I am using G1 on 7u45 for an application-server which has a "healthy" > permGen churn because it generates a lot of short-lived dynamic > classes > (JavaScript). Currently permGen is sized at a little over 1GB and > depending on usage there can be up to 2 full GCs per day (usually only > 1). I have not noticed an increased permGen usage with G1 (increased > size just before switching to G1) but I have noticed something odd > about > the permGen-usage after a collect. The class-count will always > fall back > to the same level which is currently 65k but the permGen usage after > collect can either be ~0.8GB or ~0.55GB. There are always 3 collects > resulting in 0.8GB followed by one scoring 0.55GB so there seems to be > some kind of "rythm" going on. The full GCs are always triggered by > permGen getting full and the loaded class count goes significantly > higher after a 0.55GB collect (165k vs 125k) so I guess some classes > just get unloaded later... > > I can not tell if this behaviour is due to G1 or some other factor in > this application but I do know that I have no leak because the > after-collect values are fairly stable over weeks. > > So I have not experienced this but am sharing anyway ;) > > happy new year > Wolfgang > > Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: > > I haven't narrowed it down sufficiently yet, but has anyone > noticed if > > G1 causes a higher perm gen footprint or, worse, a perm gen leak > perhaps? > > I do realize that G1 does not today (as of 7u40 at least) > collect the > > perm gen concurrently, rather deferring its collection to a > stop-world full > > gc. However, it has just come to my attention that despite full > > stop-world gc's (on account of the perm gen getting full), G1 > still uses > > more perm gen > > space (in some instacnes substantially more) than ParallelOldGC even > > after the full stop-world gc's, in some of our experiments. (PS: > Also > > noticed > > that the default gc logging for G1 does not print the perm gen > usage at > > full gc, unlike other collectors; looks like an oversight in logging > > perhaps one > > that has been fixed recently; i was on 7u40 i think.) > > > > While I need to collect more data using non-ParallelOld, non-G1 > > collectors (escpeially CMS) to see how things look and to get > closer to > > the root > > cause, I wondered if anyone else had come across a similar issue > and to > > check if this is a known issue. > > > > I'll post more details after gathering more data, but in case > anyone has > > experienced this, please do share. > > > > thank you in advance, and Happy New Year! 
> > -- ramki > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/239d98fd/attachment.html From ysr1729 at gmail.com Fri Jan 3 11:30:47 2014 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 3 Jan 2014 11:30:47 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C6FBE6.6040904@oracle.com> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> Message-ID: Thanks everyone for sharing yr experiences. As I indicated, I do realize that G1 does not collect perm gen concurrently. What was surprising was that G1's use of perm gen was much higher following its stop-world full gc's which would have collected the perm gen. As a result, G1 needed a perm gen quite a bit more than twice that given to parallel gc to be able to run an application for a certain length of time. I'll provide more data on perm gen dynamics when I have it. My guess would be that somehow G1's use of regions in the perm gen is causing a dilation of perm gen footprint on account of fragmentation in the G1 perm gen regions. If that were the case, I would expect a modest increase in the perm gen footprint, but it seemed the increase in footprint was much higher. I'll collect and post more concrete numbers when I get a chance. -- ramki On Fri, Jan 3, 2014 at 10:05 AM, YU ZHANG wrote: > Very interesting post. Like someone mentioned in the comments, with > -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS can clean > classes in PermGen with minor GC. But G1 can only unload class during full > gc. Full GC in G1 is slow as it is single threaded. > > Thanks, > Jenny > > On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: > > We recently switched to G1 in our application and started experiencing > this type of behaviour too. Turns out G1 was not causing the problem, it > was only exposing it to us. > > Our application would generate a large number of proxy classes and that > would cause the Perm Gen to fill up until a full GC was performed by G1. > When using ParallelOldGC, this would not happen because full GCs would be > executed much more frequently (when the old gen was full), which prevented > the perm gen from filling up. > > You can find more info about our problem and our analysis here: > http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe > > I recommend you use a profiling too to investigate the root cause of > your Perm Gen getting filled up. There's a chance it is a leak, but as I > said, in our case, it was our own application's fault and G1 exposed the > problem to us. 
> > Regards, > Jose > > > On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot > wrote: > >> Hi, >> >> I am using G1 on 7u45 for an application-server which has a "healthy" >> permGen churn because it generates a lot of short-lived dynamic classes >> (JavaScript). Currently permGen is sized at a little over 1GB and >> depending on usage there can be up to 2 full GCs per day (usually only >> 1). I have not noticed an increased permGen usage with G1 (increased >> size just before switching to G1) but I have noticed something odd about >> the permGen-usage after a collect. The class-count will always fall back >> to the same level which is currently 65k but the permGen usage after >> collect can either be ~0.8GB or ~0.55GB. There are always 3 collects >> resulting in 0.8GB followed by one scoring 0.55GB so there seems to be >> some kind of "rythm" going on. The full GCs are always triggered by >> permGen getting full and the loaded class count goes significantly >> higher after a 0.55GB collect (165k vs 125k) so I guess some classes >> just get unloaded later... >> >> I can not tell if this behaviour is due to G1 or some other factor in >> this application but I do know that I have no leak because the >> after-collect values are fairly stable over weeks. >> >> So I have not experienced this but am sharing anyway ;) >> >> happy new year >> Wolfgang >> >> Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: >> > I haven't narrowed it down sufficiently yet, but has anyone noticed if >> > G1 causes a higher perm gen footprint or, worse, a perm gen leak >> perhaps? >> > I do realize that G1 does not today (as of 7u40 at least) collect the >> > perm gen concurrently, rather deferring its collection to a stop-world >> full >> > gc. However, it has just come to my attention that despite full >> > stop-world gc's (on account of the perm gen getting full), G1 still uses >> > more perm gen >> > space (in some instacnes substantially more) than ParallelOldGC even >> > after the full stop-world gc's, in some of our experiments. (PS: Also >> > noticed >> > that the default gc logging for G1 does not print the perm gen usage at >> > full gc, unlike other collectors; looks like an oversight in logging >> > perhaps one >> > that has been fixed recently; i was on 7u40 i think.) >> > >> > While I need to collect more data using non-ParallelOld, non-G1 >> > collectors (escpeially CMS) to see how things look and to get closer to >> > the root >> > cause, I wondered if anyone else had come across a similar issue and to >> > check if this is a known issue. >> > >> > I'll post more details after gathering more data, but in case anyone has >> > experienced this, please do share. >> > >> > thank you in advance, and Happy New Year! 
>> > -- ramki >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/75e83946/attachment-0001.html From ysr1729 at gmail.com Fri Jan 3 11:36:42 2014 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 3 Jan 2014 11:36:42 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C6FBE6.6040904@oracle.com> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> Message-ID: Hi Jenny -- On Fri, Jan 3, 2014 at 10:05 AM, YU ZHANG wrote: > Very interesting post. Like someone mentioned in the comments, with > -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS can clean > classes in PermGen with minor GC. But G1 can only unload class during full > gc. Full GC in G1 is slow as it is single threaded. > One small correction: CMS collects perm gen in major gc cycles, albeit concurrently with that flag enabled. The perm gen isn't cleaned at a minor gc with any of our collectors, since global reachability isn't checked at minor gc's. -- ramki > > > Thanks, > Jenny > > On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: > > We recently switched to G1 in our application and started experiencing > this type of behaviour too. Turns out G1 was not causing the problem, it > was only exposing it to us. > > Our application would generate a large number of proxy classes and that > would cause the Perm Gen to fill up until a full GC was performed by G1. > When using ParallelOldGC, this would not happen because full GCs would be > executed much more frequently (when the old gen was full), which prevented > the perm gen from filling up. > > You can find more info about our problem and our analysis here: > http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe > > I recommend you use a profiling too to investigate the root cause of > your Perm Gen getting filled up. There's a chance it is a leak, but as I > said, in our case, it was our own application's fault and G1 exposed the > problem to us. > > Regards, > Jose > > > On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot > wrote: > >> Hi, >> >> I am using G1 on 7u45 for an application-server which has a "healthy" >> permGen churn because it generates a lot of short-lived dynamic classes >> (JavaScript). Currently permGen is sized at a little over 1GB and >> depending on usage there can be up to 2 full GCs per day (usually only >> 1). I have not noticed an increased permGen usage with G1 (increased >> size just before switching to G1) but I have noticed something odd about >> the permGen-usage after a collect. 
The class-count will always fall back >> to the same level which is currently 65k but the permGen usage after >> collect can either be ~0.8GB or ~0.55GB. There are always 3 collects >> resulting in 0.8GB followed by one scoring 0.55GB so there seems to be >> some kind of "rythm" going on. The full GCs are always triggered by >> permGen getting full and the loaded class count goes significantly >> higher after a 0.55GB collect (165k vs 125k) so I guess some classes >> just get unloaded later... >> >> I can not tell if this behaviour is due to G1 or some other factor in >> this application but I do know that I have no leak because the >> after-collect values are fairly stable over weeks. >> >> So I have not experienced this but am sharing anyway ;) >> >> happy new year >> Wolfgang >> >> Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: >> > I haven't narrowed it down sufficiently yet, but has anyone noticed if >> > G1 causes a higher perm gen footprint or, worse, a perm gen leak >> perhaps? >> > I do realize that G1 does not today (as of 7u40 at least) collect the >> > perm gen concurrently, rather deferring its collection to a stop-world >> full >> > gc. However, it has just come to my attention that despite full >> > stop-world gc's (on account of the perm gen getting full), G1 still uses >> > more perm gen >> > space (in some instacnes substantially more) than ParallelOldGC even >> > after the full stop-world gc's, in some of our experiments. (PS: Also >> > noticed >> > that the default gc logging for G1 does not print the perm gen usage at >> > full gc, unlike other collectors; looks like an oversight in logging >> > perhaps one >> > that has been fixed recently; i was on 7u40 i think.) >> > >> > While I need to collect more data using non-ParallelOld, non-G1 >> > collectors (escpeially CMS) to see how things look and to get closer to >> > the root >> > cause, I wondered if anyone else had come across a similar issue and to >> > check if this is a known issue. >> > >> > I'll post more details after gathering more data, but in case anyone has >> > experienced this, please do share. >> > >> > thank you in advance, and Happy New Year! >> > -- ramki >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/7a4c9945/attachment.html From yu.zhang at oracle.com Fri Jan 3 11:53:34 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Fri, 03 Jan 2014 11:53:34 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> Message-ID: <52C7153E.9070206@oracle.com> Ramki, The perm gen data would be very interesting. 
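For gathering that kind of perm gen data from inside the process, independent of collector and log format, the standard java.lang.management API is enough; a hedged sketch, where the class name, the 10-second interval and the use of System.out are made up for illustration:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    /** Logs perm gen occupancy from inside the JVM being measured;
     *  intended to be started on a daemon thread by the application. */
    public class PermGenSampler implements Runnable {
        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                    // the pool name contains "Perm" for the serial, parallel, CMS and G1 perm gen pools
                    if (pool.getName().contains("Perm")) {
                        MemoryUsage u = pool.getUsage();
                        System.out.printf("%s: used=%dK committed=%dK max=%dK%n",
                                pool.getName(), u.getUsed() / 1024,
                                u.getCommitted() / 1024, u.getMax() / 1024);
                    }
                }
                try {
                    Thread.sleep(10000);  // sampling interval, 10 seconds here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

The same readings are also visible remotely through JMX, for example in jconsole's memory pool view, which avoids touching the application at all.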
And thanks for correcting me on my previous post: "One small correction: CMS collects perm gen in major gc cycles, albeit concurrently with that flag enabled. The perm gen isn't cleaned at a minor gc with any of our collectors, since global reachability isn't checked at minor gc's." Thanks, Jenny On 1/3/2014 11:30 AM, Srinivas Ramakrishna wrote: > Thanks everyone for sharing yr experiences. As I indicated, I do > realize that G1 does not collect perm gen concurrently. > What was surprising was that G1's use of perm gen was much higher > following its stop-world full gc's > which would have collected the perm gen. As a result, G1 needed a perm > gen quite a bit more than twice that > given to parallel gc to be able to run an application for a certain > length of time. > > I'll provide more data on perm gen dynamics when I have it. My guess > would be that somehow G1's use of > regions in the perm gen is causing a dilation of perm gen footprint on > account of fragmentation in the G1 perm > gen regions. If that were the case, I would expect a modest increase > in the perm gen footprint, but it seemed the increase in > footprint was much higher. I'll collect and post more concrete numbers > when I get a chance. > > -- ramki > > > > On Fri, Jan 3, 2014 at 10:05 AM, YU ZHANG > wrote: > > Very interesting post. Like someone mentioned in the comments, > with -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS > can clean classes in PermGen with minor GC. But G1 can only > unload class during full gc. Full GC in G1 is slow as it is > single threaded. > > Thanks, > Jenny > > On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: >> We recently switched to G1 in our application and started >> experiencing this type of behaviour too. Turns out G1 was not >> causing the problem, it was only exposing it to us. >> >> Our application would generate a large number of proxy classes >> and that would cause the Perm Gen to fill up until a full GC was >> performed by G1. When using ParallelOldGC, this would not happen >> because full GCs would be executed much more frequently (when the >> old gen was full), which prevented the perm gen from filling up. >> >> You can find more info about our problem and our analysis here: >> http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe >> >> I recommend you use a profiling too to investigate the root cause >> of your Perm Gen getting filled up. There's a chance it is a >> leak, but as I said, in our case, it was our own application's >> fault and G1 exposed the problem to us. >> >> Regards, >> Jose >> >> >> On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot >> > >> wrote: >> >> Hi, >> >> I am using G1 on 7u45 for an application-server which has a >> "healthy" >> permGen churn because it generates a lot of short-lived >> dynamic classes >> (JavaScript). Currently permGen is sized at a little over 1GB and >> depending on usage there can be up to 2 full GCs per day >> (usually only >> 1). I have not noticed an increased permGen usage with G1 >> (increased >> size just before switching to G1) but I have noticed >> something odd about >> the permGen-usage after a collect. The class-count will >> always fall back >> to the same level which is currently 65k but the permGen >> usage after >> collect can either be ~0.8GB or ~0.55GB. There are always 3 >> collects >> resulting in 0.8GB followed by one scoring 0.55GB so there >> seems to be >> some kind of "rythm" going on. 
The full GCs are always >> triggered by >> permGen getting full and the loaded class count goes >> significantly >> higher after a 0.55GB collect (165k vs 125k) so I guess some >> classes >> just get unloaded later... >> >> I can not tell if this behaviour is due to G1 or some other >> factor in >> this application but I do know that I have no leak because the >> after-collect values are fairly stable over weeks. >> >> So I have not experienced this but am sharing anyway ;) >> >> happy new year >> Wolfgang >> >> Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: >> > I haven't narrowed it down sufficiently yet, but has anyone >> noticed if >> > G1 causes a higher perm gen footprint or, worse, a perm gen >> leak perhaps? >> > I do realize that G1 does not today (as of 7u40 at least) >> collect the >> > perm gen concurrently, rather deferring its collection to a >> stop-world full >> > gc. However, it has just come to my attention that despite full >> > stop-world gc's (on account of the perm gen getting full), >> G1 still uses >> > more perm gen >> > space (in some instacnes substantially more) than >> ParallelOldGC even >> > after the full stop-world gc's, in some of our experiments. >> (PS: Also >> > noticed >> > that the default gc logging for G1 does not print the perm >> gen usage at >> > full gc, unlike other collectors; looks like an oversight >> in logging >> > perhaps one >> > that has been fixed recently; i was on 7u40 i think.) >> > >> > While I need to collect more data using non-ParallelOld, non-G1 >> > collectors (escpeially CMS) to see how things look and to >> get closer to >> > the root >> > cause, I wondered if anyone else had come across a similar >> issue and to >> > check if this is a known issue. >> > >> > I'll post more details after gathering more data, but in >> case anyone has >> > experienced this, please do share. >> > >> > thank you in advance, and Happy New Year! >> > -- ramki >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/dffcd240/attachment.html From wolfgang.pedot at finkzeit.at Fri Jan 3 12:46:43 2014 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Fri, 03 Jan 2014 21:46:43 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C6FBE6.6040904@oracle.com> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> Message-ID: <52C721B3.9010909@finkzeit.at> Looks like the mail you quoted (from Jose Otavio Carlomagno Filho) was in response to mine but I have not received it... Just to clarify: I know why permGen fills up and its an expected behaviour in this application. 
Having 1-2 full GCs a day is certainly not ideal but its also no killer and I like how G1 handles the young/old heap. What makes me wonder is why after every 4th full GC permGen usage drops a good 250MB lower than the 3 collects before and there is space for significantly more classes afterwards (165k vs 125k). Something else in permGen must get cleaned up at that time... That rythm keeps constant so far no matter how much time passes between full GCs. I dont really think G1 causes this 3-1 rythm specifically but whats interesting is that CMS with ClassUnloading never got significantly below that 0.8GB if I remember correctly. regards Wolfgang PS: my older question about G1 and incremental permGen possibility to this mailing list is actually linked in that stackoverflow-thread so we have a complete circle here ;) Am 03.01.2014 19:05, schrieb YU ZHANG: > Very interesting post. Like someone mentioned in the comments, with > -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS can clean > classes in PermGen with minor GC. But G1 can only unload class during > full gc. Full GC in G1 is slow as it is single threaded. > > Thanks, > Jenny > > On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: >> We recently switched to G1 in our application and started experiencing >> this type of behaviour too. Turns out G1 was not causing the problem, >> it was only exposing it to us. >> >> Our application would generate a large number of proxy classes and >> that would cause the Perm Gen to fill up until a full GC was performed >> by G1. When using ParallelOldGC, this would not happen because full >> GCs would be executed much more frequently (when the old gen was >> full), which prevented the perm gen from filling up. >> >> You can find more info about our problem and our analysis here: >> http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe >> >> I recommend you use a profiling too to investigate the root cause of >> your Perm Gen getting filled up. There's a chance it is a leak, but as >> I said, in our case, it was our own application's fault and G1 exposed >> the problem to us. >> >> Regards, >> Jose >> >> >> On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot >> > wrote: >> >> Hi, >> >> I am using G1 on 7u45 for an application-server which has a "healthy" >> permGen churn because it generates a lot of short-lived dynamic >> classes >> (JavaScript). Currently permGen is sized at a little over 1GB and >> depending on usage there can be up to 2 full GCs per day (usually only >> 1). I have not noticed an increased permGen usage with G1 (increased >> size just before switching to G1) but I have noticed something odd >> about >> the permGen-usage after a collect. The class-count will always >> fall back >> to the same level which is currently 65k but the permGen usage after >> collect can either be ~0.8GB or ~0.55GB. There are always 3 collects >> resulting in 0.8GB followed by one scoring 0.55GB so there seems to be >> some kind of "rythm" going on. The full GCs are always triggered by >> permGen getting full and the loaded class count goes significantly >> higher after a 0.55GB collect (165k vs 125k) so I guess some classes >> just get unloaded later... >> >> I can not tell if this behaviour is due to G1 or some other factor in >> this application but I do know that I have no leak because the >> after-collect values are fairly stable over weeks. 
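One way to find out what the "deeper" collections actually remove would be to compare what is left in the perm gen after one of the ~0.8GB collects with what is left after a ~0.55GB collect; a sketch, with <pid> as a placeholder:

    # per-class-loader view of the perm gen (JDK 7): loader, class count, bytes, alive/dead
    jmap -permstat <pid>

    # live-object class histogram; note that the :live option forces a full GC first
    jmap -histo:live <pid>

Running jmap -permstat shortly after each kind of collection should show which class loaders, and roughly how many classes and bytes, only disappear on the deeper ones.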
>> >> So I have not experienced this but am sharing anyway ;) >> >> happy new year >> Wolfgang >> >> Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: >> > I haven't narrowed it down sufficiently yet, but has anyone >> noticed if >> > G1 causes a higher perm gen footprint or, worse, a perm gen leak >> perhaps? >> > I do realize that G1 does not today (as of 7u40 at least) >> collect the >> > perm gen concurrently, rather deferring its collection to a >> stop-world full >> > gc. However, it has just come to my attention that despite full >> > stop-world gc's (on account of the perm gen getting full), G1 >> still uses >> > more perm gen >> > space (in some instacnes substantially more) than ParallelOldGC even >> > after the full stop-world gc's, in some of our experiments. (PS: >> Also >> > noticed >> > that the default gc logging for G1 does not print the perm gen >> usage at >> > full gc, unlike other collectors; looks like an oversight in logging >> > perhaps one >> > that has been fixed recently; i was on 7u40 i think.) >> > >> > While I need to collect more data using non-ParallelOld, non-G1 >> > collectors (escpeially CMS) to see how things look and to get >> closer to >> > the root >> > cause, I wondered if anyone else had come across a similar issue >> and to >> > check if this is a known issue. >> > >> > I'll post more details after gathering more data, but in case >> anyone has >> > experienced this, please do share. >> > >> > thank you in advance, and Happy New Year! >> > -- ramki >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From bernd-2013 at eckenfels.net Fri Jan 3 12:59:10 2014 From: bernd-2013 at eckenfels.net (Bernd Eckenfels) Date: Fri, 03 Jan 2014 21:59:10 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C721B3.9010909@finkzeit.at> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> <52C721B3.9010909@finkzeit.at> Message-ID: Am 03.01.2014, 21:46 Uhr, schrieb Wolfgang Pedot : > What makes > me wonder is why after every 4th full GC permGen usage drops a good > 250MB lower than the 3 collects before and there is space for > significantly more classes afterwards (165k vs 125k). Could be softreference or (more likely) finalizer related? Gruss Bernd From thomas.schatzl at oracle.com Fri Jan 3 13:52:46 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 03 Jan 2014 22:52:46 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> Message-ID: <1388785966.6059.2.camel@cirrus> Hi, On Fri, 2014-01-03 at 11:30 -0800, Srinivas Ramakrishna wrote: > Thanks everyone for sharing yr experiences. As I indicated, I do > realize that G1 does not collect perm gen concurrently. 
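If soft references or finalizers are the suspects, as Bernd suggests above, their processing can be made visible in the GC log and soft reference retention can be tightened with standard HotSpot options; the value below is only illustrative:

    # log how many Soft/Weak/Final/Phantom references each collection processes
    -XX:+PrintGCDetails -XX:+PrintReferenceGC

    # default is 1000 ms of retention per free MB of heap; lowering it clears
    # softly reachable objects (and whatever they keep alive) more aggressively
    -XX:SoftRefLRUPolicyMSPerMB=100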
> What was surprising was that G1's use of perm gen was much higher > following its stop-world full gc's > which would have collected the perm gen. As a result, G1 needed a perm > gen quite a bit more than twice that > given to parallel gc to be able to run an application for a certain > length of time. Maybe explained by different soft reference policies? I.e. maybe the input for the soft reference processing is different in both collectors, making it behave differently, possibly keeping alive more objects/classes for longer. > I'll provide more data on perm gen dynamics when I have it. My guess > would be that somehow G1's use of > regions in the perm gen is causing a dilation of perm gen footprint on > account of fragmentation in the G1 perm > gen regions. If that were the case, I would expect a modest increase > in the perm gen footprint, but it seemed the increase in > footprint was much higher. I'll collect and post more concrete numbers > when I get a chance. (G1) Perm gen is never region based. Thomas From ysr1729 at gmail.com Fri Jan 3 14:02:23 2014 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 3 Jan 2014 14:02:23 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <1388785966.6059.2.camel@cirrus> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> <1388785966.6059.2.camel@cirrus> Message-ID: Hi Thomas -- On Fri, Jan 3, 2014 at 1:52 PM, Thomas Schatzl wrote: > Hi, > > On Fri, 2014-01-03 at 11:30 -0800, Srinivas Ramakrishna wrote: > > Thanks everyone for sharing yr experiences. As I indicated, I do > > realize that G1 does not collect perm gen concurrently. > > What was surprising was that G1's use of perm gen was much higher > > following its stop-world full gc's > > which would have collected the perm gen. As a result, G1 needed a perm > > gen quite a bit more than twice that > > given to parallel gc to be able to run an application for a certain > > length of time. > > Maybe explained by different soft reference policies? I.e. maybe the > input for the soft reference processing is different in both collectors, > making it behave differently, possibly keeping alive more > objects/classes for longer. > Thanks for that thought; i'll keep that in mind. > > > I'll provide more data on perm gen dynamics when I have it. My guess > > would be that somehow G1's use of > > regions in the perm gen is causing a dilation of perm gen footprint on > > account of fragmentation in the G1 perm > > gen regions. If that were the case, I would expect a modest increase > > in the perm gen footprint, but it seemed the increase in > > footprint was much higher. I'll collect and post more concrete numbers > > when I get a chance. > > (G1) Perm gen is never region based. > > Ah, thanks for correcting that misconception of mine. So we can cross that off. -- ramki > Thomas > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/996d4c42/attachment.html From yaoshengzhe at gmail.com Mon Jan 6 12:03:36 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 6 Jan 2014 12:03:36 -0800 Subject: java process memory usage is higher than Xmx Message-ID: Hi All, I have a java process (HBase region server process ) running under Java 7 (1.7.0_40-b43) with G1 enabled. Both Xms and Xmx are the same. After running process for a few hours, I see the actual memory used by the process is about 10 percent higher than given Xmx. 
Has anyone experienced the similar when use Java 7 or G1 ? Is there useful tools to diagnose the cause ? I've tried jmap but the output doesn't say anything about high memory usage. FYI, the java process use a large heap (90GB), but the actual memory usage ($ top) is about 99GB. Thanks Shengzhe -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/44ab95e1/attachment.html From jon.masamitsu at oracle.com Mon Jan 6 12:01:41 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Mon, 06 Jan 2014 12:01:41 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C721B3.9010909@finkzeit.at> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> <52C721B3.9010909@finkzeit.at> Message-ID: <52CB0BA5.2080202@oracle.com> On 01/03/2014 12:46 PM, Wolfgang Pedot wrote: > Looks like the mail you quoted (from Jose Otavio Carlomagno Filho) was > in response to mine but I have not received it... > > Just to clarify: > I know why permGen fills up and its an expected behaviour in this > application. Having 1-2 full GCs a day is certainly not ideal but its > also no killer and I like how G1 handles the young/old heap. What makes > me wonder is why after every 4th full GC permGen usage drops a good > 250MB lower than the 3 collects before and there is space for > significantly more classes afterwards (165k vs 125k). Something else in > permGen must get cleaned up at that time... > That rythm keeps constant so far no matter how much time passes between > full GCs. > > I dont really think G1 causes this 3-1 rythm specifically but whats > interesting is that CMS with ClassUnloading never got significantly > below that 0.8GB if I remember correctly. Try -XX:MarkSweepAlwaysCompactCount=1 which should make every full GC compact out all the dead space. Alternatively try -XX:MarkSweepAlwaysCompactCount=8 and see if that changes the pattern. product(uintx, MarkSweepAlwaysCompactCount, 4, \ "How often should we fully compact the heap (ignoring the dead " \ "space parameters)") Jon > > regards > Wolfgang > > PS: my older question about G1 and incremental permGen possibility to > this mailing list is actually linked in that stackoverflow-thread so we > have a complete circle here ;) > > > > Am 03.01.2014 19:05, schrieb YU ZHANG: >> Very interesting post. Like someone mentioned in the comments, with >> -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS can clean >> classes in PermGen with minor GC. But G1 can only unload class during >> full gc. Full GC in G1 is slow as it is single threaded. >> >> Thanks, >> Jenny >> >> On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: >>> We recently switched to G1 in our application and started experiencing >>> this type of behaviour too. Turns out G1 was not causing the problem, >>> it was only exposing it to us. >>> >>> Our application would generate a large number of proxy classes and >>> that would cause the Perm Gen to fill up until a full GC was performed >>> by G1. When using ParallelOldGC, this would not happen because full >>> GCs would be executed much more frequently (when the old gen was >>> full), which prevented the perm gen from filling up. 
>>> >>> You can find more info about our problem and our analysis here: >>> http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe >>> >>> I recommend you use a profiling too to investigate the root cause of >>> your Perm Gen getting filled up. There's a chance it is a leak, but as >>> I said, in our case, it was our own application's fault and G1 exposed >>> the problem to us. >>> >>> Regards, >>> Jose >>> >>> >>> On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot >>> > wrote: >>> >>> Hi, >>> >>> I am using G1 on 7u45 for an application-server which has a "healthy" >>> permGen churn because it generates a lot of short-lived dynamic >>> classes >>> (JavaScript). Currently permGen is sized at a little over 1GB and >>> depending on usage there can be up to 2 full GCs per day (usually only >>> 1). I have not noticed an increased permGen usage with G1 (increased >>> size just before switching to G1) but I have noticed something odd >>> about >>> the permGen-usage after a collect. The class-count will always >>> fall back >>> to the same level which is currently 65k but the permGen usage after >>> collect can either be ~0.8GB or ~0.55GB. There are always 3 collects >>> resulting in 0.8GB followed by one scoring 0.55GB so there seems to be >>> some kind of "rythm" going on. The full GCs are always triggered by >>> permGen getting full and the loaded class count goes significantly >>> higher after a 0.55GB collect (165k vs 125k) so I guess some classes >>> just get unloaded later... >>> >>> I can not tell if this behaviour is due to G1 or some other factor in >>> this application but I do know that I have no leak because the >>> after-collect values are fairly stable over weeks. >>> >>> So I have not experienced this but am sharing anyway ;) >>> >>> happy new year >>> Wolfgang >>> >>> Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: >>> > I haven't narrowed it down sufficiently yet, but has anyone >>> noticed if >>> > G1 causes a higher perm gen footprint or, worse, a perm gen leak >>> perhaps? >>> > I do realize that G1 does not today (as of 7u40 at least) >>> collect the >>> > perm gen concurrently, rather deferring its collection to a >>> stop-world full >>> > gc. However, it has just come to my attention that despite full >>> > stop-world gc's (on account of the perm gen getting full), G1 >>> still uses >>> > more perm gen >>> > space (in some instacnes substantially more) than ParallelOldGC even >>> > after the full stop-world gc's, in some of our experiments. (PS: >>> Also >>> > noticed >>> > that the default gc logging for G1 does not print the perm gen >>> usage at >>> > full gc, unlike other collectors; looks like an oversight in logging >>> > perhaps one >>> > that has been fixed recently; i was on 7u40 i think.) >>> > >>> > While I need to collect more data using non-ParallelOld, non-G1 >>> > collectors (escpeially CMS) to see how things look and to get >>> closer to >>> > the root >>> > cause, I wondered if anyone else had come across a similar issue >>> and to >>> > check if this is a known issue. >>> > >>> > I'll post more details after gathering more data, but in case >>> anyone has >>> > experienced this, please do share. >>> > >>> > thank you in advance, and Happy New Year! 
>>> > -- ramki >>> > >>> > >>> > _______________________________________________ >>> > hotspot-gc-use mailing list >>> > hotspot-gc-use at openjdk.java.net >>> >>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> > >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From daubman at gmail.com Mon Jan 6 12:54:29 2014 From: daubman at gmail.com (Aaron Daubman) Date: Mon, 6 Jan 2014 15:54:29 -0500 Subject: java process memory usage is higher than Xmx In-Reply-To: References: Message-ID: > > > I've tried jmap but the output doesn't say anything about high memory > usage. FYI, the java process use a large heap (90GB), but the actual memory > usage ($ top) is about 99GB. > > Is this the RES column from top? You might also see: http://plumbr.eu/blog/why-does-my-java-process-consume-more-memory-than-xmx Although I would not expect permgen and stack to sum up to 9G... Are you using JNI, bytebuffers or anything else that would allocate off-heap memory? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/77be33ea/attachment.html From yaoshengzhe at gmail.com Mon Jan 6 13:33:45 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 6 Jan 2014 13:33:45 -0800 Subject: java process memory usage is higher than Xmx In-Reply-To: References: Message-ID: Hi Aaron, Is this the RES column from top? > Yes, it is. Are you using JNI, bytebuffers or anything else that would allocate > off-heap memory? > It is original HBase region server process, we never modify the code. It might use off-heap memory internally but the problem is, similar machine running under Java 6 with CMS do not have this problem and the real memory usage is very close to Xmx. In our case, the permgen seems not very high and I don't think stack would be a problem. I am now wondering, does anyone use G1 (with large heap) experience this memory usage issue (larger than Xmx a lot) ? Perm Generation: capacity = 33554432 (32.0MB) used = 31725872 (30.256149291992188MB) free = 1828560 (1.7438507080078125MB) 94.55046653747559% used -Shengzhe -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/292eb2a6/attachment.html From thomas.schatzl at oracle.com Mon Jan 6 13:43:12 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 06 Jan 2014 22:43:12 +0100 Subject: java process memory usage is higher than Xmx In-Reply-To: References: Message-ID: <1389044592.5005.3.camel@cirrus> Hi, On Mon, 2014-01-06 at 12:03 -0800, yao wrote: > Hi All, > > I have a java process (HBase region server process ) running under > Java 7 (1.7.0_40-b43) with G1 enabled. Both Xms and Xmx are the same. 
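On the direct-buffer point raised above: memory obtained through JNI or ByteBuffer.allocateDirect lives outside the Java heap, so it shows up in RES but is invisible to -Xmx and to jmap's heap summary. A tiny illustrative snippet (class name and sizes are arbitrary):

    import java.nio.ByteBuffer;

    public class OffHeapDemo {
        public static void main(String[] args) throws Exception {
            // 1 GB of native memory: RES grows by about 1 GB, Java heap usage does not
            ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024 * 1024);
            System.out.println("direct capacity: " + buf.capacity() + " bytes");
            Thread.sleep(60000);  // keep the process alive long enough to check it with top
        }
    }

The total that direct buffers may take is capped by -XX:MaxDirectMemorySize, which defaults to roughly the maximum heap size when unset. As the replies that follow show, in this particular case the bulk of the overshoot is G1's remembered sets rather than direct buffers.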
> After running process for a few hours, I see the actual memory used by > the process is about 10 percent higher than given Xmx. Has anyone > experienced the similar when use Java 7 or G1 ? Is there useful tools > to diagnose the cause ? > > > I've tried jmap but the output doesn't say anything about high memory > usage. FYI, the java process use a large heap (90GB), but the actual > memory usage ($ top) is about 99GB. > Possibly remembered set size. Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: +G1SummarizeRSetStats? The line Total heap region rem set sizes = 5256086K. Max = 8640K. gives you a good idea about remembered set size memory usage. I copied above line from one of your responses to the "G1 GC clean up time is too long" thread, and it seems the remembered set takes ~5GB there. Hth, Thomas From yaoshengzhe at gmail.com Mon Jan 6 13:49:43 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 6 Jan 2014 13:49:43 -0800 Subject: java process memory usage is higher than Xmx In-Reply-To: <1389044592.5005.3.camel@cirrus> References: <1389044592.5005.3.camel@cirrus> Message-ID: Hi Thomas, Possibly remembered set size. > > Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: > +G1SummarizeRSetStats? > > The line > > Total heap region rem set sizes = 5256086K. Max = 8640K. > > gives you a good idea about remembered set size memory usage. > You are right, rem set occupies ~7GB Concurrent RS processed -1863202596 cards Of 12780432 completed buffers: 12611979 ( 98.7%) by conc RS threads. 168453 ( 1.3%) by mutator threads. Conc RS threads times(s) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 \ 0.00 Total heap region rem set sizes = 7520648K. Max = 13256K. Static structures = 347K, free_lists = 28814K. 141012349 occupied cards represented. Max size region = 93:(O)[0x00007fcb48000000,0x00007fcb4a000000,0x00007fcb4a000000], size = 13257K, occupied = 2639K. Did 0 coarsenings. I copied above line from one of your responses to the "G1 GC clean up > time is too long" thread, and it seems the remembered set takes ~5GB > there. > I did set -XX:G1RSetRegionEntries=4096 to avoid coarsenings; however, the cleanup time seems not being reduced, it is still around 700 milliseconds, althrough there is no coarsenings. Any hint for tuning ? Because I want process with G1 use the same heap as CMS to compare the performance. But I cannot do so if rem set is that large, the process will be likely killed by OOM killed if I gave more memory. -Shengzhe On Mon, Jan 6, 2014 at 1:43 PM, Thomas Schatzl wrote: > Hi, > > On Mon, 2014-01-06 at 12:03 -0800, yao wrote: > > Hi All, > > > > I have a java process (HBase region server process ) running under > > Java 7 (1.7.0_40-b43) with G1 enabled. Both Xms and Xmx are the same. > > After running process for a few hours, I see the actual memory used by > > the process is about 10 percent higher than given Xmx. Has anyone > > experienced the similar when use Java 7 or G1 ? Is there useful tools > > to diagnose the cause ? > > > > > > I've tried jmap but the output doesn't say anything about high memory > > usage. FYI, the java process use a large heap (90GB), but the actual > > memory usage ($ top) is about 99GB. > > > Possibly remembered set size. > > Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: > +G1SummarizeRSetStats? > > The line > > Total heap region rem set sizes = 5256086K. Max = 8640K. > > gives you a good idea about remembered set size memory usage. 
> > I copied above line from one of your responses to the "G1 GC clean up > time is too long" thread, and it seems the remembered set takes ~5GB > there. > > Hth, > Thomas > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/9ed3f35b/attachment.html From yu.zhang at oracle.com Mon Jan 6 14:05:38 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Mon, 06 Jan 2014 14:05:38 -0800 Subject: java process memory usage is higher than Xmx In-Reply-To: <1389044592.5005.3.camel@cirrus> References: <1389044592.5005.3.camel@cirrus> Message-ID: <52CB28B2.7010705@oracle.com> I did a study on G1 vs ParallelgGC native memory footprint. The source for G1 using more memory includes: mtGC: mainly for RS related data structure. internal: internal for tracking Thread: g1 has more internal threads and thread related data structures In Yao's previous email "It might use off-heap memory internally but the problem is, similar machine running under Java 6 with CMS do not have this problem and the real memory usage is very close to Xmx." I am not quite familiar with CMS, does CMS need to keep a similar RS kinda data structure? Thanks, Jenny On 1/6/2014 1:43 PM, Thomas Schatzl wrote: > Hi, > > On Mon, 2014-01-06 at 12:03 -0800, yao wrote: >> Hi All, >> >> I have a java process (HBase region server process ) running under >> Java 7 (1.7.0_40-b43) with G1 enabled. Both Xms and Xmx are the same. >> After running process for a few hours, I see the actual memory used by >> the process is about 10 percent higher than given Xmx. Has anyone >> experienced the similar when use Java 7 or G1 ? Is there useful tools >> to diagnose the cause ? >> >> >> I've tried jmap but the output doesn't say anything about high memory >> usage. FYI, the java process use a large heap (90GB), but the actual >> memory usage ($ top) is about 99GB. >> > Possibly remembered set size. > > Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: > +G1SummarizeRSetStats? > > The line > > Total heap region rem set sizes = 5256086K. Max = 8640K. > > gives you a good idea about remembered set size memory usage. > > I copied above line from one of your responses to the "G1 GC clean up > time is too long" thread, and it seems the remembered set takes ~5GB > there. > > Hth, > Thomas > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From thomas.schatzl at oracle.com Mon Jan 6 15:19:56 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 07 Jan 2014 00:19:56 +0100 Subject: java process memory usage is higher than Xmx In-Reply-To: References: <1389044592.5005.3.camel@cirrus> Message-ID: <1389050396.5530.31.camel@cirrus> Hi Shengzhe, On Mon, 2014-01-06 at 13:49 -0800, yao wrote: > Hi Thomas, > Possibly remembered set size. > Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: > +G1SummarizeRSetStats? > The line > Total heap region rem set sizes = 5256086K. Max = 8640K. > gives you a good idea about remembered set size memory usage. > > You are right, rem set occupies ~7GB Could you add -XX:+G1SummarizeConcMark? The GC then shows some details about the work done during cleanup phases at VM exit. At 7 GB remembered set size most likely the phase that tries to minimize the remembered set is dominant. It should show up as large "RS scrub total time" compared to the "Final counting total time". 
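(As an illustration, the diagnostic switches mentioned in this thread can be combined on one command line - a sketch only, keep the existing heap and G1 options and just add the extra output:

-XX:+UnlockDiagnosticVMOptions
-XX:+G1SummarizeRSetStats
-XX:+G1SummarizeConcMark
-XX:+PrintGCDetails -XX:+PrintGCDateStamps

G1SummarizeRSetStats produces the "Total heap region rem set sizes" summary quoted earlier, and G1SummarizeConcMark prints the cleanup breakdown at VM exit, where the "RS scrub total time" and "Final counting total time" mentioned above can be compared.)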
> > Concurrent RS processed -1863202596 cards > Of 12780432 completed buffers: > 12611979 ( 98.7%) by conc RS threads. > 168453 ( 1.3%) by mutator threads. > Conc RS threads times(s) > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 \ > 0.00 > Total heap region rem set sizes = 7520648K. Max = 13256K. > Static structures = 347K, free_lists = 28814K. > 141012349 occupied cards represented. > Max size region = > 93:(O)[0x00007fcb48000000,0x00007fcb4a000000,0x00007fcb4a000000], size > = 13257K, occupied = 2639K. > Did 0 coarsenings. > > I copied above line from one of your responses to the "G1 GC > clean up > time is too long" thread, and it seems the remembered set > takes ~5GB > there. > > I did set -XX:G1RSetRegionEntries=4096 to avoid coarsenings; however, > the cleanup time seems not being reduced, it is still around 700 > milliseconds, althrough there is no coarsenings. > > Any hint for tuning ? Because I want process with G1 use the same heap > as CMS to compare the performance. But I cannot do so if rem set is You could still compare performance with the same total memory usage. > that large, the process will be likely killed by OOM killed if I gave > more memory. The default value of G1RSetRegionEntries at 32M region size should be 1536 (= (log(region size) - log(1MB) + 1) * G1RSetRegionEntriesBase by default); the chosen value means that you allow G1 to keep a larger remembered set (ie. less coarsening) per region. The "RS scrub" part of cleanup is roughly dependent on remembered set size, and this is the main knob to turn here imo. So it seems that increasing the G1RSetRegionEntries is counter-productive for decreasing gc cleanup time, because scrubbing coarsened remembered sets looks fast. I do not have numbers though, just a feeling. Coarsening mostly increases gc pause time (RS Scan time to be exact). Otoh you mentioned that gc cleanup time did not change when changing G1RSetRegionEntries. It's best to measure where the time is spent using the G1SummarizeConcMark switch and then possibly change the G1RSetRegionEntries value (and measuring the impact). Thomas From yaoshengzhe at gmail.com Mon Jan 6 15:31:55 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 6 Jan 2014 15:31:55 -0800 Subject: G1 Full GC without to-space exhausted Message-ID: Hi All We have some interesting G1 GC logs and want to share with you. G1 triggers full GC without to-space exhausted. G1 parameters we use: -server -XX:MaxGCPauseMillis=100 -XX:G1HeapRegionSize=32m -XX:InitiatingHeapOccupancyPercent=65 -XX:G1ReservePercent=20 -XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75 -XX:G1RSetRegionEntries=4096 Note: 1. This machine do not have any full GCs after applying "*-XX:G1ReservePercent=20 -XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75*" since Dec 26, 2013 2. Yesterday, we set -XX:G1RSetRegionEntries=4096 to reduce RSet coarsening and we've observed following two full GCs after a few hours (~ 20 hours). 3. 
Another production machine with similar traffic (without -XX:G1RSetRegionEntries=4096) do not have full GCs so far (since Dec 26, 2013) *First Full GC*2014-01-06T17:21:11.644-0500: 72496.707: [GC pause (young) Desired survivor size 234881024 bytes, new threshold 3 (max 15) - age 1: 91549360 bytes, 91549360 total - age 2: 83989936 bytes, 175539296 total - age 3: 80986496 bytes, 256525792 total 72496.708: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 29358, predicted base time: 43.17 ms, remaining time: 56.83 ms, target pause time: 100.0\ 0 ms] 72496.708: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 91 regions, survivors: 14 regions, predicted young region time: 34.53 ms] 72496.708: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause time: 77.70 ms, target pause t\ ime: 100.00 ms] 72496.786: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 56841207808 bytes, allocation reques\ t: 0 bytes, threshold: 46172576125 bytes (65.00 %), source: end of GC] , 0.0788860 secs] [Parallel Time: 56.7 ms, GC Workers: 18] [GC Worker Start (ms): Min: 72496707.9, Avg: 72496708.1, Max: 72496708.3, Diff: 0.4] [Ext Root Scanning (ms): Min: 3.1, Avg: 3.6, Max: 5.2, Diff: 2.0, Sum: 64.6] [Update RS (ms): Min: 8.9, Avg: 10.2, Max: 13.1, Diff: 4.3, Sum: 183.5] [Processed Buffers: Min: 5, Avg: 19.4, Max: 26, Diff: 21, Sum: 349] [Scan RS (ms): Min: 3.9, Avg: 6.8, Max: 7.2, Diff: 3.3, Sum: 121.7] [Object Copy (ms): Min: 35.2, Avg: 35.3, Max: 35.4, Diff: 0.2, Sum: 635.2] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.9] [GC Worker Total (ms): Min: 55.7, Avg: 55.9, Max: 56.1, Diff: 0.4, Sum: 1006.2] [GC Worker End (ms): Min: 72496764.0, Avg: 72496764.0, Max: 72496764.1, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 9.7 ms] [Other: 12.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 6.5 ms] [Ref Enq: 1.3 ms] [Free CSet: 0.6 ms] [Eden: 2912.0M(2912.0M)->0.0B(2944.0M) Survivors: 448.0M->416.0M Heap: 55.9G(66.2G)->53.3G(66.2G)] [Times: user=1.14 sys=0.01, real=0.07 secs] *2014-01-06T17:21:17.773-0500: 72502.837: [Full GC 55G->44G(66G), 42.9123930 secs]* [Eden: 1856.0M(2944.0M)->0.0B(8224.0M) Survivors: 416.0M->0.0B Heap: 55.1G(66.2G)->44.2G(66.2G)] [Times: user=89.27 sys=0.32, real=42.91 secs] *Second Full GC* 2014-01-06T17:22:19.756-0500: 72564.819: [GC pause (young) Desired survivor size 234881024 bytes, new threshold 1 (max 15) - age 1: 480804360 bytes, 480804360 total 72564.819: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 20485, predicted base time: 39.00 ms, remaining time: 61.00 ms, target pause time: 100.0\ 0 ms] 72564.819: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 91 regions, survivors: 14 regions, predicted young region time: 58.68 ms] 72564.820: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause time: 97.68 ms, target pause t\ ime: 100.00 ms] 72564.921: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 35.26 %, threshold: 10.00 %,\ uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)] , 0.1015370 secs] [Parallel Time: 86.3 ms, GC Workers: 18] [GC Worker Start (ms): Min: 72564819.8, Avg: 72564820.0, Max: 72564820.1, 
Diff: 0.3] [Ext Root Scanning (ms): Min: 3.1, Avg: 3.7, Max: 5.4, Diff: 2.2, Sum: 65.8] [SATB Filtering (ms): Min: 0.0, Avg: 0.2, Max: 4.3, Diff: 4.3, Sum: 4.3] [Update RS (ms): Min: 1.0, Avg: 8.7, Max: 69.5, Diff: 68.5, Sum: 156.8] [Processed Buffers: Min: 4, Avg: 18.6, Max: 24, Diff: 20, Sum: 335] [Scan RS (ms): Min: 0.0, Avg: 1.4, Max: 1.8, Diff: 1.8, Sum: 25.6] [Object Copy (ms): Min: 12.4, Avg: 71.3, Max: 75.2, Diff: 62.9, Sum: 1284.0] [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 2.2] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6] [GC Worker Total (ms): Min: 85.4, Avg: 85.5, Max: 85.7, Diff: 0.3, Sum: 1539.3] [GC Worker End (ms): Min: 72564905.5, Avg: 72564905.5, Max: 72564905.6, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 5.8 ms] [Other: 9.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 5.7 ms] [Ref Enq: 1.5 ms] [Free CSet: 0.4 ms] [Eden: 2912.0M(2912.0M)->0.0B(2912.0M) Survivors: 448.0M->448.0M Heap: 48.4G(66.2G)->46.1G(66.2G)] [Times: user=1.62 sys=0.04, real=0.10 secs] *2014-01-06T17:22:21.027-0500: 72566.090: [Full GC 47G->42G(66G), 39.4019900 secs]* [Eden: 1344.0M(2912.0M)->0.0B(4640.0M) Survivors: 448.0M->0.0B Heap: 47.4G(66.2G)->42.1G(66.2G)] [Times: user=85.26 sys=0.25, real=39.39 secs] *RSet Summarize* Concurrent RS processed -1783517237 cards Of 13181224 completed buffers: 13012752 ( 98.7%) by conc RS threads. 168472 ( 1.3%) by mutator threads. Conc RS threads times(s) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Total heap region rem set sizes = 7388596K. Max = 11538K. Static structures = 347K, free_lists = 27631K. 123115917 occupied cards represented. Max size region = 84:(O)[0x00007fcb36000000,0x00007fcb37ffe258,0x00007fcb38000000], size = 11539K, occupied = 1472K. Did 0 coarsenings. Thanks Shengzhe -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/95ee6d50/attachment-0001.html From yaoshengzhe at gmail.com Mon Jan 6 15:49:17 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 6 Jan 2014 15:49:17 -0800 Subject: java process memory usage is higher than Xmx In-Reply-To: <1389050396.5530.31.camel@cirrus> References: <1389044592.5005.3.camel@cirrus> <1389050396.5530.31.camel@cirrus> Message-ID: Hi Thomas, Thanks for your good explanation, very informational and helpful. Thanks -Shengzhe On Mon, Jan 6, 2014 at 3:19 PM, Thomas Schatzl wrote: > Hi Shengzhe, > > On Mon, 2014-01-06 at 13:49 -0800, yao wrote: > > Hi Thomas, > > Possibly remembered set size. > > Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: > > +G1SummarizeRSetStats? > > The line > > Total heap region rem set sizes = 5256086K. Max = 8640K. > > gives you a good idea about remembered set size memory usage. > > > > You are right, rem set occupies ~7GB > > Could you add -XX:+G1SummarizeConcMark? The GC then shows some details > about the work done during cleanup phases at VM exit. > > At 7 GB remembered set size most likely the phase that tries to minimize > the remembered set is dominant. > > It should show up as large "RS scrub total time" compared to the "Final > counting total time". > > > > Concurrent RS processed -1863202596 cards > > Of 12780432 completed buffers: > > 12611979 ( 98.7%) by conc RS threads. > > 168453 ( 1.3%) by mutator threads. 
> > Conc RS threads times(s) > > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > 0.00 0.00 0.00 \ > > 0.00 > > Total heap region rem set sizes = 7520648K. Max = 13256K. > > Static structures = 347K, free_lists = 28814K. > > 141012349 occupied cards represented. > > Max size region = > > 93:(O)[0x00007fcb48000000,0x00007fcb4a000000,0x00007fcb4a000000], size > > = 13257K, occupied = 2639K. > > Did 0 coarsenings. > > > > I copied above line from one of your responses to the "G1 GC > > clean up > > time is too long" thread, and it seems the remembered set > > takes ~5GB > > there. > > > > I did set -XX:G1RSetRegionEntries=4096 to avoid coarsenings; however, > > the cleanup time seems not being reduced, it is still around 700 > > milliseconds, althrough there is no coarsenings. > > > > Any hint for tuning ? Because I want process with G1 use the same heap > > as CMS to compare the performance. But I cannot do so if rem set is > > You could still compare performance with the same total memory usage. > > > that large, the process will be likely killed by OOM killed if I gave > > more memory. > > The default value of G1RSetRegionEntries at 32M region size should be > 1536 (= (log(region size) - log(1MB) + 1) * G1RSetRegionEntriesBase by > default); the chosen value means that you allow G1 to keep a larger > remembered set (ie. less coarsening) per region. > > The "RS scrub" part of cleanup is roughly dependent on remembered set > size, and this is the main knob to turn here imo. > > So it seems that increasing the G1RSetRegionEntries is > counter-productive for decreasing gc cleanup time, because scrubbing > coarsened remembered sets looks fast. I do not have numbers though, just > a feeling. > Coarsening mostly increases gc pause time (RS Scan time to be exact). > > Otoh you mentioned that gc cleanup time did not change when changing > G1RSetRegionEntries. > > It's best to measure where the time is spent using the > G1SummarizeConcMark switch and then possibly change the > G1RSetRegionEntries value (and measuring the impact). > > Thomas > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/5f6e9078/attachment.html From bengt.rutisson at oracle.com Tue Jan 7 01:21:15 2014 From: bengt.rutisson at oracle.com (Bengt Rutisson) Date: Tue, 07 Jan 2014 10:21:15 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <1388785966.6059.2.camel@cirrus> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> <1388785966.6059.2.camel@cirrus> Message-ID: <52CBC70B.2010901@oracle.com> Hi all, First just a note about the missing PermGen data in the full GC output. That has been fixed for JDK 8 where (there the metadata information is printed instead of course) but I don't think it has been backported to 7u. G1: Output for full GCs with +PrintGCDetails should contain perm gen/meta data size change info https://bugs.openjdk.java.net/browse/JDK-8010738 On 2014-01-03 22:52, Thomas Schatzl wrote: > Hi, > > On Fri, 2014-01-03 at 11:30 -0800, Srinivas Ramakrishna wrote: >> Thanks everyone for sharing yr experiences. As I indicated, I do >> realize that G1 does not collect perm gen concurrently. >> What was surprising was that G1's use of perm gen was much higher >> following its stop-world full gc's >> which would have collected the perm gen. 
As a result, G1 needed a perm >> gen quite a bit more than twice that >> given to parallel gc to be able to run an application for a certain >> length of time. > Maybe explained by different soft reference policies? I.e. maybe the > input for the soft reference processing is different in both collectors, > making it behave differently, possibly keeping alive more > objects/classes for longer. Yes, this would be my first guess too. We have seen differences between ParallelGC and G1 in the soft reference handling. I think this is mostly due to the different way they estimate the used and free space on the heap since the actual calculation based on that data is then the same for all collectors. On the other hand, as I recall, we saw the opposite behavior. That G1 is more aggressive about cleaning soft references than ParallelGC. Maybe playing around a bit with -XX:SoftRefLRUPolicyMSPerMB can help? Bengt > >> I'll provide more data on perm gen dynamics when I have it. My guess >> would be that somehow G1's use of >> regions in the perm gen is causing a dilation of perm gen footprint on >> account of fragmentation in the G1 perm >> gen regions. If that were the case, I would expect a modest increase >> in the perm gen footprint, but it seemed the increase in >> footprint was much higher. I'll collect and post more concrete numbers >> when I get a chance. > (G1) Perm gen is never region based. > > Thomas > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From Shane.Cox at theice.com Wed Jan 8 05:05:16 2014 From: Shane.Cox at theice.com (Shane Cox) Date: Wed, 8 Jan 2014 08:05:16 -0500 Subject: ParNew pauses longer in JDK7 Message-ID: <752D1F18B064FC46BF3DE5166BD3CE9C01CAFC0E@AT-BP-IXMX-09.theice.com> While benchmarking my application on JDK7, I noticed that minor GC pauses are longer compared to JDK6. One clue may relate to heap size. I noticed that heap size (Xmx) has much more impact on minor GC in JDK7. This is what I have observed: JDK6 w/ 1GB heap: avg minor GC pause = 3.9ms JDK6 w/ 10GB heap: avg minor GC pause = 3.9ms JDK7 w/ 1GB heap: avg minor GC pause = 5ms JDK7 w/ 10GB heap: avg minor GC pause = 13.3ms GC logs attached. Platform info below. Any help understanding this behavior would be appreciated. java version "1.6.0_27" Java(TM) SE Runtime Environment (build 1.6.0_27-b07) Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode) java version "1.7.0_45" Java(TM) SE Runtime Environment (build 1.7.0_45-b18) Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode) -d64 -Xms1000m -Xmx1000m -XX:MaxNewSize=168M -XX:PermSize=48m -XX:MaxPermSize=96m -Xnoclassgc -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -verbose:gc -Xloggc:./logs/gc-output.log HP ProLiant DL360 G7 Intel(R) Xeon(R) CPU X5670 @ 2.93GHz Red Hat Enterprise Linux Server release 6.4 (Santiago) Linux ll-lt-fxmr-03 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Feb 20 12:17:37 EST 2013 x86_64 x86_64 x86_64 GNU/Linux ________________________________ This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. 
(ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140108/08531ea5/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: gcLogs.tar.gz Type: application/x-gzip Size: 69885 bytes Desc: gcLogs.tar.gz Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140108/08531ea5/gcLogs.tar-0001.gz From wolfgang.pedot at finkzeit.at Wed Jan 8 05:26:41 2014 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Wed, 08 Jan 2014 14:26:41 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52CB0BA5.2080202@oracle.com> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> <52C721B3.9010909@finkzeit.at> <52CB0BA5.2080202@oracle.com> Message-ID: <52CD5211.4000802@finkzeit.at> Hi, >> I dont really think G1 causes this 3-1 rythm specifically but whats >> interesting is that CMS with ClassUnloading never got significantly >> below that 0.8GB if I remember correctly. > Try > > -XX:MarkSweepAlwaysCompactCount=1 > > which should make every full GC compact out all > the dead space. > > Alternatively try > > -XX:MarkSweepAlwaysCompactCount=8 > > and see if that changes the pattern. > thats it, with a value of 1 all PermGen collects reach the same usage. Since the compacting collects are not visibly slower than "normal" full GCs I guess I?ll lower that value on the live system to increase time between full GCs. thanks for the tip Wolfgang From Andreas.Mueller at mgm-tp.com Wed Jan 8 05:53:16 2014 From: Andreas.Mueller at mgm-tp.com (=?iso-8859-1?Q?Andreas_M=FCller?=) Date: Wed, 8 Jan 2014 13:53:16 +0000 Subject: ParNew pauses longer in JDK7 Message-ID: <46FF8393B58AD84D95E444264805D98FBDE13A2A@edata01.mgm-edv.de> Hi Shane, >While benchmarking my application on JDK7, I noticed that minor GC pauses are longer compared to JDK6. One clue may relate to heap size. I noticed that heap size (Xmx) has much more impact on minor GC in JDK7. This is what I have observed: >JDK6 w/ 1GB heap: avg minor GC pause = 3.9ms >JDK6 w/ 10GB heap: avg minor GC pause = 3.9ms >JDK7 w/ 1GB heap: avg minor GC pause = 5ms >JDK7 w/ 10GB heap: avg minor GC pause = 13.3ms Very interesting: Only your comparison with Java 6 made clear to me that this is a bug in Java7! You can probably work around that problem and make Java 7 perform better by explicitly setting the NewSize to a much higher value than the default of around 160 MB which I see in all the GC logs. I have observed before that the CMS collector does not perform well for small (and default) NewSize. Find the details here: http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/ in the section about the GarbageOnly benchmark and, in particular, figures 2+3. I have further observed that the increase in accumulated pause time (shown in figure 3) for smaller values of NewSize comes about because ParNew pauses get LONGER when NewSize gets SMALLER (which is odd enough). I have a figure showing that but removed it from my blog post because it is (too) long already. I will send it to you in an extra mail, though. 
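(A concrete sketch of that workaround, with an illustrative value only: give the young generation an explicit size, for example

-XX:NewSize=512m -XX:MaxNewSize=512m

added to the existing CMS options, instead of relying on the ~160 MB default. The 512m figure is just an example - the right size depends on heap size and allocation rate; the point is simply not to run CMS/ParNew with the small default NewSize.)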
It did, however, not occur to me to check that rather odd behavior with older Java versions (as you did with Java 6). Thanks for asking your question! You helped me to understand that there is another bug which I unknowingly documented on my blog post. The first one is already highlighted in figure 11 of the same article and was also discussed on this mailing list some weeks ago. Mit freundlichen Gr??en/Best regards Andreas M?ller mgm technology partners GmbH Frankfurter Ring 105a 80807 M?nchen Tel. +49 (89) 35 86 80-633 Fax +49 (89) 35 86 80-288 E-Mail Andreas.Mueller at mgm-tp.com Innovation Implemented. Sitz der Gesellschaft: M?nchen Gesch?ftsf?hrer: Hamarz Mehmanesh Handelsregister: AG M?nchen HRB 105068 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140108/928708b7/attachment.html From Andreas.Mueller at mgm-tp.com Wed Jan 8 06:05:50 2014 From: Andreas.Mueller at mgm-tp.com (=?iso-8859-1?Q?Andreas_M=FCller?=) Date: Wed, 8 Jan 2014 14:05:50 +0000 Subject: AW: Re: ParNew pauses longer in JDK7 Message-ID: <46FF8393B58AD84D95E444264805D98FBDE13A41@edata01.mgm-edv.de> Hi Shane, your problem is documented in the purple lines of the attached plot: New gen pauses get LONGER when new gen size gets smaller for the CMS collector only. As you can see from figure 3 of my blog post this translates into much more accumulated pause time because new gen pauses also get MORE FREQUENT when new gen size gets smaller. The fast growth of accumulated pause time directly translates into a sharp decrease of GC throughput (figure 2) as is explained (even with a formula) in the text. Remeasuring those purple lines with Java 6 would probably show that this is a Java 7 problem. So far I have only measured one point (at NewSize=160 MB) in comparison to the solid purple line which confirmed what you observed with your benchmark. As there is a link to the source code of my benchmark in the article anybody can use it to reproduce the difference in Java 6 and 7. Best regards Andreas Von: Andreas M?ller Gesendet: Mittwoch, 8. Januar 2014 14:53 An: 'Shane.Cox at theice.com' Cc: hotspot-gc-use at openjdk.java.net Betreff: Re: ParNew pauses longer in JDK7 Hi Shane, >While benchmarking my application on JDK7, I noticed that minor GC pauses are longer compared to JDK6. One clue may relate to heap size. I noticed that heap size (Xmx) has much more impact on minor GC in JDK7. This is what I have observed: >JDK6 w/ 1GB heap: avg minor GC pause = 3.9ms >JDK6 w/ 10GB heap: avg minor GC pause = 3.9ms >JDK7 w/ 1GB heap: avg minor GC pause = 5ms >JDK7 w/ 10GB heap: avg minor GC pause = 13.3ms Very interesting: Only your comparison with Java 6 made clear to me that this is a bug in Java7! You can probably work around that problem and make Java 7 perform better by explicitly setting the NewSize to a much higher value than the default of around 160 MB which I see in all the GC logs. I have observed before that the CMS collector does not perform well for small (and default) NewSize. Find the details here: http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/ in the section about the GarbageOnly benchmark and, in particular, figures 2+3. I have further observed that the increase in accumulated pause time (shown in figure 3) for smaller values of NewSize comes about because ParNew pauses get LONGER when NewSize gets SMALLER (which is odd enough). 
I have a figure showing that but removed it from my blog post because it is (too) long already. I will send it to you in an extra mail, though. It did, however, not occur to me to check that rather odd behavior with older Java versions (as you did with Java 6). Thanks for asking your question! You helped me to understand that there is another bug which I unknowingly documented on my blog post. The first one is already highlighted in figure 11 of the same article and was also discussed on this mailing list some weeks ago. Mit freundlichen Gr??en/Best regards Andreas M?ller mgm technology partners GmbH Frankfurter Ring 105a 80807 M?nchen Tel. +49 (89) 35 86 80-633 Fax +49 (89) 35 86 80-288 E-Mail Andreas.Mueller at mgm-tp.com Innovation Implemented. Sitz der Gesellschaft: M?nchen Gesch?ftsf?hrer: Hamarz Mehmanesh Handelsregister: AG M?nchen HRB 105068 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140108/a40d3fd5/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: NewGenPauseDuration.png Type: image/png Size: 61103 bytes Desc: NewGenPauseDuration.png Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140108/a40d3fd5/NewGenPauseDuration-0001.png From gustav.r.akesson at gmail.com Mon Jan 13 02:50:00 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Mon, 13 Jan 2014 11:50:00 +0100 Subject: Long remark due to young generation occupancy Message-ID: Hi, This is a topic which has been discussed before, but I think I have some new findings. We're experiencing problems with CMS pauses. Settings we are using. -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=68 -XX:MaxTenuringThreshold=0 -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:CMSWaitDuration=30000 -Xmx2048M -Xms2048M -Xmn1024M Note that MaxTenuringThreshold is 0. This is only done during test to provoke the CMS to run more frequently (otherwise it runs once every day...). Due to this, promotion to old generation is around 400K to 1M per second. We have an allocation rate of roughly 1G per second, meaning that YGC runs once every second. We're running JDK7u17. This is a log entry when running with above settings. This entry is the typical example to all of the CMS collections in this test. 
*2014-01-13T09:31:52.504+0100: 661.675: [GC [1 CMS-initial-mark: 524986K(1048576K)] 526507K(2096192K), 0.0023550 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]*2014-01-13T09:31:52.506+0100: 661.677: [CMS-concurrent-mark-start] 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-mark: 0.138/0.138 secs] [Times: user=1.96 sys=0.11, real=0.13 secs] 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-preclean-start] 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-preclean: 0.010/0.011 secs] [Times: user=0.14 sys=0.02, real=0.02 secs] 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-abortable-preclean-start] 2014-01-13T09:31:53.584+0100: 662.755: [GC 662.755: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 1046656K->0K(1047616K), 0.0039870 secs] 1571642K->525579K(2096192K), 0.0043310 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 2014-01-13T09:31:54.146+0100: 663.317: [CMS-concurrent-abortable-preclean: 0.831/1.491 secs] [Times: user=16.76 sys=1.54, real=1.49 secs] *2014-01-13T09:31:54.148+0100: 663.319: [GC[YG occupancy: 552670 K (1047616 K)]663.319: [Rescan (parallel) , 0.2000060 secs]663.519: [weak refs processing, 0.0008740 secs]663.520: [scrub string table, 0.0006940 secs] [1 CMS-remark: 525579K(1048576K)] 1078249K(2096192K), 0.2017690 secs] [Times: user=3.53 sys=0.01, real=0.20 secs]*2014-01-13T09:31:54.350+0100: 663.521: [CMS-concurrent-sweep-start] 2014-01-13T09:31:54.846+0100: 664.017: [GC 664.017: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 1046656K->0K(1047616K), 0.0033500 secs] 1330075K->284041K(2096192K), 0.0034660 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-sweep: 0.665/0.670 secs] [Times: user=7.77 sys=0.71, real=0.67 secs] 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-reset-start] 2014-01-13T09:31:55.023+0100: 664.194: [CMS-concurrent-reset: 0.003/0.003 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] The initial pause is fine. Then I investigated how to reduce the remark phase, and activated -XX:+CMSScavengeBeforeRemark. That flag partly solves this issue (not active in the log above), but I've seen cases when it does not scavenge (I suspect JNI critical section), which is bad and generates yet again long remark pause. And yet again the pause is correlated towards the occupancy in young. So instead, I tried setting... -XX:CMSScheduleRemarkEdenPenetration=0 -XX:CMSScheduleRemarkEdenSizeThreshold=0 This is a log entry with the settings at the top plus the two above... 
*2014-01-13T10:18:25.757+0100: 590.198: [GC [1 CMS-initial-mark: 524654K(1048576K)] 526646K(2096192K), 0.0029130 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]*2014-01-13T10:18:25.760+0100: 590.201: [CMS-concurrent-mark-start] 2014-01-13T10:18:25.904+0100: 590.345: [CMS-concurrent-mark: 0.144/0.144 secs] [Times: user=1.98 sys=0.15, real=0.14 secs] 2014-01-13T10:18:25.904+0100: 590.346: [CMS-concurrent-preclean-start] 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.11 sys=0.00, real=0.01 secs] 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-abortable-preclean-start] 2014-01-13T10:18:26.836+0100: 591.278: [GC 591.278: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 1046656K->0K(1047616K), 0.0048160 secs] 1571310K->525477K(2096192K), 0.0049240 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] 2014-01-13T10:18:26.842+0100: 591.283: [CMS-concurrent-abortable-preclean: 0.608/0.929 secs] [Times: user=10.77 sys=0.97, real=0.93 secs] *2014-01-13T10:18:26.843+0100: 591.285: [GC[YG occupancy: 20938 K (1047616 K)]591.285: [Rescan (parallel) , 0.0024770 secs]591.287: [weak refs processing, 0.0007760 secs]591.288: [scrub string table, 0.0006440 secs] [1 CMS-remark: 525477K(1048576K)] 546415K(2096192K), 0.0040480 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]*2014-01-13T10:18:26.848+0100: 591.289: [CMS-concurrent-sweep-start] 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-sweep: 0.726/0.726 secs] [Times: user=8.50 sys=0.76, real=0.73 secs] 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-reset-start] 2014-01-13T10:18:27.576+0100: 592.017: [CMS-concurrent-reset: 0.003/0.003 secs] [Times: user=0.03 sys=0.01, real=0.00 secs] This means that when I set these two, CMS STWs go from ~200ms to below 10ms. I'm leaning towards activating... -XX:CMSScheduleRemarkEdenPenetration=0 -XX:CMSScheduleRemarkEdenSizeThreshold=0 -XX:CMSMaxAbortablePrecleanTime=30000 What I have seen with these flags is that as soon as a young is completely collected during abortable preclean, the remark is scheduled and since it can start when eden is nearly empty, it is ridicously fast. In case it takes a long time for preclean to catch a young collection, it is also fine because no promotion is being made. We can live with the pause of young plus a consecutive remark (for us, young is ~10ms). So, to the question - is there any obvious drawbacks with the three settings above? Why does eden have to be 50% (default) in order for a remark to be scheduled (besides spreading the pause)? It does only seem to do harm. Any reason? -XX:+CMSScavengeBeforeRemark I'm thinking to avoid since it can't be completely trusted. Usually it helps, but that is not good enough since the pauses get irregular in case it fails. And with these settings above, it will only add to the CMS pause. Best Regards, Gustav ?kesson -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140113/18b50992/attachment.html From stoth at miami-holdings.com Tue Jan 14 12:10:40 2014 From: stoth at miami-holdings.com (Steven Toth) Date: Tue, 14 Jan 2014 20:10:40 +0000 Subject: Seeking assistance with long garbage collection pauses with G1GC In-Reply-To: References: <0FF288A8F3C0E44A83B55C4E260BF0032AFC0B4F@DNY2I1EXC03.miamiholdings.corp> <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> Message-ID: <0FF288A8F3C0E44A83B55C4E260BF0032AFFEEFD@DNY2I1EXC03.miamiholdings.corp> Charlie, thank you very much. After disabling THP on our servers and running for the past few weeks in production we've received no long pauses for GC's. That was a lifesaver. -Steve -----Original Message----- From: charlie hunt [mailto:charlesjhunt at gmail.com] Sent: Thursday, December 12, 2013 11:51 AM To: Steven Toth Cc: hotspot-gc-use at openjdk.java.net; Randy Foster Subject: Re: Seeking assistance with long garbage collection pauses with G1GC Fyi, G1 was not officially supported on until JDK 1.7.0_04, aka 7u4. Not only are there many improvements in 7u4 vs 7u3, but many improvements since 7u4. I'd recommend you work with 7u40 or 7u45. All the above said, copy times look incredibly high for a 3 gb Java heap. Depending on your version of RHEL, if transparent huge pages are an available feature on your version RHEL, disable it. You might be seeing huge page coalescing which is contributing to your high sys time. Alternatively you may be paging / swapping, or possibly having high thread context switching. You might also need to throttle back the number GC threads. hths, charlie ... On Dec 10, 2013, at 6:16 PM, Steven Toth wrote: > Hello, > > We've been struggling with long pauses with the G1GC garbage collector for several weeks now and was hoping to get some assistance. > > We have a Java app running in a standalone JVM on RHEL. The app listens for data on one or more sockets, queues the data, and has scheduled threads pulling the data off the queue and persisting it. The data is wide, over 700 data elements per record, though all of the data elements are small Strings, Integers, or Longs. > > The app runs smoothly for periods of time, sometimes 30 minutes to an hour, but then we experience one or more long garbage collection pauses. The logs indicate the majority of the pause time is spent in the Object Copy time. The long pauses also have high sys time relative to the other shorter collections. > > Here are the JVM details: > > java version "1.7.0_03" > Java(TM) SE Runtime Environment (build 1.7.0_03-b04) Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode) > > Here are the JVM options: > > -XX:MaxPermSize=256m -XX:PermSize=256m -Xms3G -Xmx3G -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:-UseGCOverheadLimit \ -Xloggc:logs/gc-STAT5-collector.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \ -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=8 -XX:GCLogFileSize=10M -XX:+PrintGCApplicationStoppedTime \ -XX:MaxNewSize=1G -XX:NewSize=1G \ -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -XX:+PrintAdaptiveSizePolicy > > After several iterations of experimenting with an assortment of options (including no options other than -Xms and -Xmx) the aforementioned options have given us the best performance with the fewest amount of long pauses. However we're still experiencing several dozen garbage collections a day that range from 1-5 seconds. 
> > The process is taskset to 4 cores (all on the same socket), but is barely using 2 of them. All of the processes on this box are pinned to their own cores (with 0 and 1 unused). The machine has plenty of free memory (20+G) and top shows the process using 2.5G of RES memory. > > A day's worth of garbage collection logs are attached, but here is an example of the GC log output with high Object Copy and sys time. There are numerous GC events comparable to the example below with near identical Eden/Survivors/Heap sizes that take well under 100 millis whereas this example took over 2 seconds. > > [Object Copy (ms): 2090.4 2224.0 2484.0 2160.1 1603.9 2071.2 887.8 1608.1 1992.0 2030.5 1692.5 1583.9 2140.3 1703.0 2174.0 1949.5 1941.1 2190.1 2153.3 1604.1 1930.8 1892.6 1651.9 > > [Eden: 1017M(1017M)->0B(1016M) Survivors: 7168K->8192K Heap: 1062M(3072M)->47M(3072M)] > > [Times: user=2.24 sys=7.22, real=2.49 secs] > > Any help would be greatly appreciated. > > Thanks. > > -Steve > > > ****Confidentiality Note**** This e-mail may contain confidential and or privileged information and is solely for the use of the sender's intended recipient(s). Any review, dissemination, copying, printing or other use of this e-mail by any other persons or entities is prohibited. If you have received this e-mail in error, please contact the sender immediately by reply email and delete the material from any computer. > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use ***Confidentiality Note*** This e-mail may contain confidential and or privileged information and is solely for the use of the sender's intended recipient(s). Any review, dissemination, copying, printing or other use of this e-mail by any other persons or entities is prohibited. If you have received this e-mail in error, please contact the sender immediately by reply email and delete the material from any computer. From bernd-2014 at eckenfels.net Tue Jan 14 13:34:01 2014 From: bernd-2014 at eckenfels.net (Bernd Eckenfels) Date: Tue, 14 Jan 2014 22:34:01 +0100 Subject: Seeking assistance with long garbage collection pauses with G1GC In-Reply-To: <0FF288A8F3C0E44A83B55C4E260BF0032AFFEEFD@DNY2I1EXC03.miamiholdings.corp> References: <0FF288A8F3C0E44A83B55C4E260BF0032AFC0B4F@DNY2I1EXC03.miamiholdings.corp> <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> <0FF288A8F3C0E44A83B55C4E260BF0032AFFEEFD@DNY2I1EXC03.miamiholdings.corp> Message-ID: Hello, I wonder if there is anything the VM can do to avoid this? Maybe with some memadvice, changed allocation pattern or similiar? Is this only a problem when the VM does not use LargePages itself, or is it affected in that case as well? Gruss Bernd Am 14.01.2014, 21:10 Uhr, schrieb Steven Toth : > Charlie, thank you very much. After disabling THP on our servers and > running for the past few weeks in production we've received no long > pauses for GC's. > > That was a lifesaver. > > -Steve > > > -----Original Message----- > From: charlie hunt [mailto:charlesjhunt at gmail.com] > Sent: Thursday, December 12, 2013 11:51 AM > To: Steven Toth > Cc: hotspot-gc-use at openjdk.java.net; Randy Foster > Subject: Re: Seeking assistance with long garbage collection pauses with > G1GC > > Fyi, G1 was not officially supported on until JDK 1.7.0_04, aka 7u4. > > Not only are there many improvements in 7u4 vs 7u3, but many > improvements since 7u4. 
I'd recommend you work with 7u40 or 7u45. > > All the above said, copy times look incredibly high for a 3 gb Java > heap. Depending on your version of RHEL, if transparent huge pages are > an available feature on your version RHEL, disable it. You might be > seeing huge page coalescing which is contributing to your high sys time. > Alternatively you may be paging / swapping, or possibly having high > thread context switching. > > You might also need to throttle back the number GC threads. > > hths, > > charlie ... > > On Dec 10, 2013, at 6:16 PM, Steven Toth > wrote: > >> Hello, >> >> We've been struggling with long pauses with the G1GC garbage collector >> for several weeks now and was hoping to get some assistance. >> >> We have a Java app running in a standalone JVM on RHEL. The app >> listens for data on one or more sockets, queues the data, and has >> scheduled threads pulling the data off the queue and persisting it. The >> data is wide, over 700 data elements per record, though all of the data >> elements are small Strings, Integers, or Longs. >> >> The app runs smoothly for periods of time, sometimes 30 minutes to an >> hour, but then we experience one or more long garbage collection >> pauses. The logs indicate the majority of the pause time is spent in >> the Object Copy time. The long pauses also have high sys time relative >> to the other shorter collections. >> >> Here are the JVM details: >> >> java version "1.7.0_03" >> Java(TM) SE Runtime Environment (build 1.7.0_03-b04) Java HotSpot(TM) >> 64-Bit Server VM (build 22.1-b02, mixed mode) >> >> Here are the JVM options: >> >> -XX:MaxPermSize=256m -XX:PermSize=256m -Xms3G -Xmx3G -XX:+UseG1GC >> -XX:G1HeapRegionSize=32M -XX:-UseGCOverheadLimit \ >> -Xloggc:logs/gc-STAT5-collector.log -XX:+PrintGCDetails >> -XX:+PrintGCDateStamps \ -XX:+PrintGCTimeStamps >> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=8 >> -XX:GCLogFileSize=10M -XX:+PrintGCApplicationStoppedTime \ >> -XX:MaxNewSize=1G -XX:NewSize=1G \ -XX:+PrintGCApplicationStoppedTime >> -XX:+PrintTenuringDistribution -XX:+PrintAdaptiveSizePolicy >> >> After several iterations of experimenting with an assortment of options >> (including no options other than -Xms and -Xmx) the aforementioned >> options have given us the best performance with the fewest amount of >> long pauses. However we're still experiencing several dozen garbage >> collections a day that range from 1-5 seconds. >> >> The process is taskset to 4 cores (all on the same socket), but is >> barely using 2 of them. All of the processes on this box are pinned to >> their own cores (with 0 and 1 unused). The machine has plenty of free >> memory (20+G) and top shows the process using 2.5G of RES memory. >> >> A day's worth of garbage collection logs are attached, but here is an >> example of the GC log output with high Object Copy and sys time. There >> are numerous GC events comparable to the example below with near >> identical Eden/Survivors/Heap sizes that take well under 100 millis >> whereas this example took over 2 seconds. >> >> [Object Copy (ms): 2090.4 2224.0 2484.0 2160.1 1603.9 2071.2 887.8 >> 1608.1 1992.0 2030.5 1692.5 1583.9 2140.3 1703.0 2174.0 1949.5 1941.1 >> 2190.1 2153.3 1604.1 1930.8 1892.6 1651.9 >> >> [Eden: 1017M(1017M)->0B(1016M) Survivors: 7168K->8192K Heap: >> 1062M(3072M)->47M(3072M)] >> >> [Times: user=2.24 sys=7.22, real=2.49 secs] >> >> Any help would be greatly appreciated. >> >> Thanks. 
>> >> -Steve >> >> >> ****Confidentiality Note**** This e-mail may contain confidential and >> or privileged information and is solely for the use of the sender's >> intended recipient(s). Any review, dissemination, copying, printing or >> other use of this e-mail by any other persons or entities is >> prohibited. If you have received this e-mail in error, please contact >> the sender immediately by reply email and delete the material from any >> computer. >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > ***Confidentiality Note*** This e-mail may contain confidential and or > privileged information and is solely for the use of the sender's > intended recipient(s). Any review, dissemination, copying, printing or > other use of this e-mail by any other persons or entities is prohibited. > If you have received this e-mail in error, please contact the sender > immediately by reply email and delete the material from any computer. > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -- http://bernd.eckenfels.net From carlo.fernando at baml.com Tue Jan 14 13:47:33 2014 From: carlo.fernando at baml.com (Fernando, Carlo) Date: Tue, 14 Jan 2014 21:47:33 +0000 Subject: Trying to understand what happens during GenCollectForALlocation Message-ID: <204609DC9565564AA71E9B4312EA323240A38D8F@smtp_mail.bankofamerica.com> Hello. I'm trying to reduce our server latency and I'm trying to understand the meaning of this SafePointStatistic output. A snippet of the GC log: 2014-01-10T08:54:12.767+0000: 110949.481: [GC 110949.481: [ParNew: 65673K->83K(76480K), 0.0013940 secs] 88634K->23044K(251264K), 0.0014490 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] Total time for which application threads were stopped: 0.0048290 seconds A snippet of the Safepoint log which what I think correlates to the GC log: 110949.484: GenCollectForAllocation [ 55 0 0 ] [ 0 0 0 0 4 ] 0 Is it correct to say that the whole duration of the GC is 1ms but because of the safepoint, total STW was 4ms? Also, what could the possible cause be of the 4ms pause? In addition, I also noticed 1 output where real time was larger than user+sys. Would that indicate some type of cpu starvation? 2014-01-10T13:11:04.727+0000: 126361.441: [GC 126361.473: [ParNew: 65676K->72K(76480K), 0.0014070 secs] 89630K->24025K(251264K), 0.0014720 secs] [Times: user=0.01 sys=0.00, real=0.03 secs] Total time for which application threads were stopped: 0.0335640 seconds 126361.438: GenCollectForAllocation [ 55 0 0 ] [ 0 0 0 0 33 ] 0 Please let me know if you need any other info. Thanks -carlo ---------------------------------------------------------------------- This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer. If you are not the intended recipient, please delete this message. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140114/4ad90b4e/attachment.html From jon.masamitsu at oracle.com Wed Jan 15 10:24:34 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 15 Jan 2014 10:24:34 -0800 Subject: Long remark due to young generation occupancy In-Reply-To: References: Message-ID: <52D6D262.6010106@oracle.com> > -XX:CMSScheduleRemarkEdenPenetration=0 Schedule the remark pause immediately after the next young collection. > -XX:CMSScheduleRemarkEdenSizeThreshold=0 Any sized eden should allow scheduling of the remark pause. That is, no eden is too small to schedule. > -XX:CMSMaxAbortablePrecleanTime=30000 Wait up to 30 seconds for the remark to be scheduled after a young collection. Otherwise, wait only up to the default of 5 seconds. > So, to the question - is there any obvious drawbacks with the three > settings above? Why does eden have to be 50% (default) in order for a > remark to be scheduled (besides spreading the pause)? It does only > seem to do harm. Any reason? The default is 50% to try and place the remark pause between two young pauses (spread it out as you say). I don't believe it is always the case that the remark pause is very small if it is scheduled immediately after a young collection. In such cases we still want to spread out the pauses. If the remark is delayed to wait for the next young collection, the sweeping is also delayed. You're not using up space in the CMS (tenured) generation but you're also not collecting garbage and not making additional space available for reuse (which the concurrent sweep does). Jon On 01/13/2014 02:50 AM, Gustav ?kesson wrote: > Hi, > This is a topic which has been discussed before, but I think I have > some new findings. We're experiencing problems with CMS pauses. > Settings we are using. > -XX:+UseConcMarkSweepGC > -XX:CMSInitiatingOccupancyFraction=68 > -XX:MaxTenuringThreshold=0 > -XX:+UseParNewGC > -XX:+ScavengeBeforeFullGC > -XX:CMSWaitDuration=30000 > -Xmx2048M > -Xms2048M > -Xmn1024M > Note that MaxTenuringThreshold is 0. This is only done during test to > provoke the CMS to run more frequently (otherwise it runs once every > day...). Due to this, promotion to old generation is around 400K to 1M > per second. > We have an allocation rate of roughly 1G per second, meaning that YGC > runs once every second. > We're running JDK7u17. > This is a log entry when running with above settings. This entry is > the typical example to all of the CMS collections in this test. 
> *2014-01-13T09:31:52.504+0100: 661.675: [GC [1 CMS-initial-mark: > 524986K(1048576K)] 526507K(2096192K), 0.0023550 secs] [Times: > user=0.00 sys=0.00, real=0.01 secs] > *2014-01-13T09:31:52.506+0100: 661.677: [CMS-concurrent-mark-start] > 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-mark: > 0.138/0.138 secs] [Times: user=1.96 sys=0.11, real=0.13 secs] > 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-preclean-start] > 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-preclean: > 0.010/0.011 secs] [Times: user=0.14 sys=0.02, real=0.02 secs] > 2014-01-13T09:31:52.655+0100: 661.826: > [CMS-concurrent-abortable-preclean-start] > 2014-01-13T09:31:53.584+0100: 662.755: [GC 662.755: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0039870 secs] 1571642K->525579K(2096192K), > 0.0043310 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] > 2014-01-13T09:31:54.146+0100: 663.317: > [CMS-concurrent-abortable-preclean: 0.831/1.491 secs] [Times: > user=16.76 sys=1.54, real=1.49 secs] > *2014-01-13T09:31:54.148+0100: 663.319: [GC[YG occupancy: 552670 K > (1047616 K)]663.319: [Rescan (parallel) , 0.2000060 secs]663.519: > [weak refs processing, 0.0008740 secs]663.520: [scrub string table, > 0.0006940 secs] [1 CMS-remark: 525579K(1048576K)] 1078249K(2096192K), > 0.2017690 secs] [Times: user=3.53 sys=0.01, real=0.20 secs] > *2014-01-13T09:31:54.350+0100: 663.521: [CMS-concurrent-sweep-start] > 2014-01-13T09:31:54.846+0100: 664.017: [GC 664.017: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0033500 secs] 1330075K->284041K(2096192K), > 0.0034660 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] > 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-sweep: > 0.665/0.670 secs] [Times: user=7.77 sys=0.71, real=0.67 secs] > 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-reset-start] > 2014-01-13T09:31:55.023+0100: 664.194: [CMS-concurrent-reset: > 0.003/0.003 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] > The initial pause is fine. Then I investigated how to reduce the > remark phase, and activated -XX:+CMSScavengeBeforeRemark. That flag > partly solves this issue (not active in the log above), but I've seen > cases when it does not scavenge (I suspect JNI critical section), > which is bad and generates yet again long remark pause. And yet again > the pause is correlated towards the occupancy in young. > So instead, I tried setting... > -XX:CMSScheduleRemarkEdenPenetration=0 > -XX:CMSScheduleRemarkEdenSizeThreshold=0 > This is a log entry with the settings at the top plus the two above... 
> *2014-01-13T10:18:25.757+0100: 590.198: [GC [1 CMS-initial-mark: > 524654K(1048576K)] 526646K(2096192K), 0.0029130 secs] [Times: > user=0.00 sys=0.00, real=0.01 secs] > *2014-01-13T10:18:25.760+0100: 590.201: [CMS-concurrent-mark-start] > 2014-01-13T10:18:25.904+0100: 590.345: [CMS-concurrent-mark: > 0.144/0.144 secs] [Times: user=1.98 sys=0.15, real=0.14 secs] > 2014-01-13T10:18:25.904+0100: 590.346: [CMS-concurrent-preclean-start] > 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-preclean: > 0.008/0.008 secs] [Times: user=0.11 sys=0.00, real=0.01 secs] > 2014-01-13T10:18:25.912+0100: 590.354: > [CMS-concurrent-abortable-preclean-start] > 2014-01-13T10:18:26.836+0100: 591.278: [GC 591.278: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0048160 secs] 1571310K->525477K(2096192K), > 0.0049240 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] > 2014-01-13T10:18:26.842+0100: 591.283: > [CMS-concurrent-abortable-preclean: 0.608/0.929 secs] [Times: > user=10.77 sys=0.97, real=0.93 secs] > *2014-01-13T10:18:26.843+0100: 591.285: [GC[YG occupancy: 20938 K > (1047616 K)]591.285: [Rescan (parallel) , 0.0024770 secs]591.287: > [weak refs processing, 0.0007760 secs]591.288: [scrub string table, > 0.0006440 secs] [1 CMS-remark: 525477K(1048576K)] 546415K(2096192K), > 0.0040480 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] > *2014-01-13T10:18:26.848+0100: 591.289: [CMS-concurrent-sweep-start] > 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-sweep: > 0.726/0.726 secs] [Times: user=8.50 sys=0.76, real=0.73 secs] > 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-reset-start] > 2014-01-13T10:18:27.576+0100: 592.017: [CMS-concurrent-reset: > 0.003/0.003 secs] [Times: user=0.03 sys=0.01, real=0.00 secs] > This means that when I set these two, CMS STWs go from ~200ms to below > 10ms. > I'm leaning towards activating... > -XX:CMSScheduleRemarkEdenPenetration=0 > -XX:CMSScheduleRemarkEdenSizeThreshold=0 > -XX:CMSMaxAbortablePrecleanTime=30000 > What I have seen with these flags is that as soon as a young is > completely collected during abortable preclean, the remark is > scheduled and since it can start when eden is nearly empty, it is > ridicously fast. In case it takes a long time for preclean to catch a > young collection, it is also fine because no promotion is being made. > We can live with the pause of young plus a consecutive remark (for us, > young is ~10ms). > So, to the question - is there any obvious drawbacks with the three > settings above? Why does eden have to be 50% (default) in order for a > remark to be scheduled (besides spreading the pause)? It does only > seem to do harm. Any reason? > -XX:+CMSScavengeBeforeRemark I'm thinking to avoid since it can't be > completely trusted. Usually it helps, but that is not good enough > since the pauses get irregular in case it fails. And with these > settings above, it will only add to the CMS pause. > Best Regards, > Gustav ?kesson > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
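To make the scheduling rule described above concrete, here is a minimal sketch in plain Java (not HotSpot source; the class and method names are invented for illustration) of the condition under which the remark can be scheduled once abortable preclean is running, assuming the semantics of CMSScheduleRemarkEdenPenetration and CMSScheduleRemarkEdenSizeThreshold given earlier in the thread. With the assumed defaults the remark waits for eden to refill to roughly half, which is why its pause tracks young-generation occupancy; with both flags at 0, as in the second log, even an empty eden qualifies.

public class RemarkScheduleModel {

    // Returns true when, per the description above, abortable preclean would stop
    // and the remark pause could be scheduled. Illustrative model only.
    static boolean remarkSchedulable(long edenUsedBytes, long edenCapacityBytes,
                                     long edenSizeThresholdBytes,  // CMSScheduleRemarkEdenSizeThreshold
                                     int edenPenetrationPercent) { // CMSScheduleRemarkEdenPenetration
        if (edenUsedBytes < edenSizeThresholdBytes) {
            return true; // "no eden is too small to schedule"
        }
        double penetration = 100.0 * edenUsedBytes / edenCapacityBytes;
        return penetration >= edenPenetrationPercent;
    }

    public static void main(String[] args) {
        long edenCapacity = 1024L << 20; // ~1 GB eden, similar to the logs in this thread

        // Assumed defaults (penetration 50%, size threshold 2 MB): the remark waits
        // until eden has refilled to about half capacity.
        System.out.println(remarkSchedulable(20L << 20, edenCapacity, 2L << 20, 50));  // false
        System.out.println(remarkSchedulable(550L << 20, edenCapacity, 2L << 20, 50)); // true

        // Both flags set to 0: any eden occupancy, even zero, allows the remark.
        System.out.println(remarkSchedulable(0L, edenCapacity, 0L, 0));                // true
    }
}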
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140115/59f03f13/attachment.html From gustav.r.akesson at gmail.com Wed Jan 15 11:41:14 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Wed, 15 Jan 2014 20:41:14 +0100 Subject: Long remark due to young generation occupancy In-Reply-To: <52D6D262.6010106@oracle.com> References: <52D6D262.6010106@oracle.com> Message-ID: Hi Jon, Thanks for looking into this. A clarification for "Wait up to 30 seconds for the remark to be scheduled after a young collection": Is this really the case? Is this timeout used after a young collection? I was under the impression that CMSMaxAbortablePrecleanTime precleans and waits for a young collection, and if one occurs (or timeout) the remark is scheduled. Here we wait for 30 seconds to a young to happen. Right..? The work for remark is to revisit updated objects and trace from roots again (missing something? ah, and reference processing, but that is practically no overhead for us). What is usually the biggest cost of the remark? To scan the dirty cards or to trace from roots? Perhaps this depends on the application - you're talking about "not always the case". What do you refer to? If we have en empty young generation, what could bring the remark phase to e.g. 200ms on a high-end server like ours? For my application, it seems that it tracing from roots that is the most expensive. In such scenario, spreading the pause seems as beneficial as not running a young collection prior to initial mark (which is highly dependent on occupancy in young). Especially since young collection is so fast, at least for us. Regarding the last section that we wait a long time for sweeping - does this really matter? Yes, we have a lot of floating garbage in case young collections are infrequent and we keep on precleaning, but that also means no promotions. The garbage is just sitting there on the heap taking space, but no one is claiming that space until a young collection. And by then the sweeping proceeds. Or am I missing something? Best Regards, Gustav ?kesson On Wed, Jan 15, 2014 at 7:24 PM, Jon Masamitsu wrote: > > -XX:CMSScheduleRemarkEdenPenetration=0 > > Schedule the remark pause immediately after the > next young collection. > > -XX:CMSScheduleRemarkEdenSizeThreshold=0 > > Any sized eden should allow scheduling of the remark > pause. That is, no eden is too small to schedule. > > -XX:CMSMaxAbortablePrecleanTime=30000 > > > Wait up to 30 seconds for the remark to be scheduled > after a young collection. Otherwise, wait only up to > the default of 5 seconds. > > > So, to the question - is there any obvious drawbacks with the three > settings above? Why does eden have to be 50% (default) in order for a > remark to be scheduled (besides spreading the pause)? It does only seem to > do harm. Any reason? > > > The default is 50% to try and place the remark pause between two young > pauses > (spread it out as you say). I don't believe it is always the case that > the remark > pause is very small if it is scheduled immediately after a young > collection. In > such cases we still want to spread out the pauses. > > If the remark is delayed to wait for the next young collection, > the sweeping is also delayed. You're not using up space in the > CMS (tenured) generation but you're also not collecting garbage > and not making additional space available for reuse (which the > concurrent sweep does). 
> > Jon > > > On 01/13/2014 02:50 AM, Gustav ?kesson wrote: > > Hi, > > This is a topic which has been discussed before, but I think I have some > new findings. We're experiencing problems with CMS pauses. > > Settings we are using. > > -XX:+UseConcMarkSweepGC > -XX:CMSInitiatingOccupancyFraction=68 > -XX:MaxTenuringThreshold=0 > -XX:+UseParNewGC > -XX:+ScavengeBeforeFullGC > -XX:CMSWaitDuration=30000 > -Xmx2048M > -Xms2048M > -Xmn1024M > > Note that MaxTenuringThreshold is 0. This is only done during test to > provoke the CMS to run more frequently (otherwise it runs once every > day...). Due to this, promotion to old generation is around 400K to 1M per > second. > > We have an allocation rate of roughly 1G per second, meaning that YGC runs > once every second. > > We're running JDK7u17. > > > This is a log entry when running with above settings. This entry is the > typical example to all of the CMS collections in this test. > > > *2014-01-13T09:31:52.504+0100: 661.675: [GC [1 CMS-initial-mark: > 524986K(1048576K)] 526507K(2096192K), 0.0023550 secs] [Times: user=0.00 > sys=0.00, real=0.01 secs] *2014-01-13T09:31:52.506+0100: 661.677: > [CMS-concurrent-mark-start] > 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-mark: 0.138/0.138 > secs] [Times: user=1.96 sys=0.11, real=0.13 secs] > 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-preclean-start] > 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-preclean: > 0.010/0.011 secs] [Times: user=0.14 sys=0.02, real=0.02 secs] > 2014-01-13T09:31:52.655+0100: 661.826: > [CMS-concurrent-abortable-preclean-start] > 2014-01-13T09:31:53.584+0100: 662.755: [GC 662.755: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0039870 secs] 1571642K->525579K(2096192K), > 0.0043310 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] > 2014-01-13T09:31:54.146+0100: 663.317: [CMS-concurrent-abortable-preclean: > 0.831/1.491 secs] [Times: user=16.76 sys=1.54, real=1.49 secs] > > *2014-01-13T09:31:54.148+0100: 663.319: [GC[YG occupancy: 552670 K > (1047616 K)]663.319: [Rescan (parallel) , 0.2000060 secs]663.519: [weak > refs processing, 0.0008740 secs]663.520: [scrub string table, 0.0006940 > secs] [1 CMS-remark: 525579K(1048576K)] 1078249K(2096192K), 0.2017690 secs] > [Times: user=3.53 sys=0.01, real=0.20 secs] *2014-01-13T09:31:54.350+0100: > 663.521: [CMS-concurrent-sweep-start] > 2014-01-13T09:31:54.846+0100: 664.017: [GC 664.017: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0033500 secs] 1330075K->284041K(2096192K), > 0.0034660 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] > 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-sweep: 0.665/0.670 > secs] [Times: user=7.77 sys=0.71, real=0.67 secs] > 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-reset-start] > 2014-01-13T09:31:55.023+0100: 664.194: [CMS-concurrent-reset: 0.003/0.003 > secs] [Times: user=0.03 sys=0.00, real=0.00 secs] > > The initial pause is fine. Then I investigated how to reduce the remark > phase, and activated -XX:+CMSScavengeBeforeRemark. That flag partly solves > this issue (not active in the log above), but I've seen cases when it does > not scavenge (I suspect JNI critical section), which is bad and generates > yet again long remark pause. And yet again the pause is correlated towards > the occupancy in young. > > So instead, I tried setting... 
> > -XX:CMSScheduleRemarkEdenPenetration=0 > -XX:CMSScheduleRemarkEdenSizeThreshold=0 > > This is a log entry with the settings at the top plus the two above... > > > *2014-01-13T10:18:25.757+0100: 590.198: [GC [1 CMS-initial-mark: > 524654K(1048576K)] 526646K(2096192K), 0.0029130 secs] [Times: user=0.00 > sys=0.00, real=0.01 secs] *2014-01-13T10:18:25.760+0100: 590.201: > [CMS-concurrent-mark-start] > 2014-01-13T10:18:25.904+0100: 590.345: [CMS-concurrent-mark: 0.144/0.144 > secs] [Times: user=1.98 sys=0.15, real=0.14 secs] > 2014-01-13T10:18:25.904+0100: 590.346: [CMS-concurrent-preclean-start] > 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-preclean: > 0.008/0.008 secs] [Times: user=0.11 sys=0.00, real=0.01 secs] > 2014-01-13T10:18:25.912+0100: 590.354: > [CMS-concurrent-abortable-preclean-start] > 2014-01-13T10:18:26.836+0100: 591.278: [GC 591.278: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0048160 secs] 1571310K->525477K(2096192K), > 0.0049240 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] > 2014-01-13T10:18:26.842+0100: 591.283: [CMS-concurrent-abortable-preclean: > 0.608/0.929 secs] [Times: user=10.77 sys=0.97, real=0.93 secs] > > *2014-01-13T10:18:26.843+0100: 591.285: [GC[YG occupancy: 20938 K (1047616 > K)]591.285: [Rescan (parallel) , 0.0024770 secs]591.287: [weak refs > processing, 0.0007760 secs]591.288: [scrub string table, 0.0006440 secs] [1 > CMS-remark: 525477K(1048576K)] 546415K(2096192K), 0.0040480 secs] [Times: > user=0.03 sys=0.00, real=0.00 secs] *2014-01-13T10:18:26.848+0100: > 591.289: [CMS-concurrent-sweep-start] > 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-sweep: 0.726/0.726 > secs] [Times: user=8.50 sys=0.76, real=0.73 secs] > 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-reset-start] > 2014-01-13T10:18:27.576+0100: 592.017: [CMS-concurrent-reset: 0.003/0.003 > secs] [Times: user=0.03 sys=0.01, real=0.00 secs] > > This means that when I set these two, CMS STWs go from ~200ms to below > 10ms. > > I'm leaning towards activating... > > -XX:CMSScheduleRemarkEdenPenetration=0 > -XX:CMSScheduleRemarkEdenSizeThreshold=0 > -XX:CMSMaxAbortablePrecleanTime=30000 > > > What I have seen with these flags is that as soon as a young is completely > collected during abortable preclean, the remark is scheduled and since it > can start when eden is nearly empty, it is ridicously fast. In case it > takes a long time for preclean to catch a young collection, it is also fine > because no promotion is being made. We can live with the pause of young > plus a consecutive remark (for us, young is ~10ms). > > So, to the question - is there any obvious drawbacks with the three > settings above? Why does eden have to be 50% (default) in order for a > remark to be scheduled (besides spreading the pause)? It does only seem to > do harm. Any reason? > > -XX:+CMSScavengeBeforeRemark I'm thinking to avoid since it can't be > completely trusted. Usually it helps, but that is not good enough since the > pauses get irregular in case it fails. And with these settings above, it > will only add to the CMS pause. 
> > > Best Regards, > > Gustav ?kesson > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140115/64e00278/attachment-0001.html From jon.masamitsu at oracle.com Wed Jan 15 17:22:00 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 15 Jan 2014 17:22:00 -0800 Subject: Long remark due to young generation occupancy In-Reply-To: References: <52D6D262.6010106@oracle.com> Message-ID: <52D73438.2070407@oracle.com> On 1/15/2014 11:41 AM, Gustav ?kesson wrote: > Hi Jon, > > Thanks for looking into this. > > A clarification for "Wait up to 30 seconds for the remark to be > scheduled after a young collection": > Is this really the case? Is this timeout used after a young > collection? I was under the impression that > CMSMaxAbortablePrecleanTime precleans and waits for a young > collection, and if one occurs (or timeout) the remark is scheduled. > Here we wait for 30 seconds to a young to happen. Right..? I think it is 1) concurrent marking runs 2) concurrent precleaning runs 3) concurrent abortable precleaning runs until the remark can be scheduled or the timeout is reached With CMSMaxAbortablePrecleanTime set to 30000, the abortable precleaning runs for 30 seconds unless aborted. Scheduling the remark means waiting for the next young collection to empty eden and then waiting for allocations to fill up eden to the CMSScheduleRemarkEdenPenetration percentage. When CMSScheduleRemarkEdenPenetration is reached abortable precleaning is aborted. 4) remark runs > > The work for remark is to revisit updated objects and trace from > roots again (missing something? ah, and reference processing, but that > is practically no overhead for us). What is usually the biggest cost > of the remark? To scan the dirty cards or to trace from roots? Perhaps > this depends on the application - you're talking about "not always the > case". What do you refer to? If we have en empty young generation, > what could bring the remark phase to e.g. 200ms on a high-end server > like ours? There could be thousands of thread stacks to scan. Some applications make heavy use of soft References. Class unloading happens at remark. The young gen is not necessarily empty after a young collection. Some applications make good use of the survivor spaces. > > For my application, it seems that it tracing from roots that is the > most expensive. In such scenario, spreading the pause seems as > beneficial as not running a young collection prior to initial mark > (which is highly dependent on occupancy in young). Especially since > young collection is so fast, at least for us. > > Regarding the last section that we wait a long time for sweeping - > does this really matter? Yes, we have a lot of floating garbage in > case young collections are infrequent and we keep on precleaning, but > that also means no promotions. The garbage is just sitting there on > the heap taking space, but no one is claiming that space until a young > collection. And by then the sweeping proceeds. Or am I missing something? 
Some applications can be easily scaled up to the point where the allocation rate (and promotion rate because of their object lifetimes) exceeds the rate at which CMS can collect. Such applications sometimes are run with very little excess space in the heap and any delay in any part of the CMS collection can mean CMS loses the race and falls back to a full collection. That's all I'm saying. If you're not in that situation, don't worry about it. Jon > > > Best Regards, > > Gustav ?kesson > > > > > > > On Wed, Jan 15, 2014 at 7:24 PM, Jon Masamitsu > > wrote: > >> -XX:CMSScheduleRemarkEdenPenetration=0 > Schedule the remark pause immediately after the > next young collection. >> -XX:CMSScheduleRemarkEdenSizeThreshold=0 > Any sized eden should allow scheduling of the remark > pause. That is, no eden is too small to schedule. >> -XX:CMSMaxAbortablePrecleanTime=30000 > Wait up to 30 seconds for the remark to be scheduled > after a young collection. Otherwise, wait only up to > the default of 5 seconds. > > >> So, to the question - is there any obvious drawbacks with the >> three settings above? Why does eden have to be 50% (default) in >> order for a remark to be scheduled (besides spreading the pause)? >> It does only seem to do harm. Any reason? > > The default is 50% to try and place the remark pause between two > young pauses > (spread it out as you say). I don't believe it is always the > case that the remark > pause is very small if it is scheduled immediately after a young > collection. In > such cases we still want to spread out the pauses. > > If the remark is delayed to wait for the next young collection, > the sweeping is also delayed. You're not using up space in the > CMS (tenured) generation but you're also not collecting garbage > and not making additional space available for reuse (which the > concurrent sweep does). > > Jon > > > On 01/13/2014 02:50 AM, Gustav ?kesson wrote: >> Hi, >> This is a topic which has been discussed before, but I think I >> have some new findings. We're experiencing problems with CMS pauses. >> Settings we are using. >> -XX:+UseConcMarkSweepGC >> -XX:CMSInitiatingOccupancyFraction=68 >> -XX:MaxTenuringThreshold=0 >> -XX:+UseParNewGC >> -XX:+ScavengeBeforeFullGC >> -XX:CMSWaitDuration=30000 >> -Xmx2048M >> -Xms2048M >> -Xmn1024M >> Note that MaxTenuringThreshold is 0. This is only done during >> test to provoke the CMS to run more frequently (otherwise it runs >> once every day...). Due to this, promotion to old generation is >> around 400K to 1M per second. >> We have an allocation rate of roughly 1G per second, meaning that >> YGC runs once every second. >> We're running JDK7u17. >> This is a log entry when running with above settings. This entry >> is the typical example to all of the CMS collections in this test. 
>> *2014-01-13T09:31:52.504+0100: 661.675: [GC [1 CMS-initial-mark: >> 524986K(1048576K)] 526507K(2096192K), 0.0023550 secs] [Times: >> user=0.00 sys=0.00, real=0.01 secs] >> *2014-01-13T09:31:52.506+0100: 661.677: [CMS-concurrent-mark-start] >> 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-mark: >> 0.138/0.138 secs] [Times: user=1.96 sys=0.11, real=0.13 secs] >> 2014-01-13T09:31:52.644+0100: 661.815: >> [CMS-concurrent-preclean-start] >> 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-preclean: >> 0.010/0.011 secs] [Times: user=0.14 sys=0.02, real=0.02 secs] >> 2014-01-13T09:31:52.655+0100: 661.826: >> [CMS-concurrent-abortable-preclean-start] >> 2014-01-13T09:31:53.584+0100: 662.755: [GC 662.755: [ParNew >> Desired survivor size 491520 bytes, new threshold 0 (max 0) >> : 1046656K->0K(1047616K), 0.0039870 secs] >> 1571642K->525579K(2096192K), 0.0043310 secs] [Times: user=0.04 >> sys=0.00, real=0.01 secs] >> 2014-01-13T09:31:54.146+0100: 663.317: >> [CMS-concurrent-abortable-preclean: 0.831/1.491 secs] [Times: >> user=16.76 sys=1.54, real=1.49 secs] >> *2014-01-13T09:31:54.148+0100: 663.319: [GC[YG occupancy: 552670 >> K (1047616 K)]663.319: [Rescan (parallel) , 0.2000060 >> secs]663.519: [weak refs processing, 0.0008740 secs]663.520: >> [scrub string table, 0.0006940 secs] [1 CMS-remark: >> 525579K(1048576K)] 1078249K(2096192K), 0.2017690 secs] [Times: >> user=3.53 sys=0.01, real=0.20 secs] >> *2014-01-13T09:31:54.350+0100: 663.521: [CMS-concurrent-sweep-start] >> 2014-01-13T09:31:54.846+0100: 664.017: [GC 664.017: [ParNew >> Desired survivor size 491520 bytes, new threshold 0 (max 0) >> : 1046656K->0K(1047616K), 0.0033500 secs] >> 1330075K->284041K(2096192K), 0.0034660 secs] [Times: user=0.04 >> sys=0.00, real=0.00 secs] >> 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-sweep: >> 0.665/0.670 secs] [Times: user=7.77 sys=0.71, real=0.67 secs] >> 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-reset-start] >> 2014-01-13T09:31:55.023+0100: 664.194: [CMS-concurrent-reset: >> 0.003/0.003 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] >> The initial pause is fine. Then I investigated how to reduce the >> remark phase, and activated -XX:+CMSScavengeBeforeRemark. That >> flag partly solves this issue (not active in the log above), but >> I've seen cases when it does not scavenge (I suspect JNI critical >> section), which is bad and generates yet again long remark pause. >> And yet again the pause is correlated towards the occupancy in young. >> So instead, I tried setting... >> -XX:CMSScheduleRemarkEdenPenetration=0 >> -XX:CMSScheduleRemarkEdenSizeThreshold=0 >> This is a log entry with the settings at the top plus the two >> above... 
>> *2014-01-13T10:18:25.757+0100: 590.198: [GC [1 CMS-initial-mark: >> 524654K(1048576K)] 526646K(2096192K), 0.0029130 secs] [Times: >> user=0.00 sys=0.00, real=0.01 secs] >> *2014-01-13T10:18:25.760+0100: 590.201: [CMS-concurrent-mark-start] >> 2014-01-13T10:18:25.904+0100: 590.345: [CMS-concurrent-mark: >> 0.144/0.144 secs] [Times: user=1.98 sys=0.15, real=0.14 secs] >> 2014-01-13T10:18:25.904+0100: 590.346: >> [CMS-concurrent-preclean-start] >> 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-preclean: >> 0.008/0.008 secs] [Times: user=0.11 sys=0.00, real=0.01 secs] >> 2014-01-13T10:18:25.912+0100: 590.354: >> [CMS-concurrent-abortable-preclean-start] >> 2014-01-13T10:18:26.836+0100: 591.278: [GC 591.278: [ParNew >> Desired survivor size 491520 bytes, new threshold 0 (max 0) >> : 1046656K->0K(1047616K), 0.0048160 secs] >> 1571310K->525477K(2096192K), 0.0049240 secs] [Times: user=0.05 >> sys=0.00, real=0.01 secs] >> 2014-01-13T10:18:26.842+0100: 591.283: >> [CMS-concurrent-abortable-preclean: 0.608/0.929 secs] [Times: >> user=10.77 sys=0.97, real=0.93 secs] >> *2014-01-13T10:18:26.843+0100: 591.285: [GC[YG occupancy: 20938 K >> (1047616 K)]591.285: [Rescan (parallel) , 0.0024770 secs]591.287: >> [weak refs processing, 0.0007760 secs]591.288: [scrub string >> table, 0.0006440 secs] [1 CMS-remark: 525477K(1048576K)] >> 546415K(2096192K), 0.0040480 secs] [Times: user=0.03 sys=0.00, >> real=0.00 secs] >> *2014-01-13T10:18:26.848+0100: 591.289: [CMS-concurrent-sweep-start] >> 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-sweep: >> 0.726/0.726 secs] [Times: user=8.50 sys=0.76, real=0.73 secs] >> 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-reset-start] >> 2014-01-13T10:18:27.576+0100: 592.017: [CMS-concurrent-reset: >> 0.003/0.003 secs] [Times: user=0.03 sys=0.01, real=0.00 secs] >> This means that when I set these two, CMS STWs go from ~200ms to >> below 10ms. >> I'm leaning towards activating... >> -XX:CMSScheduleRemarkEdenPenetration=0 >> -XX:CMSScheduleRemarkEdenSizeThreshold=0 >> -XX:CMSMaxAbortablePrecleanTime=30000 >> What I have seen with these flags is that as soon as a young is >> completely collected during abortable preclean, the remark is >> scheduled and since it can start when eden is nearly empty, it is >> ridicously fast. In case it takes a long time for preclean to >> catch a young collection, it is also fine because no promotion is >> being made. We can live with the pause of young plus a >> consecutive remark (for us, young is ~10ms). >> So, to the question - is there any obvious drawbacks with the >> three settings above? Why does eden have to be 50% (default) in >> order for a remark to be scheduled (besides spreading the pause)? >> It does only seem to do harm. Any reason? >> -XX:+CMSScavengeBeforeRemark I'm thinking to avoid since it can't >> be completely trusted. Usually it helps, but that is not good >> enough since the pauses get irregular in case it fails. And with >> these settings above, it will only add to the CMS pause. >> Best Regards, >> Gustav ?kesson >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... 
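For anyone who wants to reproduce the comparison discussed in this thread, a throwaway allocation-churn program along the following lines can be launched with the flag sets quoted above and the resulting GC logs compared. This is a hypothetical sketch, not the original test: the class name, allocation sizes and retention ratio are invented; only the JVM flags in the comment come from the thread.

// Example launch (flags taken from the thread; adjust to taste):
//   java -Xms2048M -Xmx2048M -Xmn1024M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
//        -XX:CMSInitiatingOccupancyFraction=68 -XX:MaxTenuringThreshold=0
//        -XX:CMSScheduleRemarkEdenPenetration=0 -XX:CMSScheduleRemarkEdenSizeThreshold=0
//        -XX:CMSMaxAbortablePrecleanTime=30000
//        -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
//        AllocationChurn
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ThreadLocalRandom;

public class AllocationChurn {
    public static void main(String[] args) {
        Deque<byte[]> retained = new ArrayDeque<>();   // small, slowly rotating live set
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        while (true) {
            byte[] shortLived = new byte[64 * 1024];   // bulk of the allocation, dies in eden
            shortLived[0] = 1;
            if (rnd.nextInt(5_000) == 0) {
                retained.addLast(shortLived);          // occasional long-lived promotion
                if (retained.size() > 3_000) {
                    retained.removeFirst();            // caps live data at roughly 200 MB
                }
            }
        }
    }
}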
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140115/be0a04ce/attachment.html From gustav.r.akesson at gmail.com Thu Jan 16 22:50:51 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Fri, 17 Jan 2014 07:50:51 +0100 Subject: Long remark due to young generation occupancy In-Reply-To: References: <52D6D262.6010106@oracle.com> <52D73438.2070407@oracle.com> Message-ID: Hi, (Sorry for spam, Jon - didn't reply below to all in gc-use) There could be thousands of thread stacks to scan. Some applications make heavy use of soft References. Class unloading happens at remark. The young gen is not necessarily empty after a young collection. Some applications make good use of the survivor spaces. Perhaps our application is less dependant on this, since we're only having a couple of hundred threads, no use of soft references and hardly every unload any classes. Also our aim is to have every request's allocation die in eden (or first collection from survivor). Likely my ideas presented here is well-suited for our application due to these reasons - our biggest remark bottleneck is the size of young generation. Some applications can be easily scaled up to the point where the allocation rate (and promotion rate because of their object lifetimes) exceeds the rate at which CMS can collect. Such applications sometimes are run with very little excess space in the heap and any delay in any part of the CMS collection can mean CMS loses the race and falls back to a full collection. That's all I'm saying. If you're not in that situation, don't worry about it. But if the application is scaled up to an extreme allocation rate (and promotion rate) then we will also hit YGCs more often, which means that the abortable preclean will exit and schedule remark and then sweep. Then it doesn't matter if abortable preclean if 5000 or 30000 in case YGC hits e.g. every 3s - right? On the other hand, in case the allocation rate (and thus, also promotion rate) is low then the abortable preclean runs and the garbage is not bothering anyone sitting on the heap waiting for a YGC. An update for this experiment - during the night I ran 57 CMS collections and 52 of them were below 10ms. The other 5 were pretty long - 100ms to 200ms and yet again the pauses can be correlated towards the occupancy of young. In the long pauses, after exiting the abortable preclean 100ms lapsed before starting the remarking, making eden have roughly 120mb of occupancy. Folks, I'd very much appreciate if we could keep this discussion alive and please give any input possible regarding these flags. Any input or experience is appreciated Thanks for your insights, Jon. Best Regards, Gustav ?kesson On Thu, Jan 16, 2014 at 9:26 AM, Gustav ?kesson wrote: > Hi, > > > There could be thousands of thread stacks to scan. Some applications make > heavy > use of soft References. Class unloading happens at remark. The young gen > is not > necessarily empty after a young collection. Some applications make good > use of the > survivor spaces. > > > Perhaps our application is less dependant on this, since we're only having > a couple of hundred threads, no use of soft references and hardly every > unload any classes. Also our aim is to have every request's allocation die > in eden (or first collection from survivor). Likely my ideas presented here > is well-suited for our application due to these reasons - our biggest > remark bottleneck is the size of young generation. 
> > > > Some applications can be easily scaled up to the point where the > allocation rate (and promotion rate because of their object lifetimes) > exceeds the rate at which CMS can collect. Such applications sometimes > are run with very little excess space in the heap and any delay in > any part of the CMS collection can mean CMS loses the race and > falls back to a full collection. That's all I'm saying. If you're not > in that situation, don't worry about it. > > > > But if the application is scaled up to an extreme allocation rate (and > promotion rate) then we will also hit YGCs more often, which means that the > abortable preclean will exit and schedule remark and then sweep. Then it > doesn't matter if abortable preclean if 5000 or 30000 in case YGC hits e.g. > every 3s - right? On the other hand, in case the allocation rate (and thus, > also promotion rate) is low then the abortable preclean runs and > the garbage is not bothering anyone sitting on the heap waiting for a YGC. > > > An update for this experiment - during the night I ran 57 CMS collections > and 52 of them were below 10ms. The other 5 were pretty long - 100ms to > 200ms and yet again the pauses can be correlated towards the occupancy of > young. In the long pauses, after exiting the abortable preclean 100ms > lapsed before starting the remarking, making eden have roughly 120mb of > occupancy. > > > Folks, I'd very much appreciate if we could keep this discussion alive and > please give any input possible regarding these flags. Any input or > experience is appreciated > > Thanks for your insights, Jon. > > > Best Regards, > > Gustav ?kesson > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140117/bf18a946/attachment.html From gustav.r.akesson at gmail.com Thu Jan 16 22:59:56 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Fri, 17 Jan 2014 07:59:56 +0100 Subject: Long remark due to young generation occupancy In-Reply-To: References: <52D6D262.6010106@oracle.com> <52D73438.2070407@oracle.com> Message-ID: Hi, An update on this experiment. I managed to track down the issue with 5 of the 57 collections (which still had 100-200ms pauses). It was due to the abortable preclean sleeping. When abortable preclean scans dirty cards and less than 100 cards were scanned, then it sleeps for 100ms. In case that happens and a YG is processed, then it will take ~100ms (default) to reach the remark, which means that eden will fill up again. When I lowered the sleep time then the highest GC (out of 50) was 43ms and 33ms. Rest was <11ms. Most of them were 7-8ms. Best Regards, Gustav ?kesson -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140117/a4a60327/attachment.html From gustav.r.akesson at gmail.com Wed Jan 22 03:12:22 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Wed, 22 Jan 2014 12:12:22 +0100 Subject: Fragmentation and UseCMSInitiatingOccupancyOnly Message-ID: Hi, In case UseCMSInitiatingOccupancyOnly is enabled we instruct CMS to start at X% of old gen, and not try to figure out by itself when to start. My understanding is that when flag is disabled, CMS is aiming for X%, but uses statistics of previous collections (GC rate, GC time) to determine when to initiate. 
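(As a side note, one way to confirm which of these initiation settings are actually in effect in a running JVM is the HotSpot diagnostic MXBean. The sketch below is illustrative: the class name is invented, and the last two flag names are included only on the assumption that they feed the derived default trigger when CMSInitiatingOccupancyFraction is not set explicitly.)

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShowCmsInitiationFlags {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        String[] flags = {
                "UseConcMarkSweepGC",
                "CMSInitiatingOccupancyFraction",
                "UseCMSInitiatingOccupancyOnly",
                "CMSTriggerRatio",     // assumed: part of the derived default trigger
                "MinHeapFreeRatio"     // assumed: likewise
        };
        for (String f : flags) {
            try {
                System.out.println(hs.getVMOption(f)); // prints the value and its origin
            } catch (IllegalArgumentException e) {
                System.out.println(f + ": not available in this VM");
            }
        }
    }
}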
My question is whether enabling UseCMSInitiatingOccupancyOnly increases the risk of promotion failure (and hence a Full GC) due to fragmentation - that is, will CMS always honor the X% rule and rather let a promotion failure happen than start a cycle prematurely? Or, in case the flag is disabled, is CMS smart enough to start prior to X% when the heap is fragmented, instead of generating a promotion failure?

Best Regards,

Gustav Åkesson

From bernd-2014 at eckenfels.net Wed Jan 22 03:37:27 2014
From: bernd-2014 at eckenfels.net (Bernd Eckenfels)
Date: Wed, 22 Jan 2014 12:37:27 +0100
Subject: Fragmentation and UseCMSInitiatingOccupancyOnly
In-Reply-To:
References:
Message-ID: <95E1766F-DADA-411D-A48E-8CC603169926@eckenfels.net>

If you set the percentage low, the risk is that the old gen will be permanently over the threshold (this might be wanted?).
If you set it high, then AF might happen due to fragmentation or background collection beeing too slow. > > I think fragmentation is not honored, therefore your desired oldheap size should account for that an have plenty of (untouched) headroom. > > Bernd > >> Am 22.01.2014 um 12:12 schrieb Gustav ?kesson : >> >> Hi, >> >> In case UseCMSInitiatingOccupancyOnly is enabled we instruct CMS to start at X% of old gen, and not try to figure out by itself when to start. My understanding is that when flag is disabled, CMS is aiming for X%, but uses statistics of previous collections (GC rate, GC time) to determine when to initiate. >> >> My question is whether enabling UseCMSInitiatingOccupancyOnly increases the risk of promotion failure (FullGC) due to fragmentation, meaning that it will always honor X% rule and rather generate promotion failure event than run CMS prematurely? >> >> Or, in case flag is disabled, is CMS smart enough to start prior to X% when heap is fragmented instead of generating promotion failure? >> >> >> Best Regards, >> >> Gustav ?kesson >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From bernd-2014 at eckenfels.net Wed Jan 22 13:04:07 2014 From: bernd-2014 at eckenfels.net (Bernd Eckenfels) Date: Wed, 22 Jan 2014 22:04:07 +0100 Subject: Fragmentation and UseCMSInitiatingOccupancyOnly In-Reply-To: References: <95E1766F-DADA-411D-A48E-8CC603169926@eckenfels.net> Message-ID: Hello Gustav, first of all, I noticed you are talking about a 512mb heap. Can you maybe elaborate what hardware that is, and what pause times you see and expect? Whats your newsize? Are setting Xmx/Xms to same values? With parold and smaller heaps having a smaller initial size will reduce pause times even more. I would expect that in most common scenarios ParallelOld is much more reliable (no concurrent mode risk falling back to serial gc; defragmenting), easy to tune (consistent behaviour). And the pause times should be comparable to your young collections (and certainly much smaller than full non-paralle collections). But, back to your question, I am not so familiar with the AdaptiveSizing and FreeList code, but the rest of CMS does not seem to care about freechunks vs. fragmentation when considering free memory. Maybe somebody else can comment on that. BTW: I think you can use a larger (more than default 10%) safty margin instead of using a lower Occupancy setting if you want to keep the dynamic adjustment property but not want to risk concurrent mode failures. Am 22.01.2014, 20:37 Uhr, schrieb Gustav ?kesson : > Not sure I understand the answer - in case the flag is disabled, is > little contiguous free space (i.e. fragmentation) in oldgen a variable From denny.kettwig at werum.de Thu Jan 23 00:38:10 2014 From: denny.kettwig at werum.de (Denny Kettwig) Date: Thu, 23 Jan 2014 08:38:10 +0000 Subject: AW: Unexplanable events in GC logs Message-ID: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> Hey folks, in one of our recent cluster systems we found 2 unexplainable events within the GC logs. I'd like to address these events to you and ask for your advice. We are running a clustered jBoss System with 4 nodes, every node has the same configuration. 
We use the following parameters: -Xms10g -Xmx10g -Xmn3g -Xss2048k -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=22 -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:PermSize=512M -XX:MaxPermSize=512m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 Event 1 The first event is a very frequent CMS collections on the first node for about 11 hours. We are talking here about a peak value of 258 CMS collections per hour. From my knowledge this event starts without any reason and ends without any reason since the old space is below 10% usage in this time frame. I do not know what might have caused this CMS collection. We already experienced similar events on this cluster in the past, but at a much higher heap usage (above 90%) and under high load with most likely a huge heap fragmentation and the frequent CMS collections ended with a single very long Full GC that defragmented the heap. In the current event all this is not the case. I attached the relevant part of the GC log. Event 2 The second event is just as confusing for me as the first one. At a certain point in time a full GC takes place on all 4 Nodes and I cannot find a reason for it. Here are the relevant parts: N1 2013-12-20T05:43:28.041+0100: 768231.382: [GC 768231.383: [ParNew: 2618493K->92197K(2831168K), 0.3007160 secs] 3120456K->594160K(10171200K), 0.3017204 secs] [Times: user=0.91 sys=0.00, real=0.30 secs] 2013-12-20T05:45:31.864+0100: 768355.209: [Full GC 768355.210: [CMS: 501963K->496288K(7340032K), 4.2140085 secs] 1397267K->496288K(10171200K), [CMS Perm : 203781K->178528K(524288K)], 4.2148018 secs] [Times: user=4.21 sys=0.00, real=4.21 secs] 2013-12-20T05:52:40.591+0100: 768783.949: [GC 768783.949: [ParNew: 2516608K->48243K(2831168K), 0.2649174 secs] 3012896K->544532K(10171200K), 0.2659039 secs] [Times: user=0.47 sys=0.00, real=0.27 secs] N2 2013-12-20T04:57:21.524+0100: 765208.310: [GC 765208.311: [ParNew: 924566K->111068K(2831168K), 0.1790573 secs] 1416514K->603015K(10171200K), 0.1797121 secs] [Times: user=1.08 sys=0.00, real=0.19 secs] 2013-12-20T04:57:21.711+0100: 765208.499: [GC [1 CMS-initial-mark: 491947K(7340032K)] 603015K(10171200K), 0.3289639 secs] [Times: user=0.33 sys=0.00, real=0.33 secs] 2013-12-20T04:57:22.039+0100: 765208.828: [CMS-concurrent-mark-start] 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-mark: 0.577/0.577 secs] [Times: user=3.53 sys=0.05, real=0.58 secs] 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-preclean-start] 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-preclean: 0.024/0.024 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-abortable-preclean-start] CMS: abort preclean due to time 2013-12-20T04:57:28.060+0100: 765214.848: [CMS-concurrent-abortable-preclean: 4.208/5.418 secs] [Times: user=4.10 sys=0.03, real=5.41 secs] 2013-12-20T04:57:28.076+0100: 765214.857: [GC[YG occupancy: 124872 K (2831168 K)]765214.857: [Rescan (parallel) , 0.0580906 secs]765214.916: [weak refs processing, 0.0000912 secs]765214.916: [class unloading, 0.0769742 secs]765214.993: [scrub symbol & string tables, 0.0612689 secs] [1 CMS-remark: 491947K(7340032K)] 616820K(10171200K), 0.2256506 secs] [Times: user=0.48 sys=0.00, real=0.22 secs] 2013-12-20T04:57:28.294+0100: 765215.083: [CMS-concurrent-sweep-start] 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-sweep: 0.750/0.750 secs] [Times: user=0.76 sys=0.00, 
real=0.75 secs] 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-reset-start] 2013-12-20T04:57:29.074+0100: 765215.856: [CMS-concurrent-reset: 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 2013-12-20T05:47:54.607+0100: 768241.413: [Full GC 768241.414: [CMS: 491947K->464864K(7340032K), 4.5053183 secs] 1910412K->464864K(10171200K), [CMS Perm : 183511K->168997K(524288K)], 4.5059088 secs] [Times: user=4.49 sys=0.02, real=4.51 secs] 2013-12-20T06:47:59.098+0100: 771845.954: [GC 771845.954: [ParNew: 1507774K->58195K(2831168K), 0.2307012 secs] 1972638K->523059K(10171200K), 0.2313931 secs] [Times: user=0.37 sys=0.00, real=0.23 secs] N3 2013-12-20T05:46:25.441+0100: 767981.526: [GC 767981.526: [ParNew: 2695212K->166641K(2831168K), 0.3278475 secs] 3268057K->739486K(10171200K), 0.3284853 secs] [Times: user=1.62 sys=0.00, real=0.33 secs] 2013-12-20T05:49:55.467+0100: 768191.578: [Full GC 768191.578: [CMS: 572844K->457790K(7340032K), 3.7687762 secs] 1216176K->457790K(10171200K), [CMS Perm : 181711K->169514K(524288K)], 3.7692999 secs] [Times: user=3.76 sys=0.00, real=3.79 secs] 2013-12-20T06:49:59.249+0100: 771795.415: [GC 771795.415: [ParNew: 1585077K->72146K(2831168K), 0.2632945 secs] 2042868K->529936K(10171200K), 0.2639889 secs] [Times: user=0.41 sys=0.00, real=0.27 secs] N4 2013-12-20T05:48:21.551+0100: 767914.067: [GC 767914.068: [ParNew: 2656327K->119432K(2831168K), 0.2603676 secs] 3222693K->685799K(10171200K), 0.2609581 secs] [Times: user=1.14 sys=0.00, real=0.26 secs] 2013-12-20T05:49:03.939+0100: 767956.457: [Full GC 767956.457: [CMS: 566366K->579681K(7340032K), 6.1324841 secs] 3149011K->579681K(10171200K), [CMS Perm : 190240K->174791K(524288K)], 6.1331389 secs] [Times: user=6.13 sys=0.00, real=6.13 secs] 2013-12-20T05:50:10.262+0100: 768022.762: [GC 768022.763: [ParNew: 2516608K->83922K(2831168K), 0.2157015 secs] 3096289K->663603K(10171200K), 0.2162262 secs] [Times: user=0.41 sys=0.00, real=0.22 secs] Between these Full GC are only a few minutes or as between N3 and N4 a few seconds. I made some research on possible reasons for a Full GC and this is the list I gathered so far: 1. Running out of old gen 2. Running out of perm gen 3. Calling System.gc() (indicated by System in the ouput) 4. Not having enough free space in Survivor Space to copy objects from Eden (promotion failed) 5. Running out of old gen before a concurrent collection can free it (Concurrent Mode Failure) 6. Having high fragmentation and not enough space for a larger object in old gen However none of these 6 conditions are fulfilled by any of the above shown full GC. So once again I'm lost and do not have an explanation for this. If you need the full logs for further analysis please let me know. Kind Regards Denny [cid:image002.png at 01CEF67E.8AEB4630] Werum Software & Systems AG Wulf-Werum-Strasse 3 | 21337 Lueneburg | Germany Tel. +49(0)4131/8900-983 | Fax +49(0)4131/8900-20 mailto:denny.kettwig at werum.de | http://www.werum.de VAT No. DE 116 083 850 | RG Lueneburg HRB 2262 Chairman of Supervisory Board: Johannes Zimmermann Executive Board: Hartmut Krome, Ruediger Schlierenkaemper, Hans-Peter Subel -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/b68b9c03/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.jpg Type: image/jpeg Size: 1089 bytes Desc: image001.jpg Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/b68b9c03/image001.jpg -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 6441 bytes Desc: image002.png Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/b68b9c03/image002.png From matthew.miller at forgerock.com Thu Jan 23 03:58:48 2014 From: matthew.miller at forgerock.com (Matt Miller) Date: Thu, 23 Jan 2014 06:58:48 -0500 Subject: AW: Unexplanable events in GC logs In-Reply-To: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> References: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> Message-ID: <52E103F8.2070009@forgerock.com> Hi Denny, Another reason for a Full GC that you did not list is: running jmap against the process. If you run a histo:live or take a live object heap dump, a Full GC will happen (and it doesn't show as a System GC). Is it possible that you are taking a heap histogram or heap dump using the live option? -Matt On 1/23/14, 3:38 AM, Denny Kettwig wrote: > > Hey folks, > > in one of our recent cluster systems we found 2 unexplainable events > within the GC logs. I'd like to address these events to you and ask > for your advice. We are running a clustered jBoss System with 4 nodes, > every node has the same configuration. We use the following parameters: > > -Xms10g > > -Xmx10g > > -Xmn3g > > -Xss2048k > > -XX:+ExplicitGCInvokesConcurrent > > -XX:+CMSClassUnloadingEnabled > > -XX:+UseParNewGC > > -XX:+UseConcMarkSweepGC > > -XX:ParallelGCThreads=22 > > -XX:SurvivorRatio=8 > > -XX:TargetSurvivorRatio=90 > > -XX:PermSize=512M > > -XX:MaxPermSize=512m > > -Dsun.rmi.dgc.client.gcInterval=3600000 > > -Dsun.rmi.dgc.server.gcInterval=3600000 > > *Event 1* > > The first event is a very frequent CMS collections on the first node > for about 11 hours. We are talking here about a peak value of 258 CMS > collections per hour. From my knowledge this event starts without any > reason and ends without any reason since the old space is below 10% > usage in this time frame. I do not know what might have caused this > CMS collection. We already experienced similar events on this cluster > in the past, but at a much higher heap usage (above 90%) and under > high load with most likely a huge heap fragmentation and the frequent > CMS collections ended with a single very long Full GC that > defragmented the heap. In the current event all this is not the case. > *I attached the relevant part of the GC log*. > > *Event 2* > > The second event is just as confusing for me as the first one. At a > certain point in time a full GC takes place on *all 4 Nodes *and I > cannot find a reason for it. 
Here are the relevant parts: > > N1 > > 2013-12-20T05:43:28.041+0100: 768231.382: [GC 768231.383: [ParNew: > 2618493K->92197K(2831168K), 0.3007160 secs] > 3120456K->594160K(10171200K), 0.3017204 secs] [Times: user=0.91 > sys=0.00, real=0.30 secs] > > 2013-12-20T05:45:31.864+0100: 768355.209: [Full GC 768355.210: [CMS: > 501963K->496288K(7340032K), 4.2140085 secs] > 1397267K->496288K(10171200K), [CMS Perm : 203781K->178528K(524288K)], > 4.2148018 secs] [Times: user=4.21 sys=0.00, real=4.21 secs] > > 2013-12-20T05:52:40.591+0100: 768783.949: [GC 768783.949: [ParNew: > 2516608K->48243K(2831168K), 0.2649174 secs] > 3012896K->544532K(10171200K), 0.2659039 secs] [Times: user=0.47 > sys=0.00, real=0.27 secs] > > N2 > > 2013-12-20T04:57:21.524+0100: 765208.310: [GC 765208.311: [ParNew: > 924566K->111068K(2831168K), 0.1790573 secs] > 1416514K->603015K(10171200K), 0.1797121 secs] [Times: user=1.08 > sys=0.00, real=0.19 secs] > > 2013-12-20T04:57:21.711+0100: 765208.499: [GC [1 CMS-initial-mark: > 491947K(7340032K)] 603015K(10171200K), 0.3289639 secs] [Times: > user=0.33 sys=0.00, real=0.33 secs] > > 2013-12-20T04:57:22.039+0100: 765208.828: [CMS-concurrent-mark-start] > > 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-mark: > 0.577/0.577 secs] [Times: user=3.53 sys=0.05, real=0.58 secs] > > 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-preclean-start] > > 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-preclean: > 0.024/0.024 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] > > 2013-12-20T04:57:22.647+0100: 765209.430: > [CMS-concurrent-abortable-preclean-start] > > CMS: abort preclean due to time 2013-12-20T04:57:28.060+0100: > 765214.848: [CMS-concurrent-abortable-preclean: 4.208/5.418 secs] > [Times: user=4.10 sys=0.03, real=5.41 secs] > > 2013-12-20T04:57:28.076+0100: 765214.857: [GC[YG occupancy: 124872 K > (2831168 K)]765214.857: [Rescan (parallel) , 0.0580906 > secs]765214.916: [weak refs processing, 0.0000912 secs]765214.916: > [class unloading, 0.0769742 secs]765214.993: [scrub symbol & string > tables, 0.0612689 secs] [1 CMS-remark: 491947K(7340032K)] > 616820K(10171200K), 0.2256506 secs] [Times: user=0.48 sys=0.00, > real=0.22 secs] > > 2013-12-20T04:57:28.294+0100: 765215.083: [CMS-concurrent-sweep-start] > > 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-sweep: > 0.750/0.750 secs] [Times: user=0.76 sys=0.00, real=0.75 secs] > > 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-reset-start] > > 2013-12-20T04:57:29.074+0100: 765215.856: [CMS-concurrent-reset: > 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] > > 2013-12-20T05:47:54.607+0100: 768241.413: [Full GC 768241.414: [CMS: > 491947K->464864K(7340032K), 4.5053183 secs] > 1910412K->464864K(10171200K), [CMS Perm : 183511K->168997K(524288K)], > 4.5059088 secs] [Times: user=4.49 sys=0.02, real=4.51 secs] > > 2013-12-20T06:47:59.098+0100: 771845.954: [GC 771845.954: [ParNew: > 1507774K->58195K(2831168K), 0.2307012 secs] > 1972638K->523059K(10171200K), 0.2313931 secs] [Times: user=0.37 > sys=0.00, real=0.23 secs] > > N3 > > 2013-12-20T05:46:25.441+0100: 767981.526: [GC 767981.526: [ParNew: > 2695212K->166641K(2831168K), 0.3278475 secs] > 3268057K->739486K(10171200K), 0.3284853 secs] [Times: user=1.62 > sys=0.00, real=0.33 secs] > > 2013-12-20T05:49:55.467+0100: 768191.578: [Full GC 768191.578: [CMS: > 572844K->457790K(7340032K), 3.7687762 secs] > 1216176K->457790K(10171200K), [CMS Perm : 181711K->169514K(524288K)], > 3.7692999 secs] [Times: user=3.76 sys=0.00, real=3.79 
secs] > > 2013-12-20T06:49:59.249+0100: 771795.415: [GC 771795.415: [ParNew: > 1585077K->72146K(2831168K), 0.2632945 secs] > 2042868K->529936K(10171200K), 0.2639889 secs] [Times: user=0.41 > sys=0.00, real=0.27 secs] > > N4 > > 2013-12-20T05:48:21.551+0100: 767914.067: [GC 767914.068: [ParNew: > 2656327K->119432K(2831168K), 0.2603676 secs] > 3222693K->685799K(10171200K), 0.2609581 secs] [Times: user=1.14 > sys=0.00, real=0.26 secs] > > 2013-12-20T05:49:03.939+0100: 767956.457: [Full GC 767956.457: [CMS: > 566366K->579681K(7340032K), 6.1324841 secs] > 3149011K->579681K(10171200K), [CMS Perm : 190240K->174791K(524288K)], > 6.1331389 secs] [Times: user=6.13 sys=0.00, real=6.13 secs] > > 2013-12-20T05:50:10.262+0100: 768022.762: [GC 768022.763: [ParNew: > 2516608K->83922K(2831168K), 0.2157015 secs] > 3096289K->663603K(10171200K), 0.2162262 secs] [Times: user=0.41 > sys=0.00, real=0.22 secs] > > Between these Full GC are only a few minutes or as between N3 and N4 a > few seconds. I made some research on possible reasons for a Full GC > and this is the list I gathered so far: > > 1.Running out of old gen > > 2.Running out of perm gen > > 3.Calling System.gc() (indicated by System in the ouput) > > 4.Not having enough free space in Survivor Space to copy objects from > Eden (promotion failed) > > 5.Running out of old gen before a concurrent collection can free it > (Concurrent Mode Failure) > > 6.Having high fragmentation and not enough space for a larger object > in old gen > > However none of these 6 conditions are fulfilled by any of the above > shown full GC. So once again I'm lost and do not have an explanation > for this. > > If you need the full logs for further analysis please let me know. > > Kind Regards > > Denny > > Beschreibung: werum-hr > > cid:image002.png at 01CEF67E.8AEB4630 > > Werum Software & Systems AG > > Wulf-Werum-Strasse 3 | 21337 Lueneburg | Germany > > Tel. +49(0)4131/8900-983 | Fax +49(0)4131/8900-20 > > mailto:denny.kettwig at werum.de | http://www.werum.de > > VAT No. DE 116 083 850 | RG Lueneburg HRB 2262 > > Chairman of Supervisory Board: Johannes Zimmermann > > Executive Board: Hartmut Krome, Ruediger Schlierenkaemper, Hans-Peter > Subel > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/dcdc9453/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 1089 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/dcdc9453/attachment-0001.jpe -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 6441 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/dcdc9453/attachment-0001.png From denny.kettwig at werum.de Thu Jan 23 04:43:56 2014 From: denny.kettwig at werum.de (Denny Kettwig) Date: Thu, 23 Jan 2014 12:43:56 +0000 Subject: Unexplanable events in GC logs Message-ID: <6175F8C4FE407D4F830EDA25C27A43173B662980@Werum1790.werum.net> Thank you Matt! An option I never considered, this is very likely the case. Any ideas for the CMS issue? 
-Denny Von: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] Im Auftrag von Matt Miller Gesendet: Thursday, January 23, 2014 1:02 PM An: hotspot-gc-use at openjdk.java.net Betreff: Re: AW: Unexplanable events in GC logs Hi Denny, Another reason for a Full GC that you did not list is: running jmap against the process. If you run a histo:live or take a live object heap dump, a Full GC will happen (and it doesn't show as a System GC). Is it possible that you are taking a heap histogram or heap dump using the live option? -Matt On 1/23/14, 3:38 AM, Denny Kettwig wrote: Hey folks, in one of our recent cluster systems we found 2 unexplainable events within the GC logs. I'd like to address these events to you and ask for your advice. We are running a clustered jBoss System with 4 nodes, every node has the same configuration. We use the following parameters: -Xms10g -Xmx10g -Xmn3g -Xss2048k -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=22 -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:PermSize=512M -XX:MaxPermSize=512m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 Event 1 The first event is a very frequent CMS collections on the first node for about 11 hours. We are talking here about a peak value of 258 CMS collections per hour. From my knowledge this event starts without any reason and ends without any reason since the old space is below 10% usage in this time frame. I do not know what might have caused this CMS collection. We already experienced similar events on this cluster in the past, but at a much higher heap usage (above 90%) and under high load with most likely a huge heap fragmentation and the frequent CMS collections ended with a single very long Full GC that defragmented the heap. In the current event all this is not the case. I attached the relevant part of the GC log. Event 2 The second event is just as confusing for me as the first one. At a certain point in time a full GC takes place on all 4 Nodes and I cannot find a reason for it. 
Here are the relevant parts: N1 2013-12-20T05:43:28.041+0100: 768231.382: [GC 768231.383: [ParNew: 2618493K->92197K(2831168K), 0.3007160 secs] 3120456K->594160K(10171200K), 0.3017204 secs] [Times: user=0.91 sys=0.00, real=0.30 secs] 2013-12-20T05:45:31.864+0100: 768355.209: [Full GC 768355.210: [CMS: 501963K->496288K(7340032K), 4.2140085 secs] 1397267K->496288K(10171200K), [CMS Perm : 203781K->178528K(524288K)], 4.2148018 secs] [Times: user=4.21 sys=0.00, real=4.21 secs] 2013-12-20T05:52:40.591+0100: 768783.949: [GC 768783.949: [ParNew: 2516608K->48243K(2831168K), 0.2649174 secs] 3012896K->544532K(10171200K), 0.2659039 secs] [Times: user=0.47 sys=0.00, real=0.27 secs] N2 2013-12-20T04:57:21.524+0100: 765208.310: [GC 765208.311: [ParNew: 924566K->111068K(2831168K), 0.1790573 secs] 1416514K->603015K(10171200K), 0.1797121 secs] [Times: user=1.08 sys=0.00, real=0.19 secs] 2013-12-20T04:57:21.711+0100: 765208.499: [GC [1 CMS-initial-mark: 491947K(7340032K)] 603015K(10171200K), 0.3289639 secs] [Times: user=0.33 sys=0.00, real=0.33 secs] 2013-12-20T04:57:22.039+0100: 765208.828: [CMS-concurrent-mark-start] 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-mark: 0.577/0.577 secs] [Times: user=3.53 sys=0.05, real=0.58 secs] 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-preclean-start] 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-preclean: 0.024/0.024 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-abortable-preclean-start] CMS: abort preclean due to time 2013-12-20T04:57:28.060+0100: 765214.848: [CMS-concurrent-abortable-preclean: 4.208/5.418 secs] [Times: user=4.10 sys=0.03, real=5.41 secs] 2013-12-20T04:57:28.076+0100: 765214.857: [GC[YG occupancy: 124872 K (2831168 K)]765214.857: [Rescan (parallel) , 0.0580906 secs]765214.916: [weak refs processing, 0.0000912 secs]765214.916: [class unloading, 0.0769742 secs]765214.993: [scrub symbol & string tables, 0.0612689 secs] [1 CMS-remark: 491947K(7340032K)] 616820K(10171200K), 0.2256506 secs] [Times: user=0.48 sys=0.00, real=0.22 secs] 2013-12-20T04:57:28.294+0100: 765215.083: [CMS-concurrent-sweep-start] 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-sweep: 0.750/0.750 secs] [Times: user=0.76 sys=0.00, real=0.75 secs] 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-reset-start] 2013-12-20T04:57:29.074+0100: 765215.856: [CMS-concurrent-reset: 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 2013-12-20T05:47:54.607+0100: 768241.413: [Full GC 768241.414: [CMS: 491947K->464864K(7340032K), 4.5053183 secs] 1910412K->464864K(10171200K), [CMS Perm : 183511K->168997K(524288K)], 4.5059088 secs] [Times: user=4.49 sys=0.02, real=4.51 secs] 2013-12-20T06:47:59.098+0100: 771845.954: [GC 771845.954: [ParNew: 1507774K->58195K(2831168K), 0.2307012 secs] 1972638K->523059K(10171200K), 0.2313931 secs] [Times: user=0.37 sys=0.00, real=0.23 secs] N3 2013-12-20T05:46:25.441+0100: 767981.526: [GC 767981.526: [ParNew: 2695212K->166641K(2831168K), 0.3278475 secs] 3268057K->739486K(10171200K), 0.3284853 secs] [Times: user=1.62 sys=0.00, real=0.33 secs] 2013-12-20T05:49:55.467+0100: 768191.578: [Full GC 768191.578: [CMS: 572844K->457790K(7340032K), 3.7687762 secs] 1216176K->457790K(10171200K), [CMS Perm : 181711K->169514K(524288K)], 3.7692999 secs] [Times: user=3.76 sys=0.00, real=3.79 secs] 2013-12-20T06:49:59.249+0100: 771795.415: [GC 771795.415: [ParNew: 1585077K->72146K(2831168K), 0.2632945 secs] 2042868K->529936K(10171200K), 0.2639889 secs] [Times: 
user=0.41 sys=0.00, real=0.27 secs] N4 2013-12-20T05:48:21.551+0100: 767914.067: [GC 767914.068: [ParNew: 2656327K->119432K(2831168K), 0.2603676 secs] 3222693K->685799K(10171200K), 0.2609581 secs] [Times: user=1.14 sys=0.00, real=0.26 secs] 2013-12-20T05:49:03.939+0100: 767956.457: [Full GC 767956.457: [CMS: 566366K->579681K(7340032K), 6.1324841 secs] 3149011K->579681K(10171200K), [CMS Perm : 190240K->174791K(524288K)], 6.1331389 secs] [Times: user=6.13 sys=0.00, real=6.13 secs] 2013-12-20T05:50:10.262+0100: 768022.762: [GC 768022.763: [ParNew: 2516608K->83922K(2831168K), 0.2157015 secs] 3096289K->663603K(10171200K), 0.2162262 secs] [Times: user=0.41 sys=0.00, real=0.22 secs] Between these Full GC are only a few minutes or as between N3 and N4 a few seconds. I made some research on possible reasons for a Full GC and this is the list I gathered so far: 1. Running out of old gen 2. Running out of perm gen 3. Calling System.gc() (indicated by System in the ouput) 4. Not having enough free space in Survivor Space to copy objects from Eden (promotion failed) 5. Running out of old gen before a concurrent collection can free it (Concurrent Mode Failure) 6. Having high fragmentation and not enough space for a larger object in old gen However none of these 6 conditions are fulfilled by any of the above shown full GC. So once again I'm lost and do not have an explanation for this. If you need the full logs for further analysis please let me know. Kind Regards Denny [cid:image002.png at 01CEF67E.8AEB4630] Werum Software & Systems AG Wulf-Werum-Strasse 3 | 21337 Lueneburg | Germany Tel. +49(0)4131/8900-983 | Fax +49(0)4131/8900-20 mailto:denny.kettwig at werum.de | http://www.werum.de VAT No. DE 116 083 850 | RG Lueneburg HRB 2262 Chairman of Supervisory Board: Johannes Zimmermann Executive Board: Hartmut Krome, Ruediger Schlierenkaemper, Hans-Peter Subel _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/128be8d5/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 1089 bytes Desc: image001.jpg Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/128be8d5/image001.jpg -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 6441 bytes Desc: image002.png Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/128be8d5/image002.png From holger.hoffstaette at googlemail.com Thu Jan 23 05:22:42 2014 From: holger.hoffstaette at googlemail.com (=?UTF-8?B?SG9sZ2VyIEhvZmZzdMOkdHRl?=) Date: Thu, 23 Jan 2014 14:22:42 +0100 Subject: Unexplanable events in GC logs In-Reply-To: <6175F8C4FE407D4F830EDA25C27A43173B662980@Werum1790.werum.net> References: <6175F8C4FE407D4F830EDA25C27A43173B662980@Werum1790.werum.net> Message-ID: <52E117A2.7070405@googlemail.com> On 01/23/14 13:43, Denny Kettwig wrote: > An option I never considered, this is very likely the case. Any ideas > for the CMS issue? You do know that RMI periodically calls System.gc(), right? CMS will normally ignore this, but since you have enabled ExplicitGCInvokesConcurrent it might (probably will) mess with any estimations done by CMS. Have you tried without it? 
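(To make that mechanism concrete: RMI's distributed GC requests a collection once per sun.rmi.dgc.*.gcInterval. The sketch below is not the actual sun.rmi implementation, just an illustration of what that periodic request amounts to for the collector; the property name and the 3600000 ms fallback simply mirror the gcInterval settings used on these nodes.)

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PeriodicDgcSketch {
        public static void main(String[] args) {
            // Interval as configured on these nodes (falls back to 3600000 ms = 1 hour).
            long intervalMs = Long.getLong("sun.rmi.dgc.server.gcInterval", 3600000L);
            ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
            timer.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    // What the periodic DGC request boils down to for the collector:
                    // with -XX:+ExplicitGCInvokesConcurrent this starts a concurrent CMS
                    // cycle; without the flag each call would be a stop-the-world Full GC.
                    System.gc();
                }
            }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        }
    }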
-h From denny.kettwig at werum.de Thu Jan 23 06:10:47 2014 From: denny.kettwig at werum.de (Denny Kettwig) Date: Thu, 23 Jan 2014 14:10:47 +0000 Subject: AW: Unexplanable events in GC logs In-Reply-To: <52E117A2.7070405@googlemail.com> References: <6175F8C4FE407D4F830EDA25C27A43173B662980@Werum1790.werum.net> <52E117A2.7070405@googlemail.com> Message-ID: <6175F8C4FE407D4F830EDA25C27A43173B6629DA@Werum1790.werum.net> The RMI call interval is set by: -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 And except for this particular part of the log, the CMS collection occurs once per hour. -------------------------------------------- Von: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] Im Auftrag von Holger Hoffstätte Gesendet: Thursday, January 23, 2014 2:25 PM An: hotspot-gc-use at openjdk.java.net Betreff: Re: Unexplanable events in GC logs On 01/23/14 13:43, Denny Kettwig wrote: > An option I never considered, this is very likely the case. Any ideas > for the CMS issue? You do know that RMI periodically calls System.gc(), right? CMS will normally ignore this, but since you have enabled ExplicitGCInvokesConcurrent it might (probably will) mess with any estimations done by CMS. Have you tried without it? -h _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From bernd-2014 at eckenfels.net Thu Jan 23 15:49:38 2014 From: bernd-2014 at eckenfels.net (Bernd Eckenfels) Date: Fri, 24 Jan 2014 00:49:38 +0100 Subject: AW: Unexplanable events in GC logs In-Reply-To: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> References: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> Message-ID: Am 23.01.2014, 09:38 Uhr, schrieb Denny Kettwig : > The first event is a very frequent CMS collections on the first node for > about 11 hours. We are talking here about a peak value of 258 CMS > collections per hour. From my knowledge this event starts without any > reason and ends without any reason since the old space is below 10% > usage in this time frame. I do not know what might have caused this CMS > collection. We already experienced similar events on this cluster in the > past, but at a much higher heap usage (above 90%) and under high load > with most likely a huge heap fragmentation and the frequent CMS > collections ended with a single very long Full GC that defragmented the > heap. In the current event all this is not the case. I attached the > relevant part of the GC log. I had discussed similar problems here in the past as well. I haven't really found the reason, but some things considered were the code cache filling up and a hosed estimator (using OccupancyOnly might help here). Greetings Bernd From jon.masamitsu at oracle.com Fri Jan 24 13:20:20 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Fri, 24 Jan 2014 13:20:20 -0800 Subject: AW: Unexplanable events in GC logs In-Reply-To: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> References: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> Message-ID: <52E2D914.4010406@oracle.com> Denny, Does your application use JNI critical regions (GetPrimitiveArrayCritical or GetStringCritical)? What JDK release is this? Jon On 1/23/2014 12:38 AM, Denny Kettwig wrote: > > Hey folks, > > in one of our recent cluster systems we found 2 unexplainable events > within the GC logs.
I'd like to address these events to you and ask > for your advice. We are running a clustered jBoss System with 4 nodes, > every node has the same configuration. We use the following parameters: > > -Xms10g > > -Xmx10g > > -Xmn3g > > -Xss2048k > > -XX:+ExplicitGCInvokesConcurrent > > -XX:+CMSClassUnloadingEnabled > > -XX:+UseParNewGC > > -XX:+UseConcMarkSweepGC > > -XX:ParallelGCThreads=22 > > -XX:SurvivorRatio=8 > > -XX:TargetSurvivorRatio=90 > > -XX:PermSize=512M > > -XX:MaxPermSize=512m > > -Dsun.rmi.dgc.client.gcInterval=3600000 > > -Dsun.rmi.dgc.server.gcInterval=3600000 > > *Event 1* > > The first event is a very frequent CMS collections on the first node > for about 11 hours. We are talking here about a peak value of 258 CMS > collections per hour. From my knowledge this event starts without any > reason and ends without any reason since the old space is below 10% > usage in this time frame. I do not know what might have caused this > CMS collection. We already experienced similar events on this cluster > in the past, but at a much higher heap usage (above 90%) and under > high load with most likely a huge heap fragmentation and the frequent > CMS collections ended with a single very long Full GC that > defragmented the heap. In the current event all this is not the case. > *I attached the relevant part of the GC log*. > > *Event 2* > > The second event is just as confusing for me as the first one. At a > certain point in time a full GC takes place on *all 4 Nodes *and I > cannot find a reason for it. Here are the relevant parts: > > N1 > > 2013-12-20T05:43:28.041+0100: 768231.382: [GC 768231.383: [ParNew: > 2618493K->92197K(2831168K), 0.3007160 secs] > 3120456K->594160K(10171200K), 0.3017204 secs] [Times: user=0.91 > sys=0.00, real=0.30 secs] > > 2013-12-20T05:45:31.864+0100: 768355.209: [Full GC 768355.210: [CMS: > 501963K->496288K(7340032K), 4.2140085 secs] > 1397267K->496288K(10171200K), [CMS Perm : 203781K->178528K(524288K)], > 4.2148018 secs] [Times: user=4.21 sys=0.00, real=4.21 secs] > > 2013-12-20T05:52:40.591+0100: 768783.949: [GC 768783.949: [ParNew: > 2516608K->48243K(2831168K), 0.2649174 secs] > 3012896K->544532K(10171200K), 0.2659039 secs] [Times: user=0.47 > sys=0.00, real=0.27 secs] > > N2 > > 2013-12-20T04:57:21.524+0100: 765208.310: [GC 765208.311: [ParNew: > 924566K->111068K(2831168K), 0.1790573 secs] > 1416514K->603015K(10171200K), 0.1797121 secs] [Times: user=1.08 > sys=0.00, real=0.19 secs] > > 2013-12-20T04:57:21.711+0100: 765208.499: [GC [1 CMS-initial-mark: > 491947K(7340032K)] 603015K(10171200K), 0.3289639 secs] [Times: > user=0.33 sys=0.00, real=0.33 secs] > > 2013-12-20T04:57:22.039+0100: 765208.828: [CMS-concurrent-mark-start] > > 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-mark: > 0.577/0.577 secs] [Times: user=3.53 sys=0.05, real=0.58 secs] > > 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-preclean-start] > > 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-preclean: > 0.024/0.024 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] > > 2013-12-20T04:57:22.647+0100: 765209.430: > [CMS-concurrent-abortable-preclean-start] > > CMS: abort preclean due to time 2013-12-20T04:57:28.060+0100: > 765214.848: [CMS-concurrent-abortable-preclean: 4.208/5.418 secs] > [Times: user=4.10 sys=0.03, real=5.41 secs] > > 2013-12-20T04:57:28.076+0100: 765214.857: [GC[YG occupancy: 124872 K > (2831168 K)]765214.857: [Rescan (parallel) , 0.0580906 > secs]765214.916: [weak refs processing, 0.0000912 secs]765214.916: > [class unloading, 
0.0769742 secs]765214.993: [scrub symbol & string > tables, 0.0612689 secs] [1 CMS-remark: 491947K(7340032K)] > 616820K(10171200K), 0.2256506 secs] [Times: user=0.48 sys=0.00, > real=0.22 secs] > > 2013-12-20T04:57:28.294+0100: 765215.083: [CMS-concurrent-sweep-start] > > 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-sweep: > 0.750/0.750 secs] [Times: user=0.76 sys=0.00, real=0.75 secs] > > 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-reset-start] > > 2013-12-20T04:57:29.074+0100: 765215.856: [CMS-concurrent-reset: > 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] > > 2013-12-20T05:47:54.607+0100: 768241.413: [Full GC 768241.414: [CMS: > 491947K->464864K(7340032K), 4.5053183 secs] > 1910412K->464864K(10171200K), [CMS Perm : 183511K->168997K(524288K)], > 4.5059088 secs] [Times: user=4.49 sys=0.02, real=4.51 secs] > > 2013-12-20T06:47:59.098+0100: 771845.954: [GC 771845.954: [ParNew: > 1507774K->58195K(2831168K), 0.2307012 secs] > 1972638K->523059K(10171200K), 0.2313931 secs] [Times: user=0.37 > sys=0.00, real=0.23 secs] > > N3 > > 2013-12-20T05:46:25.441+0100: 767981.526: [GC 767981.526: [ParNew: > 2695212K->166641K(2831168K), 0.3278475 secs] > 3268057K->739486K(10171200K), 0.3284853 secs] [Times: user=1.62 > sys=0.00, real=0.33 secs] > > 2013-12-20T05:49:55.467+0100: 768191.578: [Full GC 768191.578: [CMS: > 572844K->457790K(7340032K), 3.7687762 secs] > 1216176K->457790K(10171200K), [CMS Perm : 181711K->169514K(524288K)], > 3.7692999 secs] [Times: user=3.76 sys=0.00, real=3.79 secs] > > 2013-12-20T06:49:59.249+0100: 771795.415: [GC 771795.415: [ParNew: > 1585077K->72146K(2831168K), 0.2632945 secs] > 2042868K->529936K(10171200K), 0.2639889 secs] [Times: user=0.41 > sys=0.00, real=0.27 secs] > > N4 > > 2013-12-20T05:48:21.551+0100: 767914.067: [GC 767914.068: [ParNew: > 2656327K->119432K(2831168K), 0.2603676 secs] > 3222693K->685799K(10171200K), 0.2609581 secs] [Times: user=1.14 > sys=0.00, real=0.26 secs] > > 2013-12-20T05:49:03.939+0100: 767956.457: [Full GC 767956.457: [CMS: > 566366K->579681K(7340032K), 6.1324841 secs] > 3149011K->579681K(10171200K), [CMS Perm : 190240K->174791K(524288K)], > 6.1331389 secs] [Times: user=6.13 sys=0.00, real=6.13 secs] > > 2013-12-20T05:50:10.262+0100: 768022.762: [GC 768022.763: [ParNew: > 2516608K->83922K(2831168K), 0.2157015 secs] > 3096289K->663603K(10171200K), 0.2162262 secs] [Times: user=0.41 > sys=0.00, real=0.22 secs] > > Between these Full GC are only a few minutes or as between N3 and N4 a > few seconds. I made some research on possible reasons for a Full GC > and this is the list I gathered so far: > > 1.Running out of old gen > > 2.Running out of perm gen > > 3.Calling System.gc() (indicated by System in the ouput) > > 4.Not having enough free space in Survivor Space to copy objects from > Eden (promotion failed) > > 5.Running out of old gen before a concurrent collection can free it > (Concurrent Mode Failure) > > 6.Having high fragmentation and not enough space for a larger object > in old gen > > However none of these 6 conditions are fulfilled by any of the above > shown full GC. So once again I'm lost and do not have an explanation > for this. > > If you need the full logs for further analysis please let me know. > > Kind Regards > > Denny > > Beschreibung: werum-hr > > cid:image002.png at 01CEF67E.8AEB4630 > > Werum Software & Systems AG > > Wulf-Werum-Strasse 3 | 21337 Lueneburg | Germany > > Tel. 
+49(0)4131/8900-983 | Fax +49(0)4131/8900-20 > > mailto:denny.kettwig at werum.de | http://www.werum.de > > VAT No. DE 116 083 850 | RG Lueneburg HRB 2262 > > Chairman of Supervisory Board: Johannes Zimmermann > > Executive Board: Hartmut Krome, Ruediger Schlierenkaemper, Hans-Peter > Subel > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140124/e29d81a2/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 1089 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140124/e29d81a2/attachment.jpe -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 6441 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140124/e29d81a2/attachment.png From bphinz at users.sourceforge.net Sat Jan 25 09:03:25 2014 From: bphinz at users.sourceforge.net (Brian Hinz) Date: Sat, 25 Jan 2014 12:03:25 -0500 Subject: Why does Clipboard.getData() use huge amount of heap memory? Message-ID: Hi, Apologies in advance if this is not the right place, but I maintain an open source java VNC viewer (TigerVNC) and I'm pretty much stumped over an OOM exception that gets thrown when I try to access the system clipboard and it contains a large amount of text data. It seems that the heap size jumps more than 10x the actual size of the clipboard data. For example, if I select the whole contents of a 20Mb text file and copy it all to the clipboard (outside the java app) then try to access the clipboard from my app while monitoring the heap size using jconsole, I see the heap size jump by 200-400Mb. I've isolated the source of the exception to the call to Clipboard.getData() (see code below). I've tried using different DataFlavors, etc., but all have the same result. Is this just an inefficiency in the implementation of getData that I'll have to live with? Any suggestions? TIA, -brian

public synchronized void checkClipboard() {
    SecurityManager sm = System.getSecurityManager();
    try {
      if (sm != null) sm.checkSystemClipboardAccess();
      Clipboard cb = Toolkit.getDefaultToolkit().getSystemClipboard();
      DataFlavor flavor = DataFlavor.stringFlavor;
      if (cb != null && cb.isDataFlavorAvailable(flavor)) {
        StringReader reader = null;
        try {
          reader = new StringReader((String)cb.getData(flavor));
          reader.read(clipBuf);
        } catch(java.lang.OutOfMemoryError e) {
          vlog.error("Too much data on local clipboard for VncViewer to handle!");
        } finally {
          if (reader != null) reader.close();
        }
        clipBuf.flip();
        String newContents = clipBuf.toString();
        if (!cc.clipboardDialog.compareContentsTo(newContents)) {
          cc.clipboardDialog.setContents(newContents);
          if (cc.viewer.sendClipboard.getValue())
            cc.writeClientCutText(newContents, newContents.length());
        }
        clipBuf.clear();
        // clear out the heap memory used by cb.getData() or else it starts to accumulate
        System.gc();
      }
    } catch(java.lang.Exception e) {
      vlog.debug("Exception getting clipboard data: " + e.getMessage());
    }
  }
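(One way to keep the copies bounded is to ask AWT for a Reader over the best available text flavor and append into a capped buffer in chunks, instead of materializing the full String and then pushing it through a StringReader into a CharBuffer. The sketch below is illustrative only; readClipboardText, the 8 KB chunk and the maxChars cap are made-up names and values, not TigerVNC code. Depending on the platform flavor, the data-transfer layer may still buffer the whole transfer internally, so this bounds what the application keeps rather than everything AWT allocates.)

    import java.awt.Toolkit;
    import java.awt.datatransfer.Clipboard;
    import java.awt.datatransfer.DataFlavor;
    import java.awt.datatransfer.Transferable;
    import java.awt.datatransfer.UnsupportedFlavorException;
    import java.io.IOException;
    import java.io.Reader;

    public class ClipboardTextSketch {
        static String readClipboardText(int maxChars) throws IOException {
            Clipboard cb = Toolkit.getDefaultToolkit().getSystemClipboard();
            Transferable contents = cb.getContents(null);
            if (contents == null) return null;
            DataFlavor flavor = DataFlavor.selectBestTextFlavor(contents.getTransferDataFlavors());
            if (flavor == null) return null;
            StringBuilder text = new StringBuilder();
            Reader reader = null;
            try {
                reader = flavor.getReaderForText(contents);
                char[] chunk = new char[8192];
                int n;
                // Stop early instead of letting a huge clipboard blow up the heap.
                while (text.length() < maxChars && (n = reader.read(chunk)) != -1) {
                    text.append(chunk, 0, Math.min(n, maxChars - text.length()));
                }
            } catch (UnsupportedFlavorException e) {
                return null;
            } finally {
                if (reader != null) reader.close();
            }
            return text.toString();
        }
    }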
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140125/a10371e5/attachment-0001.html From bernd-2014 at eckenfels.net Sat Jan 25 11:35:18 2014 From: bernd-2014 at eckenfels.net (Bernd Eckenfels) Date: Sat, 25 Jan 2014 20:35:18 +0100 Subject: Why does Clipboard.getData() use huge amount of heap memory? In-Reply-To: References: Message-ID: <1A94445B-6814-4E46-A474-2932755311DE@eckenfels.net> Not sure about the clipboard API, but with your reading from a StringReader you essentially double the amount of space used. (And UTF-16 chars also double the byte count for single-byte text.) Did you try to get a byte[] array instead? You can place a ByteBuffer on top of it with no additional copy. Bernd > Am 25.01.2014 um 18:03 schrieb Brian Hinz : > > Hi, > > Apologies in advance if this is not the right place, but I maintain an open source java VNC viewer (TigerVNC) and I'm pretty much stumped over an OOM exception that gets thrown when I try to access the system clipboard and it contains a large amount of text data. It seems that the heap size jumps more than 10x the actual size of the clipboard data. For example, if I select the whole contents of a 20Mb text file and copy it all to the clipboard (outside the java app) then try to access the clipboard from my app while monitoring the heap size using jconsole, I see the heap size jump by 200-400Mb. I've isolated the source of the exception to the call to Clipboard.getData() (see code below). I've tried using different DataFlavors, etc., but all have the same result. Is this just an inefficiency in the implementation of getData that I'll have to live with? Any suggestions? > > TIA, > -brian > >
> public synchronized void checkClipboard() {
>     SecurityManager sm = System.getSecurityManager();
>     try {
>       if (sm != null) sm.checkSystemClipboardAccess();
>       Clipboard cb = Toolkit.getDefaultToolkit().getSystemClipboard();
>       DataFlavor flavor = DataFlavor.stringFlavor;
>       if (cb != null && cb.isDataFlavorAvailable(flavor)) {
>         StringReader reader = null;
>         try {
>           reader = new StringReader((String)cb.getData(flavor));
>           reader.read(clipBuf);
>         } catch(java.lang.OutOfMemoryError e) {
>           vlog.error("Too much data on local clipboard for VncViewer to handle!");
>         } finally {
>           if (reader != null) reader.close();
>         }
>         clipBuf.flip();
>         String newContents = clipBuf.toString();
>         if (!cc.clipboardDialog.compareContentsTo(newContents)) {
>           cc.clipboardDialog.setContents(newContents);
>           if (cc.viewer.sendClipboard.getValue())
>             cc.writeClientCutText(newContents, newContents.length());
>         }
>         clipBuf.clear();
>         // clear out the heap memory used by cb.getData() or else it starts to accumulate
>         System.gc();
>       }
>     } catch(java.lang.Exception e) {
>       vlog.debug("Exception getting clipboard data: " + e.getMessage());
>     }
>   }
> _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
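(To illustrate the byte[] and ByteBuffer suggestion above: if the clipboard text can be obtained as a byte array in a known charset, which is platform-dependent and not shown here, wrapping the array costs no extra copy, and a CharsetDecoder can then produce characters in bounded chunks. The helper below is a sketch with made-up names and sizes, not a drop-in replacement.)

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CoderResult;

    public class ByteBufferDecodeSketch {
        static String decodeInChunks(byte[] raw, String charsetName, int maxChars) {
            // ByteBuffer.wrap() shares the array: no additional copy of the payload.
            ByteBuffer in = ByteBuffer.wrap(raw);
            CharsetDecoder decoder = Charset.forName(charsetName).newDecoder();
            CharBuffer out = CharBuffer.allocate(8192);
            StringBuilder text = new StringBuilder();
            CoderResult result;
            do {
                // Decode at most one CharBuffer's worth of characters per pass.
                result = decoder.decode(in, out, true);
                out.flip();
                int room = maxChars - text.length();
                text.append(out, 0, Math.min(out.length(), room));
                out.clear();
            } while (result.isOverflow() && text.length() < maxChars);
            // Drain anything the decoder still holds (a no-op for most charsets).
            decoder.flush(out);
            out.flip();
            if (text.length() < maxChars) {
                text.append(out, 0, Math.min(out.length(), maxChars - text.length()));
            }
            return text.toString();
        }
    }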
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140125/460d8bc3/attachment.html From denny.kettwig at werum.de Mon Jan 27 01:50:44 2014 From: denny.kettwig at werum.de (Denny Kettwig) Date: Mon, 27 Jan 2014 09:50:44 +0000 Subject: AW: AW: Unexplanable events in GC logs In-Reply-To: <52E2D914.4010406@oracle.com> References: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> <52E2D914.4010406@oracle.com> Message-ID: <6175F8C4FE407D4F830EDA25C27A43173B662E80@Werum1790.werum.net> Hey Jon, > Does your application use JNI critical regions? No. > What JDK release is this? JDK 1.6 u23 Regards, Denny Von: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] Im Auftrag von Jon Masamitsu Gesendet: Friday, January 24, 2014 10:24 PM An: hotspot-gc-use at openjdk.java.net Betreff: Re: AW: Unexplanable events in GC logs Denny, Does your application use JNI critical regions (GetPrimitiveArrayCritical or GetStringCritical)? What JDK release is this? Jon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140127/366e4d45/attachment.html