Minor GC difference Java 7 vs Java 8

Jon Masamitsu jon.masamitsu at oracle.com
Tue Jun 3 17:37:24 UTC 2014


On 06/03/2014 06:51 AM, Chris Hurst wrote:
> Hi,
>
> Reducing the parallel GC threads on the simple testbed sample code 
> actually reduced the minor GC STW pauses (presumably multithreading is 
> inefficient here because the task is so simple). The reverse was true 
> for the real application, although for a single-thread reduction the 
> difference is hard to measure, sub-millisecond if anything. I'd expect 
> the difference to be because the real application has a more complex 
> object graph and is tenuring, with a very small amount of promotion to 
> old. If a spike does occur with the parallel GC threads reduced, i.e. 
> when other process activity on the box increases, the duration of the 
> spike appears unaffected by the reduction in parallel GC threads, which 
> I guess is what I would expect.
>
> Any suggestions as to what value to use for 
> WorkStealingYieldsBeforeSleep? Otherwise I'll just try doubling it. 
> If you have any additional notes or details on this change, that would 
> be most helpful.

WorkStealingYieldsBeforeSleep is one of those flags whose right value
just depends on your needs, so I don't have any good
suggestion.
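
One thing to keep in mind if you do experiment with it: the flag is
experimental, so the VM will only accept it if experimental options are
unlocked. Something like the following (doubling the current default of
5000 purely as a starting point, not a recommendation):

  -XX:+UnlockExperimentalVMOptions
  -XX:WorkStealingYieldsBeforeSleep=10000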

Jon

>
> Chris
>
>   PS This particular application uses the following flags (we regularly 
> test other GC permutations, including G1 and CMS; our goal is lower 
> latency):
>
>  -Xms1536m
> -Xmx1536m
> -XX:+PrintCommandLineFlags
> -XX:+UseParallelOldGC
> -XX:+UnlockDiagnosticVMOptions
> -XX:PermSize=50M
> -XX:-UseAdaptiveSizePolicy
> -XX:+AlwaysPreTouch
> -XX:MaxTenuringThreshold=15
> -XX:InitialTenuringThreshold=15
> -XX:+DTraceMonitorProbes
> -XX:+ExtendedDTraceProbes
> -Dsun.rmi.dgc.client.gcInterval=604800000
> -Dsun.rmi.dgc.server.gcInterval=604800000
>
> ------------------------------------------------------------------------
> Date: Mon, 2 Jun 2014 15:52:14 -0700
> From: jon.masamitsu at oracle.com
> To: christopherhurst at hotmail.com
> CC: hotspot-gc-use at openjdk.java.net
> Subject: Re: Minor GC difference Java 7 vs Java 8
>
>
> On 06/01/2014 05:14 PM, Chris Hurst wrote:
>
>     Hi,
>
>     Thanks for the replies. I've been stuck on this issue for about a
>     year and had raised it with Oracle support without getting
>     anywhere, but last weekend I managed to get a lot further with it.
>
>     I wrote a trivial program that continuously fills the young gen and
>     releases the garbage, for use with some DTrace tests, and it showed
>     a similar issue spike-wise. From there I could trace the issue to
>     the number of parallel GC threads: anything less than the number of
>     cores (i.e. cores - 1) removed the spike. For the test program, a
>     normal young GC oscillated around 1ms but spiked at about 15ms
>     (need to check). Reducing the parallel threads worked on the real
>     application in a similar way.
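>
>     Roughly, the test program was of this shape (a simplified sketch;
>     the class name, allocation size and spike threshold below are only
>     illustrative):
>
>     public class YoungGenChurn {
>         // Allocate short-lived arrays in a loop so the young generation
>         // fills and is collected continuously; timing each allocation
>         // makes pause spikes show up as slow iterations.
>         public static void main(String[] args) {
>             byte[][] slots = new byte[64][];
>             int i = 0;
>             while (true) {
>                 long start = System.nanoTime();
>                 // overwriting a slot turns the previous array into garbage
>                 slots[i++ % slots.length] = new byte[256 * 1024];
>                 long millis = (System.nanoTime() - start) / 1000000;
>                 if (millis > 5) {   // crude spike detector
>                     System.out.println("allocation stalled " + millis + " ms");
>                 }
>             }
>         }
>     }
>
>     (Nothing more sophisticated is needed: the point is just to keep
>     the young generation churning while DTrace watches the GC threads.)
>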
>     We managed to identify some very minor tasks (so small they weren't
>     showing up on some of our less fine-grained CPU monitoring) that
>     occasionally competed for CPU. The effect was surprisingly large,
>     relatively speaking, but now that we understand the cause we can
>     tune the Java 6 GC better. The spikes were again larger than I
>     would have expected and all appear to be very close in size; I
>     wouldn't have predicted that from the cause, but that's fine ;-)
>
>     Currently the Java 7 version is still not quite as good on overall
>     throughput when not spiking, though I will recheck these results,
>     as our most recent tests were around tuning the Java 6 GC with the
>     new information. We're happy with our Java 6 GC performance.
>
>     Although we can now reduce these already rare spikes (potentially
>     to zero), I can't 100% guarantee they won't occur so I would still
>     like to understand why Java 7 appears to handle this scenario less
>     efficiently.
>
>     Using DTrace we were mostly seeing yields, and looking at a stack
>     trace pointed us toward some JDK 7 changes and some newer Java
>     options that might be related:
>
>     taskqueue.cpp
>
>     libc.so.1`lwp_yield+0x15
>     libjvm.so`__1cCosFyield6F_v_+0x257
>     libjvm.so`__1cWParallelTaskTerminatorFyield6M_v_+0x18
>     libjvm.so`__1cWParallelTaskTerminatorRoffer_termination6MpnUTerminatorTerminator__b_+0xe8
>     libjvm.so`__1cJStealTaskFdo_it6MpnNGCTaskManager_I_v_+0x378
>     libjvm.so`__1cMGCTaskThreadDrun6M_v_+0x19f
>     libjvm.so`java_start+0x1f2
>     libc.so.1`_thr_setup+0x4e
>     libc.so.1`_lwp_start
>
>     uintx WorkStealingHardSpins          = 4096    {experimental}
>     uintx WorkStealingSleepMillis        = 1       {experimental}
>     uintx WorkStealingSpinToYieldRatio   = 10      {experimental}
>     uintx WorkStealingYieldsBeforeSleep  = 5000    {experimental}
>
>     I haven't had a chance to play with these yet, but could they be
>     involved, e.g. JDK 7 tuned to be more friendly to other
>     applications at the cost of latency (spin to yield)? Would that
>     make sense?
>
>
> Chris,
>
> My best recollection is that there was a performance regression
> reported internally and the change to 5000 was made to fix
> that regression. Increasing the number of yields done before
> a sleep made this code work more like the previous behavior.
> Let me know if you need better information and I can see what
> I can dig up.
>
> By the way, when you tuned down the number of ParallelGCThreads,
> you saw little or no increase in the STW pause times?
>
> You're using UseParallelGC?
>
> Jon
>
>     We would like to move to Java 7 for support reasons; also, as we
>     are on Solaris, the extra memory overhead of Java 8 (64-bit only),
>     even with compressed oops, gives us another latency hit.
>
>     Chris
>
>     PS -XX:+AlwaysPreTouch is on.
>
>     > Date: Thu, 29 May 2014 13:47:39 -0700
>     > From: Peter.B.Kessler at Oracle.COM
>     > To: christopherhurst at hotmail.com; hotspot-gc-use at openjdk.java.net
>     > Subject: Re: Minor GC difference Java 7 vs Java 8
>     >
>     > Are the -XX:+PrintGCDetails "[Times: user=0.01 sys=0.00,
>     real=0.03 secs]" reports for the long pauses different from the
>     short pauses? I'm hoping for some anomalous sys time, or user/real
>     ratio, that would indicate it was something happening on the
>     machine that is interfering with the collector. But you'd think
>     that would show up as occasional 15ms blips in your message
>     processing latency outside of when the collector goes off.
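>     >
>     > For instance (illustrative numbers only), a healthy parallel minor GC
>     might report something like [Times: user=0.04 sys=0.00, real=0.01 secs],
>     i.e. user roughly equal to real multiplied by the number of GC threads.
>     A pause where real is much larger than that ratio would suggest, or
>     where sys is clearly non-zero, points at the OS or other activity on
>     the machine rather than at the collector itself.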
>     >
>     > Does -XX:+PrintHeapAtGC show anything anomalous about the space
>     occupancy after the long pauses? E.g., more objects getting copied
>     to the survivor space, or promoted to the old generation? You
>     could infer the numbers from -XX:+PrintGCDetails output if you
>     didn't want to deal with the volume produced by -XX:+PrintHeapAtGC.
>     >
>     > You don't say how large or how stable your old generation size
>     is. If you have to get new pages from the OS to expand the old
>     generation, or give pages back to the OS because the old
>     generation can shrink, that's extra work. You can infer this
>     traffic from -XX:+PrintHeapAtGC output by looking at the
>     "committed" values for the generations. E.g., in "ParOldGen total
>     43008K, used 226K [0xba400000, 0xbce00000, 0xe4e00000)" those
>     three hex numbers are the start address for the generation, the
>     end of the committed memory for that generation, and the end of
>     the reserved memory for that generation. There's a similar report
>     for the young generation. Running with -Xms equal to -Xmx should
>     prevent pages from being acquired from or returned to the OS
>     during the run.
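>     >
>     > (Working through that example: committed space for ParOldGen is
>     0xbce00000 - 0xba400000 = 0x2a00000 bytes = 43008K, which matches the
>     "total" figure, while reserved is 0xe4e00000 - 0xba400000 = 0x2aa00000
>     bytes = 698368K, so the generation has plenty of room left to grow or
>     shrink its committed size, and any such change would show up as extra
>     work.)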
>     >
>     > Are you running with -XX:+AlwaysPreTouch? Even if you've
>     reserved and committed the address space, the first time you touch
>     new pages the OS wants to zero them, which takes time. That flag
>     forces all the zeroing at initialization. If you know your page
>     size, you should be able to see the generations (mostly the old
>     generation) crossing a page boundary for the first time in the
>     -XX:+PrintHeapAtGC output.
>     >
>     > Or it could be some change in the collector between JDK-6 and JDK-7.
>     >
>     > Posting some log snippets might let sharper eyes see something.
>     >
>     > ... peter
>     >
>     > On 04/30/14 07:58, Chris Hurst wrote:
>     > > Hi,
>     > >
>     > > Has anyone seen anything similar to this ...
>     > >
>     > > On Java 6 (a range of versions, 32-bit Solaris) the application,
>     > > using parallel old GC, non-adaptive, under a very heavy test load
>     > > shows minor GCs around the 5ms mark plus some very rare (say 3 or 4
>     > > instances in 12 hours) pauses of about 20ms. The number of these
>     > > pauses is random (though always tiny compared with the total number
>     > > of GCs) and the size, ~20ms, appears the same for all such points.
>     > > We have a large number of minor GCs in our runs and only a full GC
>     > > at startup. These freak GCs can be bunched or spread out, and we
>     > > can run for many hours without one (though still doing minor GCs).
>     > >
>     > > What's odd is that if I use Java 7 (a range of versions, 32-bit)
>     > > the result is very close, but the spikes (1 or 2, arguably fewer,
>     > > depending on the run) are now 30-40ms. Has anyone experienced
>     > > anything similar? Why would Java 7 take up to double the time for
>     > > a minor GC? GC throughput is approximately the same, arguably
>     > > Java 7's throughput is slightly better, but that freak minor GC
>     > > makes it unusable for us because of the latency.
>     > >
>     > > In terms of the change in spike height (20ms on Java 6 vs 40ms on
>     > > Java 7) this is very reproducible, though the number of points and
>     > > when they occur varies slightly. The overall GC graph and
>     > > throughput are otherwise similar, as is the resultant memory dump
>     > > at the end. The test should be constant load: multiple clients
>     > > just doing the same thing over and over.
>     > >
>     > > Has anyone seen anything similar? I was hoping someone might have
>     > > seen a change in defaults, a thread timeout, or a default data
>     > > structure size change that would account for this. I was hoping
>     > > the marked increase might be a giveaway to someone, as it's way
>     > > off our average minor GC time.
>     > >
>     > > We have looked at GC logs, heap dumps, processor activity,
>     > > background processes, amount of disk access, safepoints, etc.; we
>     > > trace the message rate into and out of the application looking
>     > > for variation, compare heap dumps at the end, and so on. Nothing
>     > > stands out so far.
>     > >
>     > > Chris
>     > >
