Minor GC difference Java 7 vs Java 8
Jon Masamitsu
jon.masamitsu at oracle.com
Thu Jul 17 19:43:31 UTC 2014
On 06/05/2014 05:21 PM, Chris Hurst wrote:
> Hi,
>
>
> WorkStealingYieldsBeforeSleep=500 gave me Java 6-like behaviour
> (constant cost) and actually a slightly better pause than Java 6
> (less than half the Java 7 max at the default
> WorkStealingYieldsBeforeSleep=5000). So I think I have pinned the
> spikes down and can tune them as I wish ... reduce/remove or lessen
> their impact. So I'm happy with that; I have some nice charts ;-)
>
> On a separate note, with regard to the general average pause (non-spikes,
> regardless of changing that parameter), GC performance appears ~13-14%
> slower; would that be expected or surprising
Chris,
In jdk7 some objects were moved out of the permanent generation into
the Java heap. Most notable were interned strings. This was a step
toward removing the permanent generation, as happened in jdk8. It
has the benefit of early collection of short-lived interned strings, but
the cost of processing the longer-lived interned strings through young GCs
(until they get copied to the old gen). We did not see much difference
when testing this change, but if you have a large young gen mostly full
of dead objects, the effect of the longer-lived interned strings will be
amplified.
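
For example, code of this shape (purely illustrative; the class name
is made up) now creates short-lived interned strings in the Java heap
that young GCs have to process:

    class InternChurn {
        static volatile String sink;
        public static void main(String[] args) {
            // In jdk7+ each interned string lives in the Java heap
            // (in jdk6 it lived in the perm gen), so young GCs must
            // scan/copy it until it dies or is promoted.
            for (int i = 0; i < 1000000; i++) {
                sink = ("key-" + i).intern(); // garbage once overwritten
            }
        }
    }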
This may not help, but there is a develop flag in jdk7 (i.e., not
available in product builds, and gone in jdk8),

-XX:+JavaObjectsInPerm

that reverts to putting the interned strings and other objects back into
the perm gen. If you run a debug build with it, some of the young GC
difference should go away (although the debug builds are so slow you
may find it hard to see).
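
With a debug (fastdebug) build, something like this would do it (the
jar name is just a placeholder):

    java -XX:+JavaObjectsInPerm -XX:+PrintGCDetails -jar yourapp.jar
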
Jon
> ?
>
> Chris
> ------------------------------------------------------------------------
> Date: Tue, 3 Jun 2014 10:37:24 -0700
> From: jon.masamitsu at oracle.com
> To: christopherhurst at hotmail.com; hotspot-gc-use at openjdk.java.net
> Subject: Re: Minor GC difference Java 7 vs Java 8
>
>
> On 06/03/2014 06:51 AM, Chris Hurst wrote:
>
> Hi,
>
> Reducing parallel threads on the simple testbed sample code
> actually reduced minor GC STWs (presumably in this scenario
> multithreading is inefficient due to the simplicity of the task);
> however, the reverse was true for the real application, though for
> a single-thread reduction the difference is hard to measure, sub-ms
> if at all. I'd expect the difference to be because the real
> application will have a more complex object graph and be tenuring
> with a very small amount of promotion to old. If a spike does
> occur with parallel GC threads reduced, i.e. increasing other process
> activity on the box, the duration of the spike appears unaffected
> by the reduction in parallel GC threads, which I guess I would expect.
>
> Any suggestions as to what value to use for
> WorkStealingYieldsBeforeSleep? Otherwise I'll just try doubling
> it. If you have any additional notes or details on this change,
> that would be most helpful.
>
>
> WorkStealingYieldsBeforeSleep is one of those flags whose value
> just depends on your needs, so I don't have any good
> suggestion.
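>
> One thing to note if you do experiment: it is an experimental
> flag, so on a product build it has to be unlocked first, e.g.
> (the 2500 is just an illustrative starting point between the old
> and new defaults):
>
>     java -XX:+UnlockExperimentalVMOptions \
>          -XX:WorkStealingYieldsBeforeSleep=2500 ...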
>
> Jon
>
>
> Chris
>
> PS This particular application uses the following (we regularly
> test other GC permutations including G1 / CMS; our goal is lower
> latency) ...
>
> -Xms1536m
> -Xmx1536m
> -XX:+PrintCommandLineFlags
> -XX:+UseParallelOldGC
> -XX:+UnlockDiagnosticVMOptions
> -XX:PermSize=50M
> -XX:-UseAdaptiveSizePolicy
> -XX:+AlwaysPreTouch
> -XX:MaxTenuringThreshold=15
> -XX:InitialTenuringThreshold=15
> -XX:+DTraceMonitorProbes
> -XX:+ExtendedDTraceProbes
> -Dsun.rmi.dgc.client.gcInterval=604800000
> -Dsun.rmi.dgc.server.gcInterval=604800000
>
> ------------------------------------------------------------------------
> Date: Mon, 2 Jun 2014 15:52:14 -0700
> From: jon.masamitsu at oracle.com
> To: christopherhurst at hotmail.com
> CC: hotspot-gc-use at openjdk.java.net
> Subject: Re: Minor GC difference Java 7 vs Java 8
>
>
> On 06/01/2014 05:14 PM, Chris Hurst wrote:
>
> Hi,
>
> Thanks for the replies. I've been stuck on this issue for
> about a year and had raised it with Oracle support but hadn't
> got anywhere; last weekend, though, I managed to get a lot
> further with it ...
>
> I wrote a trivial program to continuously fill the young gen and
> release the garbage, for use with some DTrace tests, and this
> showed a similar issue spike-wise. From there I could work out
> that the issue was the parallel GC threads: anything less than
> the number of cores (i.e. cores-1) removed the spike. For the
> test program, normal young GC oscillated around 1ms but spiked
> at about 15ms (need to check). (Reducing the parallel threads
> worked on the real application in a similar way.)
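> The test program was essentially just this shape (a minimal
> sketch from memory, class name made up; not the exact code):
>
>     class YoungGenChurn {
>         static volatile Object sink;
>         public static void main(String[] args) {
>             // Allocate and immediately drop small objects so the
>             // young gen fills and is collected continuously;
>             // nothing survives, so all cost is in the minor GCs.
>             while (true) {
>                 sink = new byte[1024];
>             }
>         }
>     }
>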
> We managed to identify some very minor tasks (they were so
> small they weren't showing up on some of our less fine-grained
> CPU monitoring) that occasionally competed for CPU. The effect
> was surprisingly large relative to the tasks, but now that we
> understand the cause we can tune the Java 6 GC better.
> The spikes were again larger than I would have expected, and
> they all appear to be very close in size; I wouldn't have
> predicted this from the issue, but that's fine ;-)
>
> Currently the Java 7 version is still not quite as good on
> overall throughput when not spiking, though I will recheck
> these results, as our most recent tests were around tuning J6
> with the new info. We're happy with our Java 6 GC performance.
>
> Although we can now reduce these already rare spikes
> (potentially to zero), I can't 100% guarantee they won't occur,
> so I would still like to understand why Java 7 appears to
> handle this scenario less efficiently.
>
> Using DTrace we were mostly seeing yields, and looking at a
> stack trace pointed us toward some JDK 7 changes and some
> newer Java options that might be related? ...
>
> taskqueue.cpp
>
> libc.so.1`lwp_yield+0x15
> libjvm.so`__1cCosFyield6F_v_+0x257
> libjvm.so`__1cWParallelTaskTerminatorFyield6M_v_+0x18
> libjvm.so`__1cWParallelTaskTerminatorRoffer_termination6MpnUTerminatorTerminator__b_+0xe8
> libjvm.so`__1cJStealTaskFdo_it6MpnNGCTaskManager_I_v_+0x378
> libjvm.so`__1cMGCTaskThreadDrun6M_v_+0x19f
> libjvm.so`java_start+0x1f2
> libc.so.1`_thr_setup+0x4e
> libc.so.1`_lwp_start
>
> uintx WorkStealingHardSpins         = 4096  {experimental}
> uintx WorkStealingSleepMillis       = 1     {experimental}
> uintx WorkStealingSpinToYieldRatio  = 10    {experimental}
> uintx WorkStealingYieldsBeforeSleep = 5000  {experimental}
>
> I haven't had a chance to play with these as yet, but could
> they be involved, e.g. J7 tuned to be more friendly to other
> applications at the cost of latency (spin to yield)? Would
> that make sense?
>
>
> Chris,
>
> My best recollection is that there was a performance regression
> reported internally, and the change to 5000 was to fix
> that regression. Increasing the number of yields done before
> a sleep made this code work more like the previous behavior.
> Let me know if you need better information and I can see what
> I can dig up.
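>
> The shape of the termination protocol in taskqueue.cpp is roughly
> this (paraphrased pseudocode from memory; the helper names are
> placeholders, not the actual source):
>
>     // Each idle GC worker backs off in stages while waiting for
>     // stealable work or for all workers to agree to terminate.
>     int yields = 0;
>     while (!allWorkersOffered() && !foundMoreWork()) {
>         if (shouldSpin()) {       // WorkStealingHardSpins and
>             spin();               // WorkStealingSpinToYieldRatio
>         } else if (yields < WorkStealingYieldsBeforeSleep) {
>             Thread.yield();       // the yields in your DTrace stacks
>             yields++;
>         } else {
>             sleepMillis(WorkStealingSleepMillis); // give up the CPU
>         }
>     }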
>
> By the way, when you tuned down the number of ParallelGCThreads,
> you saw little or no increase in the STW pause times?
>
> You're using UseParallelGC?
>
> Jon
>
> We would like to move to Java 7 for support reasons; also, as
> we are on Solaris, the extra memory overhead of J8 (64-bit
> only), even with compressed oops, gives us another latency hit.
>
> Chris
>
> PS -XX:+AlwaysPreTouch is on.
>
> > Date: Thu, 29 May 2014 13:47:39 -0700
> > From: Peter.B.Kessler at Oracle.COM
> > To: christopherhurst at hotmail.com; hotspot-gc-use at openjdk.java.net
> > Subject: Re: Minor GC difference Java 7 vs Java 8
> >
> > Are the -XX:+PrintGCDetails "[Times: user=0.01 sys=0.00,
> > real=0.03 secs]" reports for the long pauses different from
> > the short pauses? I'm hoping for some anomalous sys time, or
> > user/real ratio, that would indicate it was something
> > happening on the machine that is interfering with the
> > collector. But you'd think that would show up as occasional
> > 15ms blips in your message processing latency outside of when
> > the collector goes off.
> >
> > Does -XX:+PrintHeapAtGC show anything anomalous about the
> > space occupancy after the long pauses? E.g., more objects
> > getting copied to the survivor space, or promoted to the old
> > generation? You could infer the numbers from
> > -XX:+PrintGCDetails output if you didn't want to deal with the
> > volume produced by -XX:+PrintHeapAtGC.
> >
> > You don't say how large or how stable your old generation
> > size is. If you have to get new pages from the OS to expand
> > the old generation, or give pages back to the OS because the
> > old generation can shrink, that's extra work. You can infer
> > this traffic from -XX:+PrintHeapAtGC output by looking at the
> > "committed" values for the generations. E.g., in "ParOldGen
> > total 43008K, used 226K [0xba400000, 0xbce00000, 0xe4e00000)"
> > those three hex numbers are the start address for the
> > generation, the end of the committed memory for that
> > generation, and the end of the reserved memory for that
> > generation. There's a similar report for the young generation.
> > Running with -Xms equal to -Xmx should prevent pages from
> > being acquired from or returned to the OS during the run.
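> >
> > To spell out the arithmetic on that example: 0xbce00000 -
> > 0xba400000 = 0x02a00000 = 44040192 bytes = 43008K committed,
> > matching the "total 43008K". If that committed figure changes
> > between GCs, pages are moving between the generation and the OS.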
> >
> > Are you running with -XX:+AlwaysPreTouch? Even if you've
> > reserved and committed the address space, the first time you
> > touch new pages the OS wants to zero them, which takes time.
> > That flag forces all the zeroing at initialization. If you
> > know your page size, you should be able to see the generations
> > (mostly the old generation) crossing a page boundary for the
> > first time in the -XX:+PrintHeapAtGC output.
> >
> > Or it could be some change in the collector between JDK-6
> > and JDK-7.
> >
> > Posting some log snippets might let sharper eyes see something.
> >
> > ... peter
> >
> > On 04/30/14 07:58, Chris Hurst wrote:
> > > Hi,
> > >
> > > Has anyone seen anything similar to this ...
> > >
> > > On a Java 6 (range of versions, 32-bit Solaris) application,
> > > using parallel old GC, non-adaptive, under a very heavy test
> > > performance load, we see minor GCs around the 5ms mark and
> > > some very rare (say 3 or 4 instances in 12 hours) ~20ms
> > > pauses. The number of pauses is random (though always few
> > > compared with the total number of GCs), and the size (~20ms)
> > > appears the same for all such points. We have a large number
> > > of minor GCs in our runs, and only a full GC at startup.
> > > These freak GCs can be bunched or spread out, and we can run
> > > for many hours without one (though doing minor GCs).
> > >
> > > What's odd is that if I use Java 7 (range of versions,
> > > 32-bit) the result is very close, but the spikes (1 or 2,
> > > arguably fewer) are now 30-40ms (depending on the run,
> > > arguably even rarer). Has anyone experienced anything
> > > similar? Why would Java 7 up to double a minor GC? The GC
> > > throughput is approximately the same; arguably 7 has slightly
> > > better throughput, but that freak minor GC makes it unusable
> > > for us due to latency.
> > >
> > > In terms of the change in spike height (20ms (J6) vs 40ms
> > > (J7)), this is very reproducible, though the number of points
> > > and when they occur vary slightly. The overall GC graph and
> > > throughput are similar otherwise, as is the resultant memory
> > > dump at the end. The test should be constant load: multiple
> > > clients just doing the same thing over and over.
> > >
> > > Has anyone seen anything similar? I was hoping someone
> > > might have seen a change in defaults, a thread timeout, or a
> > > default data structure size change that would account for
> > > this. I was hoping the marked increase might be a giveaway to
> > > someone, as it's way off our average minor GC time.
> > >
> > > We have looked at GC logs, heap dumps, processor activity,
> > > background processes, amount of disk access, safepoints,
> > > etc.; we trace message rates into and out of the application
> > > for variation, compare heap dumps at the end, etc. Nothing
> > > stands out so far.
> > >
> > > Chris