Weird GC-- and parallel marking times
Peter B. Kessler
Peter.Kessler at Sun.COM
Mon Jun 30 14:09:40 PDT 2008
I'm running SPECjbb2005 on a T1000.
Among the things I noticed is this anomalous GC-- collection:
783.409: [GC [PSYoungGen: 1834992K->262128K(1835008K)] 3251983K->1692937K(3407872K), 0.4594853 secs] [Times: user=10.15 sys=0.14, real=0.46 secs]
786.112: [GC [PSYoungGen: 1834992K->262128K(1835008K)] 3265801K->1706512K(3407872K), 0.4680468 secs] [Times: user=10.26 sys=0.17, real=0.47 secs]
788.835: [GC [PSYoungGen: 1834992K->238544K(1835008K)] 3279376K->1721236K(3407872K), 0.4713421 secs] [Times: user=10.38 sys=0.15, real=0.47 secs]
791.552: [GC-- [PSYoungGen: 1811408K->1811408K(1835008K)] 3294100K->3384268K(3407872K), *6.9766836* secs] [Times: user=15.02 sys=28.09, real=6.98 secs]
798.529: [pre compact, 0.0000565 secs]
798.529: [Full GC798.532: [marking phase798.532: [par mark, *2.6420171* secs]
801.174: [reference processing, 0.0036040 secs]
801.178: [class unloading ....
, 0.0106920 secs]
, 2.6567804 secs]
801.189: [summary phase, 0.3559888 secs]
801.545: [adjust roots, 0.0154521 secs]
801.560: [compact perm gen, 0.0660016 secs]
801.626: [compaction phase801.628: [drain task setup, 0.0541014 secs]
801.683: [dense prefix task setup, 0.0000028 secs]
801.683: [steal task setup, 0.0000089 secs]
801.683: [par compact, 4.3277560 secs]
806.011: [deferred updates, 2.2672261 secs]
, 6.6517610 secs]
808.278: [post compact, 0.5683043 secs]
[PSYoungGen: 1811408K->0K(1835008K)] [ParOldGen: 1572860K->1420159K(1572864K)] 3384268K->1420159K(3407872K) [PSPermGen: 5850K->5801K(16384K)], 10.3176805 secs] [Times: user=141.41 sys=5.16, real=10.32 secs]
811.078: [GC [PSYoungGen: 1572864K->109712K(1835008K)] 2993023K->1529871K(3407872K), 0.1969672 secs] [Times: user=4.36 sys=0.05, real=0.20 secs]
813.506: [GC [PSYoungGen: 1682576K->123248K(1835008K)] 3102735K->1543407K(3407872K), 0.1973749 secs] [Times: user=4.48 sys=0.01, real=0.20 secs]
815.937: [GC [PSYoungGen: 1696112K->125808K(1835008K)] 3116271K->1545967K(3407872K), 0.2015174 secs] [Times: user=4.58 sys=0.02, real=0.20 secs]
We are running along fine, the old generation is filling up, and we
fail a promotion (okay, I'm running with -XX:-UseAdaptiveSizePolicy
and -XX:-UsePSAdaptiveSurvivorSizePolicy because I'm trying to
generate full collections; a sketch of the full command line is
below, after the second log). We finish out the scavenge in 6.977
seconds and drop into a full collection, during which parallel
marking takes 2.642 seconds. I found this collection because 2.642
seconds is way off scale for parallel marking, and that's the code
I'm working on. (This was the only anomalous parallel mark in 20
runs.) A "usual" failed promotion, with my code (or without it),
looks like this (from 2 runs earlier with my JVM):
786.886: [GC [PSYoungGen: 1834992K->262128K(1835008K)] 3255509K->1699440K(3407872K), 0.4616326 secs] [Times: user=10.22 sys=0.13, real=0.46 secs]
789.602: [GC [PSYoungGen: 1834992K->262128K(1835008K)] 3272304K->1714991K(3407872K), 0.4597556 secs] [Times: user=10.16 sys=0.13, real=0.46 secs]
792.314: [GC [PSYoungGen: 1834992K->262128K(1835008K)] 3287855K->1729817K(3407872K), 0.4662842 secs] [Times: user=10.18 sys=0.18, real=0.47 secs]
795.031: [GC-- [PSYoungGen: 1834992K->1834992K(1835008K)] 3302681K->3407853K(3407872K), *9.1016039* secs] [Times: user=17.65 sys=67.29, real=9.10 secs]
804.133: [pre compact, 0.0000572 secs]
804.133: [Full GC804.135: [marking phase804.135: [par mark, *1.4553449* secs]
805.591: [reference processing, 0.0036024 secs]
805.594: [class unloading ....
, 0.0107485 secs]
, 1.4701466 secs]
805.605: [summary phase, 0.3576797 secs]
805.963: [adjust roots, 0.0158427 secs]
805.979: [compact perm gen, 0.0660806 secs]
806.045: [compaction phase806.047: [drain task setup, 0.0546181 secs]
806.102: [dense prefix task setup, 0.0000028 secs]
806.102: [steal task setup, 0.0000088 secs]
806.102: [par compact, 4.3571865 secs]
810.460: [deferred updates, 2.2664570 secs]
, 6.6809621 secs]
812.726: [post compact, 0.5718161 secs]
[PSYoungGen: 1834992K->0K(1835008K)] [ParOldGen: 1572861K->1420624K(1572864K)] 3407853K->1420624K(3407872K) [PSPermGen: 5848K->5801K(16384K)], 9.1659174 secs] [Times: user=136.08 sys=0.36, real=9.17 secs]
815.541: [GC [PSYoungGen: 1572864K->109824K(1835008K)] 2993488K->1530448K(3407872K), 0.1945033 secs] [Times: user=4.36 sys=0.03, real=0.19 secs]
817.973: [GC [PSYoungGen: 1682688K->123072K(1835008K)] 3103312K->1543696K(3407872K), 0.1971958 secs] [Times: user=4.49 sys=0.01, real=0.20 secs]
820.411: [GC [PSYoungGen: 1695936K->125808K(1835008K)] 3116560K->1546432K(3407872K), 0.2009367 secs] [Times: user=4.56 sys=0.01, real=0.20 secs]
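For reference, the logs above came from an invocation along these
lines. The two size-policy flags are the ones named above; the
collector selection, the heap sizes (read off the generation sizes
in the log), the phase-timing flag, and the SPECjbb2005 launch are
my guesses at a command line that produces output of this shape, not
the exact one used:

  java -XX:+UseParallelGC -XX:+UseParallelOldGC \
       -XX:-UseAdaptiveSizePolicy -XX:-UsePSAdaptiveSurvivorSizePolicy \
       -Xms3328m -Xmx3328m -Xmn1792m \
       -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
       -XX:+PrintParallelOldGCPhaseTimes \
       spec.jbb.JBBmain -propfile SPECjbb.props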
Note that the anomalous GC-- took much less time (6.977 versus
9.102 seconds), but the subsequent parallel mark took much longer
(2.642 versus 1.455 seconds).
On the one hand, since I'm worried about parallel marking times,
the anomalous one is a loss, using 2.642 instead of 1.455 seconds.
On the other hand, for the matched pair of GC-- collection and
subsequent full collection, the anomalous one uses (6.977 + 10.318)
17.294 seconds, whereas the normal collection uses (9.102 + 9.166)
18.268 seconds, saving 0.973 seconds, or 5%.
That would be a good trade. So maybe the question is: Is there a
way we can make that trade all the time? Also: Are there times when
it's a good trade and times when it's not? Can we know in advance
which one we have?
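To make the bookkeeping explicit, here is the same arithmetic
written out as a tiny (hypothetical) snippet; the numbers are just
the real times copied from the two logs above:

  // The pairwise comparison: the GC-- (failed scavenge) plus the
  // full collection that follows it, for each of the two runs.
  public class PairTotals {
      public static void main(String[] args) {
          double anomalousPair = 6.9766836 + 10.3176805;  // GC-- + Full GC, anomalous run
          double normalPair    = 9.1016039 +  9.1659174;  // GC-- + Full GC, "usual" run
          System.out.printf("anomalous pair: %.3f s%n", anomalousPair);  // 17.294 s
          System.out.printf("normal pair:    %.3f s%n", normalPair);     // 18.268 s
          System.out.printf("saving:         %.3f s (%.1f%%)%n",
                            normalPair - anomalousPair,
                            100.0 * (normalPair - anomalousPair) / normalPair);  // ~0.973 s, ~5.3%
      }
  }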
I've been trying to come up with an explanation for the anomalous
collection, but have failed so far. The best I can come up with is
that the GC-- collection finishes faster, and most of the time for
scavenges is the copying time, so maybe it failed sooner and didn't
have to copy as much. But that leaves more live stuff in the young
generation to copy during parallel compaction. That would mean we
traded (9.102 - 6.977) 2.125 seconds of parallel scavenging for
(10.318 - 9.166) 1.152 seconds of parallel compaction, which doesn't
fly, because the parallel compaction times (and the rest of the
statistics) are essentially the same between the two runs.
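For what it's worth, lining up the full-collection phase times from
the two logs shows that: everything but the parallel mark agrees to
within a few hundredths of a second. This little (hypothetical)
snippet just prints the values copied from the logs side by side:

  // Full-GC phase times (seconds) copied from the two logs above:
  // anomalous run, "usual" run, and their difference.
  public class PhaseCompare {
      public static void main(String[] args) {
          String[] phase     = {"par mark", "summary phase", "par compact",
                                "deferred updates", "post compact"};
          double[] anomalous = {2.6420171, 0.3559888, 4.3277560, 2.2672261, 0.5683043};
          double[] usual     = {1.4553449, 0.3576797, 4.3571865, 2.2664570, 0.5718161};
          for (int i = 0; i < phase.length; i++) {
              System.out.printf("%-17s %9.3f %9.3f %+9.3f%n",
                                phase[i], anomalous[i], usual[i],
                                anomalous[i] - usual[i]);
          }
      }
  }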
If anyone has a theory, I'd be happy to hear it.
... peter
P.S. Thanks to John for making me look at the details.