young-collections are too infrequent?
Ron Reynolds
tequilaron at gmail.com
Mon Dec 4 23:09:52 UTC 2017
Increasing the max-pause-time from 100ms to 200ms has certainly made our
young-collections less frequent (before they were about every 5-10 seconds;
now they're 30-60+ seconds apart). Unfortunately, our Full-GC woes
continue.
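(For reference, the pause-time target was presumably bumped via the standard
G1 flag; a minimal sketch of the relevant setting, with everything else left
at its defaults:)

    -XX:+UseG1GC -XX:MaxGCPauseMillis=200    (was -XX:MaxGCPauseMillis=100)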
One thing I noticed is that once the JVM is warmed up, only
humongous allocations seem to trigger concurrent-mark/mixed-GC cycles (not
counting the first failed young-collection that heralds the soon-to-follow
Full-GC).
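(For context on why humongous allocations matter so much here: in G1 an
allocation is humongous when it is at least half a region, so with our 16M
regions -- whether set explicitly via -XX:G1HeapRegionSize=16m or chosen
ergonomically -- the rough math, as I understand it, is:)

    region size         : 16 MB
    humongous threshold : 16 MB / 2 = 8 MB
    => any single allocation >= 8 MB skips eden, goes straight into old
       (humongous) regions, and on JDK 8 gets an IHOP check that can
       request a concurrent cycle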
This is the last healthy young-collection:
2017-12-01T15:08:07.619+0000: [GC pause (G1 Evacuation Pause) (young)
302437.685: [G1Ergonomics (CSet Construction) start choosing CSet,
_pending_cards: 153647, predicted base time: 60.27 ms, remaining time:
139.73 ms, target pause time: 200.00 ms]
302437.685: [G1Ergonomics (CSet Construction) add young regions to
CSet, eden: 2293 regions, survivors: 6 regions, predicted young region
time: 108.96 ms]
302437.685: [G1Ergonomics (CSet Construction) finish choosing CSet,
eden: 2293 regions, survivors: 6 regions, old: 0 regions, predicted
pause time: 169.23 ms, target pause time: 200.00 ms]
*54G->17G(59G)*, 0.1609184 secs]
Since it left 17G of the 59G heap (28.8%) allocated, it would seem that an
IHOP of 25 should trigger mixed-collections more frequently; however, given
the next two less-than-healthy young-collections, I'm not sure
mixed-collections would prevent what we're experiencing.
In other words, even if the above young-collection had been followed by
mixed-collections, they would only have collected perhaps another 10GB of
old-gen.
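(Rough arithmetic for that IHOP idea, using the 45% threshold the log itself
prints below; the ~15G figure is my own back-of-the-envelope estimate, not
something from the log:)

    threshold at IHOP=45 : 28938181005 bytes                 (from the log below)
    threshold at IHOP=25 : 28938181005 * 25/45 ~= 16.1e9 bytes (~15G)
    occupancy after the healthy YC above : ~17G
    => 17G > ~15G, so with -XX:InitiatingHeapOccupancyPercent=25 a concurrent
       cycle (and the mixed collections after it) should already be underway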
This is the next YC, which is the first failed collection.
2017-12-01T15:08:40.241+0000: [GC pause (G1 Evacuation Pause) (young)
302470.307: [G1Ergonomics (CSet Construction) start choosing CSet,
*_pending_cards: 133537*, predicted base time: 86.00 ms, remaining
time: 114.00 ms, target pause time: 200.00 ms]
302470.307: [G1Ergonomics (CSet Construction) add young regions to
CSet, eden: 2283 regions, survivors: 16 regions, predicted young
region time: 85.78 ms]
302470.307: [G1Ergonomics (CSet Construction) finish choosing CSet,
*eden: 2283 regions, survivors: 16 regions*, old: 0 regions, predicted
pause time: 171.77 ms, target pause time: 200.00 ms]
302472.204: [G1Ergonomics (Heap Sizing) *attempt heap expansion,
reason: region allocation request failed*, allocation request: 57512
bytes]
302472.204: [G1Ergonomics (Heap Sizing) expand the heap, requested
expansion amount: 57512 bytes, attempted expansion amount: 16777216
bytes]
302472.204: [G1Ergonomics (Heap Sizing) did not expand the heap,
reason: *heap already fully expanded*]
302480.689: [G1Ergonomics (Concurrent Cycles) request concurrent
cycle initiation, reason: occupancy higher than threshold, occupancy:
55851352064 bytes, allocation request: 0 bytes, threshold: 28938181005
bytes (45.00 %), source: end of GC]*-- 53G->56G(59G), 10.3817795
secs*]
The above young-collection CSet contained 2283 eden and 16 survivor regions,
but then it tried to allocate more heap? Due to lack of space for
promotion? Due to heap fragmentation by humongous objects? (The JVM is
using 16M regions, so it should have ~3833 regions, which means it's 60%
young-generation at this point.)
And why would it try to expand the heap at all when -Xmx and -Xms are
fixed to the same value and thus the attempt is guaranteed to fail? And
since the YC is a STW event, how did it go from 2283 eden + 16 survivor to 1
eden + 288 survivor and still not recover any heap (based on the before/after
sizes of 53G->56G)? Also, there's a 2-second gap between finishing the
CSet and the failed heap-expansion attempt, and the whole thing takes 10
seconds. I've asked ops for more verbose GC logging to hopefully
clarify where all those seconds got wasted.
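(In case it's useful context, this is roughly the extra JDK 8 logging I'm
asking ops for -- a sketch of flags I believe are relevant, not necessarily
the exact set they'll enable:)

    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCTimeStamps
    -XX:+PrintAdaptiveSizePolicy          (already on; it produces the G1Ergonomics lines above)
    -XX:+PrintGCApplicationStoppedTime    (total stop-the-world time, including safepoint overhead)
    -XX:+PrintReferenceGC                 (reference-processing time inside each pause)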
The next young-collection found over 6.5M pending cards and only 1 eden and
288 survivor regions (presumably created by the previous young-collection).
302480.710: [G1Ergonomics (Concurrent Cycles) initiate concurrent
cycle, reason: concurrent cycle initiation requested]
2017-12-01T15:08:50.644+0000: [GC pause (G1 Evacuation Pause) (young)
(initial-mark)
302480.711: [G1Ergonomics (CSet Construction) start choosing CSet,
*_pending_cards: 6533165*, predicted base time: 1123.81 ms, remaining
time: 0.00 ms, target pause time: 200.00 ms]
302480.711: [G1Ergonomics (CSet Construction) add young regions to
CSet, eden: 1 regions, survivors: 288 regions, predicted young region
time: 160.18 ms]
302480.711: [G1Ergonomics (CSet Construction) finish choosing CSet,
*eden: 1 regions, survivors: 288 regions*, old: 0 regions, predicted
pause time: 1283.99 ms, target pause time: 200.00 ms]
302482.600: [G1Ergonomics (Heap Sizing) attempt heap expansion,
reason: region allocation request failed, allocation request: 8388600
bytes]
302482.600: [G1Ergonomics (Heap Sizing) expand the heap, requested
expansion amount: 8388600 bytes, attempted expansion amount: 16777216
bytes]
302482.600: [G1Ergonomics (Heap Sizing) did not expand the heap,
reason: heap already fully expanded]
-- 56G->56G(59G), 10.2031695 secs]
2017-12-01T15:09:00.848+0000: [GC concurrent-root-region-scan-start]
So, does it seem that decreasing IHOP and increasing the concurrent-mark
heap and thread-count would address this, since it didn't even try to start a
concurrent-mark cycle until it was basically too late? All subsequent
young-collections failed with 0 eden, 0 survivor, and even 0 pending-cards,
and then the Full-GC occurs at 15:09:08 (28 seconds later) and
recovers 59G->6766M.
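(Concretely, the knobs I have in mind are something like the following --
the values are guesses to illustrate the direction, not a tested
recommendation:)

    -XX:InitiatingHeapOccupancyPercent=25   (default 45; start marking earlier)
    -XX:ConcGCThreads=4                     (default is roughly ParallelGCThreads/4,
                                             likely 2 on this box; more threads so
                                             marking finishes sooner)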
This is all from an 8-core VM running "Java HotSpot(TM) 64-Bit Server VM
(25.131-b11)" (I couldn't find any mention of GC improvements between build
131 and the current build, 151, so selling Ops on a JVM upgrade will be
difficult, and moving to Java 9 will, of course, require a lot more
preparation). Note that these JVMs do not have exclusive use of their
cores but share them with at least one Apache server (which is fronting the
Tomcat that generated these GC logs).
Sorry if this is a variation on a previous question, but sometimes this
mailing list feels like the only place to get answers.
Thanks.
....................ron.