Hi Ralf,<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
we try to achieve low latencies despite using a huge heap (10G) and many<br>
logical cores (64).<br>
VM is 1.7u1. Ideally, we would let GC ergonomics decide what is best,<br>
giving only a low pause time goal (50ms).<br>
<br>
-Xss2m<br>
-Xmx10000M<br>
-XX:PermSize=256m<br>
-XX:+UseAdaptiveGCBoundary<br>
-XX:+UseAdaptiveSizePolicy<br>
-XX:+UseConcMarkSweepGC<br>
-XX:MaxGCPauseMillis=100<br>
-XX:ParallelGCThreads=12<br>
<br>
-XX:+BindGCTaskThreadsToCPUs<br>
-XX:+UseGCTaskAffinity<br>
<br>
-XX:+UseCompressedOops<br>
-XX:+DoEscapeAnalysis<br>
<br>
Whenever we use adaptive sizes, the VM will crash in GenCollect*, as<br>
soon as some serious allocations start. I already filed a bug for this<br>
(7112413).<br>
<br>
Assuming a small newsize helps maintaining a low pause time goal, I can<br>
set the newsize, too. Say I set it to 100MB, it will increase later<br>
anyway, again yielding frequent pause times in over 1s by the time the<br>
newsize is around 1G.<br>
<br>
What am I doing wrong here?<br></blockquote><div><br>Your assumption that new size correlates with pause time may be wrong. My experiments (on x86, not SPARC) have shown that on a large heap, young collection pause time is mostly dominated by the time required to scan the dirty card table, and is thus proportional to old space size. You can find the detailed math here <a href="http://aragozin.blogspot.com/2011/06/understanding-gc-pauses-in-jvm-hotspots.html">http://aragozin.blogspot.com/2011/06/understanding-gc-pauses-in-jvm-hotspots.html</a> and some experimental proof here <a href="http://aragozin.blogspot.com/2011/07/openjdk-patch-cutting-down-gc-pause.html">http://aragozin.blogspot.com/2011/07/openjdk-patch-cutting-down-gc-pause.html</a><br>
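<br>To give a feel for the scale involved, here is a minimal sketch in Java (not taken from the linked posts). It assumes HotSpot's 512-byte card size and, as a simplification, that the card table covering the whole old space has to be walked during a young collection. The class and method names are made up for illustration, and the real pause also depends on how many cards are dirty and on the hardware.<br>

```java
public class CardTableEstimate {
    // Assumption: HotSpot uses one card table entry per 512 bytes of heap.
    static final long CARD_SIZE_BYTES = 512;

    // Number of card table entries covering an old space of the given size.
    static long cardCount(long oldSpaceBytes) {
        return (oldSpaceBytes + CARD_SIZE_BYTES - 1) / CARD_SIZE_BYTES;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical old space sizes; 10000 MB matches -Xmx10000M from the question.
        long[] oldSpaceSizesMb = {1000, 5000, 10000};
        for (long sizeMb : oldSpaceSizesMb) {
            System.out.println(sizeMb + " MB old space -> "
                    + cardCount(sizeMb * mb) + " card table entries");
        }
    }
}
```

The entry count grows linearly with old space size and does not depend on new size at all, which is why shrinking the young generation alone may not shorten the pause.<br>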
<br>As for a recipe for low-pause GC, I have outlined mine here <a href="http://aragozin.blogspot.com/2011/07/gc-check-list-for-data-grid-nodes.html">http://aragozin.blogspot.com/2011/07/gc-check-list-for-data-grid-nodes.html</a>.<br>
But I have to say again that all my experience is with x86; I would not be too surprised if things work differently on SPARC.<br></div></div><br>PS<br>Sorry for posting so many links, but they are really helpful for catching up on the topic of low-pause GC tuning.<br>
<br>Regards,<br>Alexey<br>