Troubleshooting a ~40-second minor collection

Aaron Daubman daubman at gmail.com
Mon Dec 2 20:32:38 PST 2013


>  (Maybe it would help to bind the java process to only one node
> (node's CPUs and memory)). This is especially good if some other stuff
> (other JVM or DB) can be bound to the other node.
>

Unfortunately only this one large JVM (Jetty / Solr) runs on this system,
so using numactl to bind to one node would waste half the compute resources
=(
Also, in steady state (23 hours a day) this appears to work fine - it's
only during the network copy of the Solr index out to the nodes that we
see any GC issue.



> As for your question on how to go with it, I would check for a large
> number of hardware interrupts (hi,%irq) or context switches (compared to
> idle times and %soft interrupts), not so sure if there is an easy way to
> see if interrupt optimizations are active/needed by the drivers. (mpstat
> -P ALL, vmstat, /proc/interrupts). I haven't been into hardware lately, but
> I would say >2k cs/s is something to observe closer.
>

Hmm... we have cacti monitoring these hosts, and I do see that we jump from
a steady state of ~2k CS (I think per minute, but that could be per second)
up to 10k CS during the data transfer.
Could you explain (or point me to docs) why high context switching like
this would lead to long minor collections?
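
To settle the per-minute vs per-second question independently of cacti,
the kernel's cumulative counter can be sampled directly. This is only a
minimal sketch, assuming a Linux host where /proc/stat exposes a "ctxt"
line; the function names are made up for illustration:

```python
import time


def read_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def switches_per_second(count_start, count_end, elapsed_seconds):
    """Average context switches per second between two counter samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (count_end - count_start) / elapsed_seconds


def sample_rate(interval_seconds=5.0, path="/proc/stat"):
    """Sample the live counter twice (Linux only) and return the rate."""
    with open(path) as f:
        first = read_ctxt(f.read())
    started = time.monotonic()
    time.sleep(interval_seconds)
    with open(path) as f:
        second = read_ctxt(f.read())
    return switches_per_second(first, second, time.monotonic() - started)
```

Running sample_rate() once in steady state and once during the index
copy should give numbers that can be compared against the cacti graph.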

Also, I used GCViewer to open some of the logs, and if it is parsing
things correctly, my max GC pause time is actually 1-3s, so I must have
been seeing accumulated time (measured in ms/min) of up to 40s/min. I
guess this is not from a single pause/GC event.
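
To make the distinction concrete, here is a small sketch of the two
numbers being confused: the longest single pause versus the pause time
accumulated within one minute. It assumes the pauses have already been
extracted from the GC log as (timestamp in seconds, pause in seconds)
pairs; the function name and the sample data are hypothetical:

```python
from collections import defaultdict


def pause_summary(events):
    """Return (longest single pause, worst accumulated pause per minute).

    events: iterable of (timestamp_seconds, pause_seconds) pairs.
    """
    per_minute = defaultdict(float)
    longest = 0.0
    for timestamp, pause in events:
        # Bucket each pause by the minute in which it started.
        per_minute[int(timestamp // 60)] += pause
        longest = max(longest, pause)
    worst_minute = max(per_minute.values(), default=0.0)
    return longest, worst_minute


# Made-up data: twenty 2-second pauses, one every 3 seconds.
events = [(i * 3.0, 2.0) for i in range(20)]
longest, worst_minute = pause_summary(events)
print(longest, worst_minute)
```

With data like this, no single pause exceeds 2s, yet the accumulated
figure for the minute reaches 40s.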

Would it help to narrow the GC logs I have to around the time of the long
minor GC events and include them?



> For network cards for example ethtool can be used to tune it (see for
> example
> http://serverfault.com/questions/241421/napi-vs-adaptive-interrupts). But
> I guess it is only a problem when you have multiple GE interfaces (or
> faster).
>

Hmm... all of these servers have dual-bonded GE interfaces, I wonder if
something is up there. (They actually have two bonded pairs, one internal
and one external; this traffic would be going over the bonded internal
interface.)
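
One way to look at /proc/interrupts for these interfaces is to total the
NIC interrupt counts per CPU and see whether they are concentrated on a
single core. A rough sketch, assuming the usual layout (a header row of
CPU names, then one row per IRQ ending in the device name); the device
names eth0/eth1 and the counts below are invented sample data:

```python
def irq_counts_per_cpu(interrupts_text, device_substring):
    """Sum per-CPU interrupt counts for rows whose description matches."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    cpu_names = lines[0].split()
    totals = [0] * len(cpu_names)
    for line in lines[1:]:
        _label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = fields[:len(cpu_names)]
        description = " ".join(fields[len(cpu_names):])
        if device_substring not in description:
            continue
        # Skip rows that do not carry one numeric column per CPU.
        if len(counts) < len(cpu_names) or not all(c.isdigit() for c in counts):
            continue
        for i, count in enumerate(counts):
            totals[i] += int(count)
    return dict(zip(cpu_names, totals))


# Invented sample in the /proc/interrupts layout.
SAMPLE = """\
           CPU0       CPU1
 24:       1000        200   PCI-MSI-edge      eth0
 25:          5       9000   PCI-MSI-edge      eth1
NMI:          0          0   Non-maskable interrupts
"""
print(irq_counts_per_cpu(SAMPLE, "eth"))
```

Comparing the totals before and during the index copy would show how
many NIC interrupts each CPU handled over that window.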

Thanks again,
     Aaron