Troubleshooting a ~40-second minor collection
Bernd Eckenfels
bernd-2013 at eckenfels.net
Mon Dec 2 14:53:29 PST 2013
Hello,
Hmm, booting Linux with numa=off only means that the kernel no longer does
any NUMA optimization; the hardware is still NUMA, so the system most likely
behaves worse because nothing keeps memory allocations local to a node.
Maybe it would help instead to bind the Java process to only one node (that
node's CPUs and memory). This is especially good if some other stuff
(another JVM or a DB) can be bound to the other node.
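A minimal sketch of such a binding, assuming numactl is installed and the
kernel still exposes the NUMA nodes (with numa=off it sees only a single
node, so that boot option would have to go first). The function names, the
node number and app.jar are placeholders for illustration only:

```python
def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-7,16-23' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_cpus(node):
    """Read the CPUs of one NUMA node from sysfs (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def numactl_command(node, java_args):
    """Build a command line that pins both CPUs and memory to one node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}",
            "java", *java_args]


# Hypothetical usage: print the command instead of running it.
print(" ".join(numactl_command(0, ["-jar", "app.jar"])))
```

node_cpus(0) shows which cores belong to node 0, which helps when checking
in htop that the JVM really stays on that node.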
As for your question on how to go about it, I would check for a large
number of hardware interrupts (hi, %irq) or context switches (compared to
idle time and %soft interrupts). I am not so sure whether there is an easy
way to see if interrupt optimizations are active in, or needed by, the
drivers. Tools to look at: mpstat -P ALL, vmstat, /proc/interrupts. I
haven't been into hardware lately, but I would say more than 2k context
switches per second is something to look at more closely.
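As a rough sketch of how to get those rates without extra tools: the "ctxt"
and "intr" lines in /proc/stat are running totals since boot, so sampling
them twice gives context switches and interrupts per second. The function
names and the one-second interval are my own choice, not anything standard:

```python
import time


def parse_proc_stat(text):
    """Extract the total context-switch and interrupt counters from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # "ctxt <total>" and "intr <total> <per-irq counts...>"
        if len(fields) >= 2 and fields[0] in ("ctxt", "intr"):
            counters[fields[0]] = int(fields[1])
    return counters


def rates(before, after, seconds):
    """Turn two counter snapshots into per-second rates."""
    return {key: (after[key] - before[key]) / seconds
            for key in before if key in after}


def sample(interval=1.0, path="/proc/stat"):
    """Sample the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return rates(before, after, interval)
```

Calling sample() while the copy is running and again while it is idle
should show whether the copy pushes the "ctxt" rate well past the 2k/s
rule of thumb above.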
For network cards, for example, ethtool can be used to tune this (see for
example
http://serverfault.com/questions/241421/napi-vs-adaptive-interrupts). But
I guess it is only a problem when you have multiple Gigabit Ethernet
interfaces (or faster).
Regards
Bernd
On 02.12.2013 at 23:26, Aaron Daubman <daubman at gmail.com> wrote:
> Hi Bernd,
>
> Thanks for the info.
> This is a numa machine, however, as part of setting up hugepages, I have
> disabled numa (numa=off) in grub.conf (and have also disabled transparent
> huge page support).
>
> The JVM process is the only significant process (aside from the high-rate
> data copy tar/nc/pigz) running on this 32-core, 2-node 64G RAM box. The tar
> process is limited to using one CPU (close to 100%) but leaving 31 others
> free for the JVM - load average on the box is fairly low.
>
> The JVM process is spread fairly evenly over the nodes - watching htop I
> can see CPU jumping around among the 32 cores.
>
> Do you know what I might look at to see network/disk driver misbehavior?
>
> Thanks!
> Aaron
>
>
> On Mon, Dec 2, 2013 at 5:21 PM, Bernd Eckenfels
> <bernd-2013 at eckenfels.net> wrote:
>
>> Hello Aaron,
>>
>> another rough guess is that when you "copy high rate" you have a lot
>> of system interrupt time and context switches (especially when the
>> network or disk drivers are misbehaving).
>>
>> I wonder if this can really slow down the GC so much, but it would be
>> the next thing I would investigate.
>>
>> Is this a NUMA machine? Is the JVM process spread over multiple nodes?
>>
>> Regards
>> Bernd
>>
>>
>> On 02.12.2013 at 23:14, Aaron Daubman <daubman at gmail.com> wrote:
>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>
--
http://bernd.eckenfels.net