<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
I do not know why it has worked for a week.<br>
Maybe it is because this was the xmas week ;-)<br>
<br>
During the night there are a lot of disk operations (2 TB of data is
written). Therefore the operating system caches a lot of files and
tries to free memory for this, so unused pages are moved to swap
space.<br>
I assume heap fragmentation prevents swapping, since more pages are
touched while the application is running. After a compacting gc
there is one large (free) block which is not touched until a young gc
copies objects from eden space into it. This leads the operating
system to move the pages of this one free block to swap, and at every
young gc it has to read them back from swap.<br>
After a CMS collection the following young gcs are much faster
because the gaps in the heap are not swapped.<br>
<br>
Yesterday we turned off swap on this machine, and now all
young gcs take less than 200ms (instead of 6s) :-)<br>
Thanks again to Chi Ho Kwok for giving the key hint :-)<br>
<br>
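For anyone who wants to check for the same condition before turning
swap off, here is a minimal sketch. It assumes a Linux machine where
/proc/[pid]/status reports a VmSwap line; the class and method names
(SwapCheck, parseVmSwapKb) are made up for illustration, and this is
not something we ran on our machine.<br>
<pre>
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SwapCheck {

    // Returns the VmSwap value in kB from the lines of a /proc/[pid]/status
    // file, or -1 if no VmSwap line is present (assumption: Linux format).
    public static long parseVmSwapKb(String[] statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmSwap:")) {
                // The line looks like "VmSwap:\t    1234 kB".
                String[] fields = line.substring("VmSwap:".length()).trim().split("\\s+");
                return Long.parseLong(fields[0]);
            }
        }
        return -1L;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current JVM; pass a pid to inspect another process.
        String pid = args.length == 0 ? "self" : args[0];
        String[] lines = Files.readAllLines(Paths.get("/proc", pid, "status"))
                .toArray(new String[0]);
        System.out.println("VmSwap of " + pid + ": " + parseVmSwapKb(lines) + " kB");
    }
}
```
</pre>
Run with the pid of the application's JVM as the argument. A VmSwap
value that is large after the nightly disk activity would be
consistent with the paging theory, though it does not prove it.<br>
<br>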
Flo<br>
<br>
<br>
Am 11.01.2012 10:00, schrieb Srinivas Ramakrishna:
<blockquote
cite="mid:CABzyjykkn0ri8w-W-JtSys8ph-WwtySYsaC9=_SEbVDZUhe2FQ@mail.gmail.com"
type="cite"><br>
<br>
<div class="gmail_quote">On Mon, Jan 9, 2012 at 3:08 AM, Florian
Binder <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:java@java4.info">java@java4.info</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
...<br>
I have seen that this problem occurs only after about one week
of<br>
uptime. Even though we run a full (compacting) gc every
night.<br>
Since real-time > user-time I assume it might be a
synchronization<br>
problem. Can this be true?<br>
<br>
</blockquote>
<div><br>
Together with your and Chi-Ho's conclusion that this is
possibly related to paging,<br>
a question to ponder is why this happens only after a week.
Since your process'<br>
heap size is presumably fixed and you have seen multiple full
GC's (from which<br>
I assume that your heap's pages have all been touched), have
you checked to<br>
see if the size of either this process (i.e. its native size)
or of another process<br>
on the machine has grown during the week so that you start
swapping?<br>
<br>
I also find it interesting that you state that whenever you
see this problem<br>
there's always a single block in the old gen, and that the
problem seems to go<br>
away when there is more than one block in the old gen. That
would seem<br>
to throw out the paging theory, and point the finger of
suspicion at some kind<br>
of bottleneck in the allocation out of a large block. You also
state that you<br>
do a compacting collection every night, but the bad behaviour
sets in only<br>
after a week.<br>
<br>
So let me ask you if you see that the slow scavenge happens to
be the first<br>
scavenge after a full gc, or does the condition persist for a
long time and<br>
is independent of whether a full gc has happened recently?<br>
<br>
Try turning on -XX:+PrintOldPLAB to see if it sheds any
light...<br>
<br>
-- ramki<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>