<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix"><br>
Hi Ramki,<br>
<br>
On 1/11/13 6:50 PM, Srinivas Ramakrishna wrote:<br>
</div>
<blockquote
cite="mid:CABzyjynNXLidiWVRb_030+Wq=LUdnnb3qGvxM5LZL=B6rD=oUA@mail.gmail.com"
type="cite">Hi Bengt --<br>
<br>
Try computing the GC overhead by normalizing wrt the work done
(for which the net allocation volume might be a good proxy). As
you state, the performance numbers will then likely make sense. Of
course, they still won't explain why ParNew does better. As
Vitaly conjectures, the difference is likely in better object
co-location with ParNew's slightly more DFS-like evacuation
compared with DefNew's considerably more BFS-like evacuation
because of the latter's use of a pure Cheney scan compared with
the use of (a) marking stack(s) in the former, as far as I can
remember the code. One way to tell if that accounts for the
difference is to measure the cache-miss rates in the two cases
(and maybe use a good tool like Solaris perf analyzer to show you
where the misses are coming from as well).<br>
</blockquote>
<br>
Thanks for bringing the DFS/BFS difference up. This is exactly the
kind of difference I was looking for. My guess is that this is what
causes the difference in JBB score. I'll see if I can investigate
this further.<br>
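<br>
To make the DFS/BFS distinction concrete, here is a toy sketch in
Python (not HotSpot code). It only compares the order in which a
Cheney-style scan and an explicit-stack traversal would lay out
copied objects. The object graph, the function names and the
copy-on-pop simplification are all assumptions made for
illustration; the real ParNew work queues behave differently in
detail.<br>
<pre>
```python
# Toy object graph: each object maps to the objects it references.
# The shape and the names are invented for illustration.
GRAPH = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "a1": [], "a2": [], "b1": [], "b2": [],
}


def cheney_order(graph, root):
    """Breadth-first copy order, as a Cheney scan over to-space produces."""
    order = [root]      # models to-space: objects are appended as they are copied
    copied = {root}
    scan = 0            # scan pointer chasing the allocation pointer
    while scan != len(order):
        for child in graph[order[scan]]:
            if child not in copied:
                copied.add(child)
                order.append(child)
        scan += 1
    return order


def stack_order(graph, root):
    """Depth-first copy order using an explicit stack (textbook DFS)."""
    order = []
    copied = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj in copied:
            continue
        copied.add(obj)
        order.append(obj)
        # Push in reverse so the first reference is visited first.
        stack.extend(reversed(graph[obj]))
    return order


print("BFS:", cheney_order(GRAPH, "root"))
print("DFS:", stack_order(GRAPH, "root"))
```
</pre>
In the depth-first order each parent ends up next to its own
children, which is the kind of co-location that could reduce cache
misses in the mutator.<br>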
<blockquote
cite="mid:CABzyjynNXLidiWVRb_030+Wq=LUdnnb3qGvxM5LZL=B6rD=oUA@mail.gmail.com"
type="cite">Also curious if you can share the two sets of GC logs,
by chance? (specJBB is a for-fee benchmark so is not freely
available to the individual developer.)<br>
</blockquote>
<br>
I have a fairly large set of logs, but the runs are very stable so
I'm just attaching logs for one run for each collector. For
comparison I have also been running ParallelScavenge with one
thread. I'm using separate gc logs and jbb logs. The log files
called ".result" are the jbb output. The other logs are the gc logs.<br>
<br>
I'm running with a heap size of 1GB to avoid full GCs. All runs have
the two System.gc()-induced full GCs but no others. ParallelScavenge
is performing even better than ParNew, but I am mostly interested in
the difference between ParNew and DefNew.<br>
<br>
A quick summary of the data in the logs:<br>
<pre>          Score   #GCs   AverageGCTime
DefNew:   57903   2083   0.044053195391262644
ParNew:   61363   2213   0.05931835969272489
PS:       69697   2213   0.06117092860370538</pre>
<br>
ParNew has a better score even though it does more GCs and they take
longer.<br>
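<br>
As a rough way to read these figures, the sketch below multiplies
the number of GCs by the average GC time to get a cumulative GC
time, and then divides by the score. Two things here are
assumptions and not taken from the logs: that the average GC time
is in seconds, and that the score is a usable stand-in for work
done (the net allocation volume suggested above is not in this
summary).<br>
<pre>
```python
# Figures copied from the summary table.
# Assumption: avg_gc_time is in seconds.
RUNS = {
    "DefNew": {"score": 57903, "gcs": 2083, "avg_gc_time": 0.044053195391262644},
    "ParNew": {"score": 61363, "gcs": 2213, "avg_gc_time": 0.05931835969272489},
    "PS": {"score": 69697, "gcs": 2213, "avg_gc_time": 0.06117092860370538},
}


def total_gc_time(run):
    """Cumulative GC time: number of GCs times the average GC time."""
    return run["gcs"] * run["avg_gc_time"]


def gc_time_per_score(run):
    """GC time per unit of score; score is an assumed proxy for work done."""
    return total_gc_time(run) / run["score"]


for name, run in RUNS.items():
    print(f"{name}: total={total_gc_time(run):.1f} "
          f"per_score={gc_time_per_score(run):.6f}")
```
</pre>
With score as the proxy, ParNew still spends more GC time per unit
of work than DefNew, so this normalization alone does not explain
the better score.<br>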
<br>
If you have any insights from looking at the logs I would be very
happy to hear about them.<br>
<br>
Thanks,<br>
Bengt<br>
<blockquote
cite="mid:CABzyjynNXLidiWVRb_030+Wq=LUdnnb3qGvxM5LZL=B6rD=oUA@mail.gmail.com"
type="cite"><br>
thanks.<br>
-- ramki<br>
<br>
<div class="gmail_quote">On Fri, Jan 11, 2013 at 4:57 AM, Bengt
Rutisson <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:bengt.rutisson@oracle.com" target="_blank">bengt.rutisson@oracle.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div><br>
Hi Vitaly,
<div class="im"><br>
<br>
On 1/11/13 1:45 PM, Vitaly Davidovich wrote:<br>
</div>
</div>
<div class="im">
<blockquote type="cite">
<p dir="ltr">Hi Bengt,</p>
<p dir="ltr">Regarding the benchmark score, are you
saying ParNew has longer cumulative GC time or just
the average is higher? If it's just average, maybe the
total # of them (and cumulative time) is less. I
don't know the characteristics of this particular
specjbb benchmark, but perhaps having fewer total GCs
is better because of the overhead of getting all
threads to a safepoint, going to the OS to suspend
them, and then restarting them. After they're
restarted, the CPU cache may be cold for them because
the GC thread polluted it. Or I'm entirely wrong in
my speculation ... :).</p>
</blockquote>
<br>
</div>
You have a good point about the number of GCs. The problem
in my runs is that ParNew does more GCs than DefNew. So
there are both more of them and their average time is
higher, but the score is still better. That ParNew does more
GCs is not that strange. It has a higher score, which means
that it had higher throughput and had time to create more
objects. So, that is kind of expected. But I don't
understand how it can have higher throughput when the GCs
take longer. My current guess is that it copies objects
differently, in a way that is beneficial for the execution
time between GCs.<br>
<br>
It also seems like ParNew keeps more objects alive for each
GC. That is either the reason why it does more and more
frequent GCs than DefNew, or it is an effect of the fact
that more objects are created due to the higher throughput.
This is the reason I started looking at the tenuring
threshold.<span class="HOEnZb"><font color="#888888"><br>
<br>
Bengt</font></span>
<div>
<div class="h5"><br>
<br>
<blockquote type="cite">
<p dir="ltr">Thanks</p>
<p dir="ltr">Sent from my phone</p>
<div class="gmail_quote">On Jan 11, 2013 6:02 AM,
"Bengt Rutisson" &lt;<a moz-do-not-send="true"
href="mailto:bengt.rutisson@oracle.com"
target="_blank">bengt.rutisson@oracle.com</a>&gt;
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div><br>
Hi Ramki,<br>
<br>
Thanks for looking at this!<br>
<br>
On 1/10/13 9:28 PM, Srinivas Ramakrishna
wrote:<br>
</div>
<blockquote type="cite">Hi Bengt --<br>
<br>
The change looks reasonable, but I have a
comment and a follow-up question.<br>
<br>
Not your change, but I'd elide the "half the
real survivor size" since it's really a
configurable parameter based on
TargetSurvivorRatio with default half.<br>
I'd leave the comment as "set the new tenuring
threshold and desired survivor size".<br>
</blockquote>
<br>
I'm fine with removing this from the comment,
but I thought the "half the real survivor size"
referred to the fact that we pass only the "to"
capacity and not the "from" capacity into
compute_tenuring_threshold(). With that
interpretation I think the comment is correct.<br>
<br>
Would you like me to remove it anyway? Either
way is fine with me.<br>
<br>
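For readers unfamiliar with the calculation being discussed, here
is a simplified Python model of how a tenuring threshold can be
derived from per-age survivor sizes. It is a sketch, not the
HotSpot source: the function name, the list layout and the default
of 15 for the maximum threshold are assumptions, and the ratio of
50 reflects the "default half" mentioned above.<br>
<pre>
```python
def sketch_tenuring_threshold(age_sizes, survivor_capacity,
                              target_survivor_ratio=50,
                              max_tenuring_threshold=15):
    """Return (threshold, desired_survivor_size) for the next scavenge.

    age_sizes[i] is the total size of surviving objects of age i + 1.
    The defaults are assumptions made for this sketch.
    """
    # Desired occupancy of one survivor space after a scavenge.
    desired = survivor_capacity * target_survivor_ratio // 100
    total = 0
    for age, size in enumerate(age_sizes, start=1):
        total += size
        if total > desired:
            # Objects of this age and older no longer fit the target.
            return min(age, max_tenuring_threshold), desired
    # Everything fits, so objects may age up to the maximum.
    return max_tenuring_threshold, desired


print(sketch_tenuring_threshold([30, 20, 10, 40], survivor_capacity=100))
```
</pre>
Only one survivor capacity is passed in, which matches the
reading of the comment given above.<br>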
<blockquote type="cite">I'm curious though, as
to what performance data prompted this change,</blockquote>
Good point. This change was preceded by an
internal discussion in the GC team, so I should
probably have explained the background more in
my review request to the open.<br>
<br>
I was comparing the ParNew and DefNew
implementation since I am seeing some strange
differences in some SPECjbb2005 results. I am
running ParNew with a single thread and get a much
better score than with DefNew. But I also get
higher average GC times. So, I was trying to
figure out what DefNew and ParNew do
differently.<br>
<br>
When I was looking at
DefNewGeneration::collect() and
ParNewGeneration::collect() I saw that they
contain a whole lot of code duplication. It
would be tempting to try to extract the common
code out into DefNewGeneration since it is the
super class. But there are some minor
differences. One of them was this issue with how
they handle the tenuring threshold.<br>
<br>
We tried to figure out if there is a reason for
ParNew and DefNew to behave differently in this
regard. We could not come up with any good
reason for that. So, we needed to figure out if
we should change ParNew or DefNew to make them
consistent. The decision to change ParNew was
based on two things. First, it seems wrong to
use the data from a collection that hit a
promotion failure. This collection will not have
allowed the tenuring threshold to fulfill its
purpose. Second, ParallelScavenge works the same
way as DefNew.<br>
<br>
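The policy described above can be sketched in a few lines of
Python. The function and parameter names are hypothetical and
this is not the code from the webrev; it only shows the intended
rule that a scavenge with a promotion failure does not update the
threshold.<br>
<pre>
```python
def next_tenuring_threshold(current_threshold, promotion_failed, recompute):
    """Keep the old threshold after a promotion failure, else recompute it.

    recompute is a zero-argument callable standing in for the real
    threshold calculation (hypothetical name, for illustration only).
    """
    if promotion_failed:
        # Age data from a failed scavenge is not trusted, so nothing changes.
        return current_threshold
    return recompute()


print(next_tenuring_threshold(7, True, lambda: 3))
print(next_tenuring_threshold(7, False, lambda: 3))
```
</pre>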
BTW, the difference between DefNew and ParNew
seems to have been there from the start. So,
there is no bug or changeset in mercurial or
TeamWare to explain why the difference was
introduced. <br>
<br>
(Just to be clear, this difference was not the
cause of my performance issue. I still don't
have a good explanation for how ParNew can have
longer GC times but better SPECjbb score.)<br>
<br>
<blockquote type="cite">and whether it might
make sense, upon a promotion failure to do
something about the tenuring threshold for the
next scavenge (i.e. for example make the
tenuring threshold half of its current value
as a reaction to the fact that promotion
failed). Is it currently left at its previous
value or is it adjusted back to the default
max value (the latter may be the wrong thing
to do) or something else?<br>
</blockquote>
<br>
As far as I can tell the tenuring threshold is
left untouched if we get a promotion failure. It
is probably a good idea to update it in some
way. But I would prefer to handle that as a
separate bug fix.<br>
<br>
This change is mostly a small cleanup to make
DefNewGeneration::collect() and
ParNewGeneration::collect() be more consistent.
We've done the thinking, so it's good to make
the change in preparation for the next person
who comes along with a few spare cycles and
would like to merge the two collect() methods in
some way.<br>
<br>
Thanks again for looking at this!<br>
Bengt<br>
<br>
<blockquote type="cite"> <br>
-- ramki<br>
<br>
<div class="gmail_quote">On Thu, Jan 10, 2013
at 1:30 AM, Bengt Rutisson <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:bengt.rutisson@oracle.com"
target="_blank">bengt.rutisson@oracle.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"><br>
Hi everyone,<br>
<br>
Could I have a couple of reviews for this
small change to make DefNew and ParNew be
more consistent in the way they treat the
tenuring threshold:<br>
<br>
<a moz-do-not-send="true"
href="http://cr.openjdk.java.net/%7Ebrutisso/8005972/webrev.00/"
target="_blank">http://cr.openjdk.java.net/~brutisso/8005972/webrev.00/</a><br>
<br>
Thanks,<br>
Bengt<br>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</body>
</html>