<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Looks good to me.<div><br></div><div>igor</div><div><br><div><div>On May 4, 2012, at 3:03 AM, Mikael Vidstedt wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<div bgcolor="#FFFFFF" text="#000000">
<br>
Update:<br>
<br>
I'd like to continue working on this issue, and as part of that look
at how the promoted and pretenured counters are used today and
potentially update the logic given the insights around bytes vs.
words etc.<br>
<br>
I think it would be helpful to address the crash problem short term
though, and so I'd like to get feedback on the updated webrev:<br>
<br>
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~mikael/7158457/webrev.01/">http://cr.openjdk.java.net/~mikael/7158457/webrev.01/</a><br>
<br>
It's essentially the same as the previous one with Igor's suggested
_is_old check improvement. Would it be ok to make this fix and clone
the bug to track the follow-up work cleaning up the actual
logic/heuristics?<br>
<br>
Thanks,<br>
Mikael<br>
<br>
On 2012-04-26 03:38, Mikael Vidstedt wrote:
<blockquote cite="mid:4F98A720.1090706@oracle.com" type="cite">
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<br>
On 2012-04-24 23:50, Igor Veresov wrote:
<blockquote cite="mid:9485F5F2-BDA7-4845-ADC2-674DF555E6C3@gmail.com" type="cite">
<div>Wasn't it the case that _avg_pretenured value is formed
incorrectly? I don't think it should be sampled at every
allocation, I think it should measure the amount of data
allocated in the old gen between young collections, also if
you remember we agreed that the dimensions are wrong. Or were
you able to find a better explanation as opposed to what we
discussed before?</div>
</blockquote>
<br>
Thanks for reminding me - I believe you're absolutely right. For a
while I was thinking it actually did make sense after all, but I
just did an experiment to see what actually happens at runtime:<br>
<br>
The _avg_pretenured counter is indeed sampled every time an
allocation is made directly in old gen. The actual value reflects
the average size *in words* of the object that was allocated. Put
differently, it's the average size of pretenured objects in words.<br>
<br>
The counter is used in PSAdaptiveSizePolicy::update_averages to
calculate the average amount of promoted data:<br>
<br>
<font face="Courier New, Courier, monospace">avg_promoted()->sample(promoted
+ _avg_pretenured->padded_average());<br>
</font><br>
The promoted parameter is the number of *bytes* that were just
promoted. To sum it up, that appears to imply that there are two
problems with the above computation:<br>
<br>
1. It's taking the size of everything that was just promoted and
adds the size of an average prenured object (as opposed to the
total size of all recently pretenured objects)<br>
2. It's adding variables with different dimensions - promoted is
in bytes, and the _avg_pretenured padded average is in words<br>
<br>
Since that effectively means _avg_pretenured it's off by a factor
(word_size * number_of_objects) I'm guessing it will in the common
case not really matter to the calculation of avg_promoted...<br>
<br>
<br>
<blockquote cite="mid:9485F5F2-BDA7-4845-ADC2-674DF555E6C3@gmail.com" type="cite">
<div><br>
</div>
<div>As for the treatment of the symptom, the code looks good to
me. Do you think it might be beneficial to check the old value
of _is_old before assigning to it? Would cause less memory
traffic, if increment_count() is called frequently.</div>
<div>
<pre style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><span class="changed" style="color: blue; "> 60 void increment_count() {</span>
<span class="changed" style="color: blue; "> 61 _sample_count++;</span><font class="Apple-style-span" color="#0000ff">
</font></pre>
<pre style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><span class="Apple-style-span" style="font-family: Helvetica; white-space: normal; "><pre style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><span class="changed" style="color: blue; "> </span><span class="changed"><font class="Apple-style-span" color="#ff0000">62 if (!_is_old && _sample_count > OLD_THRESHOLD) {</font></span></pre></span><span class="changed" style="color: blue; "> 63 _is_old = true;</span>
<span class="changed" style="color: blue; "> 64 }</span>
<span class="changed" style="color: blue; "> 65 }</span></pre>
</div>
</blockquote>
<br>
Good suggestion. I'll update my change with this in case we need
something urgently, but given the above issues it's likely a good
idea to take another pass at this.<br>
<br>
Thanks,<br>
Mikael<br>
<br>
<blockquote cite="mid:9485F5F2-BDA7-4845-ADC2-674DF555E6C3@gmail.com" type="cite">
<div>
<div><br>
</div>
</div>
<div>igor</div>
<br>
<div>
<div>On Apr 24, 2012, at 8:01 AM, Mikael Vidstedt wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div><br>
Hi all,<br>
<br>
The statistical counters in gcUtil are used to keep track
of historical information about various key metrics in the
garbage collectors. Built in to the core
AdaptiveWeightedAverage base class is the concept of aging
the values, essentially treating the first 100 values
differently and putting more weight on them since there's
not yet enough historical data built up.<br>
<br>
In the class there is a 32-bit counter (_sample_count)
that incremented for every sample and used to compute
scale the weight of the added value (see
compute_adaptive_average), and the scaling logic divides
100 by the count. In the normal case this is not a problem
- the counters are reset every once in a while and/or grow
very slowly. In some pathological cases the counter will
however continue to increment and eventually
overflow/wrap, meaning the 32-bit count will go back to
zero and the division in compute_adaptive_average will
lead to a div-by-zero crash.<br>
<br>
The test case where this is observed is a test that stress
tests allocation in combination with the GC locker.
Specifically, the test is multi-threaded which pounds on
java.util.zip.Deflater.deflate, which internally uses the
GetPrimitiveArrayCritical JNI function to temporarily lock
out the GC (using the GC locker). The garbage collector
used is in this case the parallel scavenger and the the
counter that overflows is _avg_pretenured. _avg_pretenured
is incremented/sampled every time an allocation is made
directly in the old gen, which I believe is more likely
when the GC locker is active.<br>
<br>
<br>
The suggested fix is to only perform the division in
compute_adaptive_average when it is relevant, which
currently is for the first 100 values. Once there are more
than 100 samples there is no longer a need to scale the
weight.<br>
<br>
This problem is tracked in 7158457 (stress: jdk7 u4 core
dumps during megacart stress test run).<br>
<br>
Please review and comment on the webrev below:<br>
<br>
<a moz-do-not-send="true" href="http://cr.openjdk.java.net/%7Emikael/7158457/webrev.00">http://cr.openjdk.java.net/~mikael/7158457/webrev.00</a><br>
<br>
Thanks,<br>
Mikael<br>
<br>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</blockquote>
<br>
</div>
</blockquote></div><br></div></body></html>