RFR(S): 7158457: division by zero in adaptiveweightedaverage
Igor Veresov
iggy.veresov at gmail.com
Wed Apr 25 06:50:36 UTC 2012
Wasn't it the case that the _avg_pretenured value is formed incorrectly? I don't think it should be sampled at every allocation; I think it should measure the amount of data allocated in the old gen between young collections. Also, if you remember, we agreed that the dimensions are wrong. Or were you able to find a better explanation than the one we discussed before?
As for the treatment of the symptom, the code looks good to me. Do you think it might be beneficial to check the old value of _is_old before assigning to it? That would cause less memory traffic if increment_count() is called frequently.
void increment_count() {
  _sample_count++;
  if (!_is_old && _sample_count > OLD_THRESHOLD) {
    _is_old = true;
  }
}
igor
On Apr 24, 2012, at 8:01 AM, Mikael Vidstedt wrote:
>
> Hi all,
>
> The statistical counters in gcUtil are used to keep track of historical information about various key metrics in the garbage collectors. Built into the core AdaptiveWeightedAverage base class is the concept of aging the values, essentially treating the first 100 values differently and putting more weight on them since there's not yet enough historical data built up.
>
> In the class there is a 32-bit counter (_sample_count) that is incremented for every sample and used to scale the weight of the added value (see compute_adaptive_average); the scaling logic divides 100 by the count. In the normal case this is not a problem - the counters are reset every once in a while and/or grow very slowly. In some pathological cases, however, the counter will continue to increment and eventually overflow/wrap, meaning the 32-bit count will go back to zero and the division in compute_adaptive_average will lead to a div-by-zero crash.
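To make the failure mode concrete, here is a minimal, self-contained sketch of the aging logic described above. It is an illustrative model, not the actual HotSpot source: the names (Avg, sample_count, weight) and the simplified update formula are assumptions that only mirror the description in this mail.

```cpp
#include <cstdint>
#include <algorithm>

// Hedged sketch of an aging exponential average, modeled on the
// description above (not the exact gcUtil code).
struct Avg {
  uint32_t sample_count = 0;   // 32-bit: wraps back to 0 after 2^32 samples
  unsigned weight = 25;        // configured exponential weight, in percent
  float    average = 0.0f;

  void sample(float v) {
    ++sample_count;
    // Aging: for the first ~100 samples the weight is boosted by 100/count.
    // If sample_count has wrapped to 0, this division crashes (div-by-zero).
    unsigned count_weight = 100 / sample_count;
    unsigned w = std::max(weight, count_weight);
    average += (static_cast<float>(w) / 100.0f) * (v - average);
  }
};
```

The first sample gets weight 100 (the average simply becomes the sample), and the boost decays as the count grows; after 100 samples the configured weight dominates, which is exactly why the division is only needed early on.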
>
> The test case where this is observed is a test that stress tests allocation in combination with the GC locker. Specifically, the test is multi-threaded and pounds on java.util.zip.Deflater.deflate, which internally uses the GetPrimitiveArrayCritical JNI function to temporarily lock out the GC (using the GC locker). The garbage collector used in this case is the parallel scavenger, and the counter that overflows is _avg_pretenured. _avg_pretenured is incremented/sampled every time an allocation is made directly in the old gen, which I believe is more likely when the GC locker is active.
>
>
> The suggested fix is to only perform the division in compute_adaptive_average when it is relevant, which currently means for the first 100 values. Once there are more than 100 samples there is no longer a need to scale the weight.
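The shape of the suggested guard can be sketched as follows. This is an assumption-laden illustration of the idea (skip the scaling division once the average is past its aging phase), not the actual webrev; the function name adaptive_weight, the OLD_THRESHOLD constant, and the is_old flag are illustrative.

```cpp
#include <cstdint>
#include <algorithm>

// Illustrative threshold; after this many samples the average is "old"
// and the configured weight is used as-is.
constexpr unsigned OLD_THRESHOLD = 100;

// Hedged sketch of the fix: once is_old is set, the division by
// sample_count is never reached, so a wrapped (zero) count is harmless.
unsigned adaptive_weight(unsigned weight, uint32_t sample_count, bool is_old) {
  if (is_old) {
    return weight;  // no scaling needed; avoids 100/0 after counter wrap
  }
  return std::max(weight, OLD_THRESHOLD / sample_count);
}
```

With this structure the counter can still wrap, but the wrapped value is only ever read on the safe branch, which treats the symptom exactly as described above.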
>
> This problem is tracked in 7158457 (stress: jdk7 u4 core dumps during megacart stress test run).
>
> Please review and comment on the webrev below:
>
> http://cr.openjdk.java.net/~mikael/7158457/webrev.00
>
> Thanks,
> Mikael
>
More information about the hotspot-gc-dev mailing list