RFR(S): 7190666: G1: assert(_unused == 0) failed: Inconsistency in PLAB stats

Wed Aug 29 17:23:10 UTC 2012

Hi Ramki,

Thanks for the review.

I don't think we need to back port the change in PLABStats. CMS/ParNew 
was only flushing the accumulated values once rather than at every 
allocation buffer retirement and wasn't affected by the "double" add 
that inflated the value. The real fix for G1 was to do the same. The 
only reason for the change in PLABStats was to make it a bit more robust 
so that we could flush the stats every n-th buffer retirement if we wanted.

I haven't run performance numbers yet - I've been distracted with the 
assertions and guarantee failures from G1's BOT that I've seen with a 
jprt-built binary. It is a good idea to run refworkload. My expectation 
is that ParNew/CMS will be a wash; with G1 I would expect the deviation 
to narrow.

Thanks,

JohnC

On 8/29/2012 1:43 AM, Srinivas Ramakrishna wrote:
> Hi John --
>
> Good catch... I suppose we have been getting too large PLABs for so 
> many years and perhaps wasting space to fragmentation as a result of 
> this so far (hmm.. although perhaps the waste calculation in the next 
> cycle allows us to adjust down, but i guess we are prone to oscillate 
> because of the sensor value corruption...) I can imagine that in the 
> case of ParNew+CMS this has been wasting space (because of said 
> oscillations) in the survivor spaces.
>
> Thanks for the fix, and it looks good to me too. Any chance that the 
> PLAB stats portion of the fix at least might also be backported to JDK 
> 7, so the performance benefit of a more correct calculation accrues 
> there as well? May be under an MR? (PS: I am curious to know if this 
> showed any change in (gc) performance on any of the usual benchmarks...)
>
> thanks!
> -- ramki
>
> On Tue, Aug 28, 2012 at 1:20 PM, John Cuthbertson 
> <john.cuthbertson at oracle.com <mailto:john.cuthbertson at oracle.com>> wrote:
>
>     Hi Jon,
>
>     Thanks for the review.
>
>     JohnC
>
>
>     On 08/28/12 10:00, Jon Masamitsu wrote:
>
>         Looks good.
>
>         On 8/27/2012 5:36 PM, John Cuthbertson wrote:
>
>             Hi Everyone,
>
>             Can I have a couple of volunteers review the changes for
>             this CR? The webrev can be found at:
>             http://cr.openjdk.java.net/~johnc/7190666/webrev.0/
>             <http://cr.openjdk.java.net/%7Ejohnc/7190666/webrev.0/>
>
>             Summary:
>             The value of PLABStats::_allocated was overflowing and the
>             failed assertion detected when it overflowed to 0. When we
>             retired an individual allocation buffer, we were flushing
>             some accumulated values to the associated PLABStats
>             instance. This was artificially inflating the values in
>             the PLABStats instance since we were not reseting the
>             accumulated values in the ParGCAllocBuffer after we
>             flushed. Ordinarily this would not cause an issue (other
>             than the values being too large) but with this particular
>             test case we obtained an evacuation failure. As a result
>             we were executing the GC allocation slow-path, and
>             flushing the accumulated values, for every failed
>             attempted object allocation (even though we were unable to
>             allocate a new buffer), and we overflowed. Reseting the
>             sensor values in the ParGCAllocBuffer instance after
>             flushing prevents the artificial inflation and overflow.
>
>             Additionally we should not be flushing the values to the
>             PLABStats instance on every buffer retirement (though it
>             is not stated in the code). Flushing the stats values on
>             every retirement is unnecessary and, in the case of an
>             evacuation, adds a fair amount of additional work for each
>             failed object copy. Instead we should only be flushing the
>             accumulated sensor values when we retire the final buffers
>             prior to disposing the G1ParScanThreadState object.
>
>             Testing:
>             The failing test case; the GC test suite with +PrintPLAB,
>             and jprt.
>
>             Note while testing this I ran into some assertion and
>             guarantee failures from G1's block offset table. I've only
>             seen and been able (so far) to reproduce these failures on
>             a single machine in the jprt pool. I will be submitting a
>             new CR for these failures. I do not believe that the
>             failures are related to this fix (or the change that
>             enabled resize-able PLABS) as I've been able to reproduce
>             the failures with disabling ResizePLAB and setting
>             OldPLABSize=8k, 16k, and 32k.
>
>             Thanks,
>
>             JohnC
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20120829/4251c79a/attachment.htm>