RFR(S): 7190666: G1: assert(_unused == 0) failed: Inconsistency in PLAB stats
Srinivas Ramakrishna
ysr1729 at gmail.com
Wed Aug 29 08:43:33 UTC 2012
Hi John --
Good catch... I suppose we have been getting overly large PLABs for many
years, and perhaps wasting space to fragmentation as a result (hmm...
although perhaps the waste calculation in the next cycle allows us to
adjust down, but I guess we are prone to oscillate because of the sensor
value corruption...). I can imagine that in the case of ParNew+CMS this
has been wasting space in the survivor spaces because of said oscillations.
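
To make the oscillation concern concrete, here is a simplified model of
waste-driven PLAB resizing; the type, fields, and formula below are
illustrative stand-ins, not the actual computation HotSpot's PLABStats
performs when it adjusts the desired PLAB size:

  #include <stddef.h>

  // Toy model: the desired PLAB size is an exponential average of the
  // words actually used per buffer refill. If the allocated counter is
  // inflated by repeated flushes without resets, used_per_refill is
  // overestimated, the next desired size overshoots, and later cycles
  // swing back down -- hence the oscillation.
  struct PlabSizingModel {
    double _desired_plab_sz;   // smoothed desired buffer size, in words
    double _weight;            // averaging weight, e.g. 0.5

    void adjust(size_t allocated, size_t wasted, size_t unused,
                size_t refills) {
      if (refills == 0) return;
      double used = double(allocated) - double(wasted) - double(unused);
      double used_per_refill = used / double(refills);
      _desired_plab_sz =
          (1.0 - _weight) * _desired_plab_sz + _weight * used_per_refill;
    }
  };
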
Thanks for the fix; it looks good to me too. Any chance that at least the
PLAB stats portion of the fix might also be backported to JDK 7, so the
performance benefit of a more correct calculation accrues there as well?
Maybe under an MR? (PS: I am curious to know whether this showed any
change in (GC) performance on any of the usual benchmarks...)
thanks!
-- ramki
On Tue, Aug 28, 2012 at 1:20 PM, John Cuthbertson <john.cuthbertson at oracle.com> wrote:
> Hi Jon,
>
> Thanks for the review.
>
> JohnC
>
>
> On 08/28/12 10:00, Jon Masamitsu wrote:
>
>> Looks good.
>>
>> On 8/27/2012 5:36 PM, John Cuthbertson wrote:
>>
>>> Hi Everyone,
>>>
>>> Can I have a couple of volunteers review the changes for this CR? The
>>> webrev can be found at:
>>> http://cr.openjdk.java.net/~johnc/7190666/webrev.0/
>>>
>>> Summary:
>>> The value of PLABStats::_allocated was overflowing, and the assertion
>>> failure was detected when it overflowed to 0. When we retired an
>>> individual allocation buffer, we were flushing some accumulated values
>>> to the associated PLABStats instance. This was artificially inflating
>>> the values in the PLABStats instance since we were not resetting the
>>> accumulated values in the ParGCAllocBuffer after we flushed. Ordinarily
>>> this would not cause an issue (other than the values being too large),
>>> but with this particular test case we encountered an evacuation
>>> failure. As a result we were executing the GC allocation slow path, and
>>> flushing the accumulated values, for every failed object allocation
>>> attempt (even though we were unable to allocate a new buffer), and we
>>> overflowed. Resetting the sensor values in the ParGCAllocBuffer
>>> instance after flushing prevents the artificial inflation and overflow.
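>>>
>>> As a minimal sketch of the reset-after-flush idea -- using simplified
>>> stand-in types rather than the actual ParGCAllocBuffer / PLABStats
>>> code, so the field and method names below are illustrative only:
>>>
>>>   #include <stddef.h>
>>>
>>>   struct PLABStatsSketch {
>>>     size_t _allocated;  // words allocated across all retired buffers
>>>     size_t _wasted;     // words wasted when buffers were retired
>>>     size_t _unused;     // words left unused in retired buffers
>>>   };
>>>
>>>   struct AllocBufferSketch {
>>>     size_t _allocated;  // words handed out from this buffer
>>>     size_t _wasted;     // words filled with dummy objects at retirement
>>>     size_t _unused;     // words remaining between top and end
>>>
>>>     // Flush the accumulated sensor values to the shared stats, then
>>>     // reset them so that a later flush (e.g. from the GC allocation
>>>     // slow path during an evacuation failure) does not re-add the
>>>     // same amounts and eventually overflow the counters.
>>>     void flush_stats(PLABStatsSketch* stats) {
>>>       stats->_allocated += _allocated;
>>>       stats->_wasted    += _wasted;
>>>       stats->_unused    += _unused;
>>>       _allocated = 0;
>>>       _wasted    = 0;
>>>       _unused    = 0;
>>>     }
>>>   };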
>>>
>>> Additionally, we should not be flushing the values to the PLABStats
>>> instance on every buffer retirement (though this is not stated in the
>>> code). Flushing the stats values on every retirement is unnecessary
>>> and, in the case of an evacuation failure, adds a fair amount of
>>> additional work for each failed object copy. Instead we should only
>>> flush the accumulated sensor values when we retire the final buffers,
>>> prior to disposing of the G1ParScanThreadState object.
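>>>
>>> For illustration, a sketch of limiting the flush to final retirement,
>>> building on the stand-in types above (again hypothetical, not the
>>> actual G1ParScanThreadState code):
>>>
>>>   struct ScanThreadStateSketch {
>>>     AllocBufferSketch _surviving_alloc_buffer;
>>>     AllocBufferSketch _tenured_alloc_buffer;
>>>
>>>     // Ordinary retirement during the pause: account for the unused
>>>     // tail locally and move on, but leave the shared stats alone.
>>>     void retire_buffer(AllocBufferSketch* buf) {
>>>       buf->_wasted += buf->_unused;  // stand-in for the real retire work
>>>       buf->_unused  = 0;
>>>     }
>>>
>>>     // Called once, when the per-thread state is disposed at the end
>>>     // of the pause: only now are the accumulated values flushed.
>>>     void flush_on_dispose(PLABStatsSketch* surviving_stats,
>>>                           PLABStatsSketch* tenured_stats) {
>>>       _surviving_alloc_buffer.flush_stats(surviving_stats);
>>>       _tenured_alloc_buffer.flush_stats(tenured_stats);
>>>     }
>>>   };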
>>>
>>> Testing:
>>> The failing test case, the GC test suite with +PrintPLAB, and jprt.
>>>
>>> Note: while testing this I ran into some assertion and guarantee
>>> failures from G1's block offset table. So far I have only seen, and
>>> been able to reproduce, these failures on a single machine in the jprt
>>> pool. I will be submitting a new CR for these failures. I do not
>>> believe that the failures are related to this fix (or the change that
>>> enabled resizeable PLABs), as I've been able to reproduce the failures
>>> with ResizePLAB disabled and OldPLABSize set to 8k, 16k, and 32k.
>>>
>>> Thanks,
>>>
>>> JohnC
>>>
>>
>