RFR(S): 7190666: G1: assert(_unused == 0) failed: Inconsistency in PLAB stats

Thu Aug 30 07:51:16 UTC 2012

On Wed, Aug 29, 2012 at 10:23 AM, John Cuthbertson <
john.cuthbertson at oracle.com> wrote:

> **
> Hi Ramki,
>
> Thanks for the review.
>
> I don't think we need to back port the change in PLABStats. CMS/ParNew was
> only flushing the accumulated values once rather than at every allocation
> buffer retirement and wasn't affected by the "double" add that inflated the
> value.
>

ah, you are right ...

> The real fix for G1 was to do the same. The only reason for the change in
> PLABStats was to make it a bit more robust so that we could flush the stats
> every n-th buffer retirement if we wanted.
>

Makes sense, and sorry for my confusion.

>
> I haven't run performance numbers yet - I've been distracted with the
> assertions and guarantee failures from G1's BOT that I've seen with a
> jprt-built binary. It is a good idea to run refworkload. My expectation is
> that ParNew/CMS will be a wash; with G1 I would expect the deviation to
> narrow.
>

I'd agree wrt parnew/cms, now that i understand the code better in light of
your correction of my confusion.

thanks!
-- ramki

>
> Thanks,
>
> JohnC
>
>
> On 8/29/2012 1:43 AM, Srinivas Ramakrishna wrote:
>
> Hi John --
>
> Good catch... I suppose we have been getting too large PLABs for so many
> years and perhaps wasting space to fragmentation as a result of this so far
> (hmm.. although perhaps the waste calculation in the next cycle allows us
> to adjust down, but i guess we are prone to oscillate because of the sensor
> value corruption...) I can imagine that in the case of ParNew+CMS this has
> been wasting space (because of said oscillations) in the survivor spaces.
>
> Thanks for the fix, and it looks good to me too. Any chance that the PLAB
> stats portion of the fix at least might also be backported to JDK 7, so the
> performance benefit of a more correct calculation accrues there as well?
> May be under an MR? (PS: I am curious to know if this showed any change in
> (gc) performance on any of the usual benchmarks...)
>
> thanks!
> -- ramki
>
> On Tue, Aug 28, 2012 at 1:20 PM, John Cuthbertson <
> john.cuthbertson at oracle.com> wrote:
>
>> Hi Jon,
>>
>> Thanks for the review.
>>
>> JohnC
>>
>>
>> On 08/28/12 10:00, Jon Masamitsu wrote:
>>
>>> Looks good.
>>>
>>> On 8/27/2012 5:36 PM, John Cuthbertson wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> Can I have a couple of volunteers review the changes for this CR? The
>>>> webrev can be found at:
>>>> http://cr.openjdk.java.net/~johnc/7190666/webrev.0/
>>>>
>>>> Summary:
>>>> The value of PLABStats::_allocated was overflowing and the failed
>>>> assertion detected when it overflowed to 0. When we retired an individual
>>>> allocation buffer, we were flushing some accumulated values to the
>>>> associated PLABStats instance. This was artificially inflating the values
>>>> in the PLABStats instance since we were not reseting the accumulated values
>>>> in the ParGCAllocBuffer after we flushed. Ordinarily this would not cause
>>>> an issue (other than the values being too large) but with this particular
>>>> test case we obtained an evacuation failure. As a result we were executing
>>>> the GC allocation slow-path, and flushing the accumulated values, for every
>>>> failed attempted object allocation (even though we were unable to allocate
>>>> a new buffer), and we overflowed. Reseting the sensor values in the
>>>> ParGCAllocBuffer instance after flushing prevents the artificial inflation
>>>> and overflow.
>>>>
>>>> Additionally we should not be flushing the values to the PLABStats
>>>> instance on every buffer retirement (though it is not stated in the code).
>>>> Flushing the stats values on every retirement is unnecessary and, in the
>>>> case of an evacuation, adds a fair amount of additional work for each
>>>> failed object copy. Instead we should only be flushing the accumulated
>>>> sensor values when we retire the final buffers prior to disposing the
>>>> G1ParScanThreadState object.
>>>>
>>>> Testing:
>>>> The failing test case; the GC test suite with +PrintPLAB, and jprt.
>>>>
>>>> Note while testing this I ran into some assertion and guarantee
>>>> failures from G1's block offset table. I've only seen and been able (so
>>>> far) to reproduce these failures on a single machine in the jprt pool. I
>>>> will be submitting a new CR for these failures. I do not believe that the
>>>> failures are related to this fix (or the change that enabled resize-able
>>>> PLABS) as I've been able to reproduce the failures with disabling
>>>> ResizePLAB and setting OldPLABSize=8k, 16k, and 32k.
>>>>
>>>> Thanks,
>>>>
>>>> JohnC
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20120830/b05042a3/attachment.htm>