RFR(S): 7190666: G1: assert(_unused == 0) failed: Inconsistency in PLAB stats
John Cuthbertson
john.cuthbertson at oracle.com
Tue Aug 28 20:19:14 UTC 2012
Hi Jesper,
Thanks for the review. The additional info was very helpful and, since
_allocated, _wasted, and _unused are updated together, led me to the
overflow conclusion.
We have a CR to extend various asserts and guarantees in G1 with more
descriptive error messages (6949301). One of the engineers at Twitter
has volunteered to lead the effort.
Thanks,
JohnC
On 08/28/12 00:44, Jesper Wilhelmsson wrote:
> Looks good!
>
> I especially like that you added more information to the assert
> message. That will be helpful the next time the assert triggers.
> /Jesper
>
> On 2012-08-28 02:36, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> Can I have a couple of volunteers review the changes for this CR? The
>> webrev can be found at:
>> http://cr.openjdk.java.net/~johnc/7190666/webrev.0/
>>
>> Summary:
>> The value of PLABStats::_allocated was overflowing and the failed
>> assertion detected when it overflowed to 0. When we retired an
>> individual allocation buffer, we were flushing some accumulated
>> values to the associated PLABStats instance. This was artificially
>> inflating the values in the PLABStats instance since we were not
>> resetting the accumulated values in the ParGCAllocBuffer after we
>> flushed. Ordinarily this would not cause an issue (other than the
>> values being too large) but with this particular test case we
>> obtained an evacuation failure. As a result we were executing the GC
>> allocation slow-path, and flushing the accumulated values, for every
>> failed object allocation attempt (even though we were unable to
>> allocate a new buffer), and we overflowed. Resetting the sensor values
>> in the ParGCAllocBuffer instance after flushing prevents the
>> artificial inflation and overflow.
>>
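A minimal sketch of the overflow described in the summary, not the HotSpot code itself. StatsSketch, BufferSketch, and simulate_failed_allocations are hypothetical names, and the counters are deliberately 32-bit so the wraparound is reached after few iterations; the real field types and the exact assert condition are assumptions here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the shared statistics object (not PLABStats).
struct StatsSketch {
  uint32_t allocated = 0;
  uint32_t wasted = 0;
  uint32_t unused = 0;
};

// Hypothetical stand-in for a per-thread allocation buffer (not ParGCAllocBuffer).
struct BufferSketch {
  uint32_t allocated = 0;
  uint32_t wasted = 0;
  uint32_t unused = 0;

  // Add the accumulated values to the shared stats. Without the reset,
  // every later flush adds the same values again.
  void flush(StatsSketch& stats, bool reset_after_flush) {
    stats.allocated += allocated;  // unsigned arithmetic wraps modulo 2^32
    stats.wasted += wasted;
    stats.unused += unused;
    if (reset_after_flush) {
      allocated = 0;
      wasted = 0;
      unused = 0;
    }
  }
};

// Models an evacuation failure: each failed allocation attempt retires the
// buffer and flushes, but no new buffer is ever obtained.
StatsSketch simulate_failed_allocations(uint32_t buffer_words,
                                        uint32_t unused_words,
                                        uint32_t attempts,
                                        bool reset_after_flush) {
  StatsSketch stats;
  BufferSketch buffer;
  buffer.allocated = buffer_words;
  buffer.unused = unused_words;
  for (uint32_t i = 0; i < attempts; ++i) {
    buffer.flush(stats, reset_after_flush);
  }
  return stats;
}
```

With 65536 attempts on a 65536-word buffer and no reset, the allocated total wraps to exactly 0 while the unused total is still nonzero, which is the kind of inconsistency the assert in the subject line reports. With the reset, the buffer is counted once.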
>> Additionally we should not be flushing the values to the PLABStats
>> instance on every buffer retirement (though it is not stated in the
>> code). Flushing the stats values on every retirement is unnecessary
>> and, in the case of an evacuation failure, adds a fair amount of
>> additional work for each failed object copy. Instead we should only be
>> flushing the accumulated sensor values when we retire the final
>> buffers prior to disposing of the G1ParScanThreadState object.
>>
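The "flush only on final retirement" idea can be sketched as follows. This is an illustration under assumptions, not the change in the webrev: PlabStatsSketch, AllocBufferSketch, the is_final parameter, and run_gc_sketch are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical shared statistics; `flushes` only exists to make the
// number of flush operations observable in this sketch.
struct PlabStatsSketch {
  size_t allocated = 0;
  size_t unused = 0;
  size_t flushes = 0;
};

class AllocBufferSketch {
 public:
  // Called when a fresh buffer of `words` words is obtained.
  void note_new_buffer(size_t words) { allocated_ += words; }

  // Retire the current buffer. Values keep accumulating locally and are
  // pushed to the shared stats only on the final retirement.
  void retire(size_t unused_words, bool is_final, PlabStatsSketch& stats) {
    unused_ += unused_words;
    if (is_final) {
      flush(stats);
    }
  }

 private:
  void flush(PlabStatsSketch& stats) {
    stats.allocated += allocated_;
    stats.unused += unused_;
    stats.flushes += 1;
    // Reset so a later flush cannot count the same values twice.
    allocated_ = 0;
    unused_ = 0;
  }

  size_t allocated_ = 0;
  size_t unused_ = 0;
};

// Drives `buffers` allocate/retire cycles and marks the last one final.
PlabStatsSketch run_gc_sketch(size_t buffers, size_t buffer_words,
                              size_t unused_per_buffer) {
  PlabStatsSketch stats;
  AllocBufferSketch buffer;
  for (size_t i = 0; i < buffers; ++i) {
    buffer.note_new_buffer(buffer_words);
    buffer.retire(unused_per_buffer, i + 1 == buffers, stats);
  }
  return stats;
}
```

In this sketch four 1024-word buffers with 8 unused words each produce a single flush carrying the full totals, so intermediate retirements do no work on the shared stats.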
>> Testing:
>> The failing test case, the GC test suite with +PrintPLAB, and jprt.
>>
>> Note: while testing this I ran into some assertion and guarantee
>> failures from G1's block offset table. So far I have only seen, and
>> been able to reproduce, these failures on a single machine in the jprt
>> pool. I will be submitting a new CR for these failures. I do not
>> believe that the failures are related to this fix (or the change that
>> enabled resizable PLABs) as I've been able to reproduce the
>> failures with ResizePLAB disabled and OldPLABSize set to 8k, 16k,
>> and 32k.
>>
>> Thanks,
>>
>> JohnC
>