RFR(S): 7190666: G1: assert(_unused == 0) failed: Inconsistency in PLAB stats

John Cuthbertson john.cuthbertson at oracle.com
Tue Aug 28 20:19:14 UTC 2012


Hi Jesper,

Thanks for the review. The additional info was very helpful and, since 
_allocated, _wasted, and _unused are updated together, led me to the 
overflow conclusion.

We have a CR to extend various asserts and guarantees in G1 with more 
descriptive error messages (6949301). One of the engineers at Twitter 
has volunteered to lead the effort.

Thanks,

JohnC

On 08/28/12 00:44, Jesper Wilhelmsson wrote:
> Looks good!
>
> I especially like that you added more information to the assert 
> message. That will be helpful the next time the assert triggers.
> /Jesper
>
> On 2012-08-28 02:36, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> Can I have a couple of volunteers review the changes for this CR? The 
>> webrev can be found at: 
>> http://cr.openjdk.java.net/~johnc/7190666/webrev.0/
>>
>> Summary:
>> The value of PLABStats::_allocated was overflowing, and the failed 
>> assertion caught the case where it wrapped around to 0. When we retired an 
>> individual allocation buffer, we were flushing some accumulated 
>> values to the associated PLABStats instance. This was artificially 
>> inflating the values in the PLABStats instance since we were not 
>> resetting the accumulated values in the ParGCAllocBuffer after we 
>> flushed. Ordinarily this would not cause an issue (other than the 
>> values being too large) but with this particular test case we 
>> obtained an evacuation failure. As a result we were executing the GC 
>> allocation slow-path, and flushing the accumulated values, for every 
>> failed attempted object allocation (even though we were unable to 
>> allocate a new buffer), and we overflowed. Resetting the sensor values 
>> in the ParGCAllocBuffer instance after flushing prevents the 
>> artificial inflation and overflow.
>>
>> Additionally we should not be flushing the values to the PLABStats 
>> instance on every buffer retirement (though it is not stated in the 
>> code). Flushing the stats values on every retirement is unnecessary 
>> and, in the case of an evacuation failure, adds a fair amount of additional 
>> work for each failed object copy. Instead we should only be flushing 
>> the accumulated sensor values when we retire the final buffers prior 
>> to disposing the G1ParScanThreadState object.
>>
>> Testing:
>> The failing test case, the GC test suite with +PrintPLAB, and jprt.
>>
>> Note: while testing this, I ran into some assertion and guarantee 
>> failures from G1's block offset table. I've only seen and been able 
>> (so far) to reproduce these failures on a single machine in the jprt 
>> pool. I will be submitting a new CR for these failures. I do not 
>> believe that the failures are related to this fix (or the change that 
>> enabled resizable PLABs) as I've been able to reproduce the 
>> failures with ResizePLAB disabled and OldPLABSize set to 8k, 16k, 
>> and 32k.
>>
>> Thanks,
>>
>> JohnC
>
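
For anyone following the thread, the overflow described in the summary can be sketched in a few lines of C++. This is only an illustration, not the HotSpot code: StatsSketch and BufferSketch are hypothetical stand-ins for PLABStats and ParGCAllocBuffer, the method names are made up, and the counters are 32-bit purely so the wrap-around is reached quickly (the thread does not state the width of the real fields).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for PLABStats: totals accumulated across buffers.
struct StatsSketch {
  uint32_t allocated = 0;  // 32-bit is an assumption made for this sketch
  uint32_t wasted = 0;
  uint32_t unused = 0;
};

// Hypothetical stand-in for ParGCAllocBuffer: per-buffer running counters.
struct BufferSketch {
  uint32_t allocated = 0;
  uint32_t wasted = 0;
  uint32_t unused = 0;

  // Pattern described as buggy: the counters survive the flush, so
  // every later flush adds the same values to the stats again.
  void flush_without_reset(StatsSketch& s) const {
    s.allocated += allocated;  // unsigned arithmetic wraps silently
    s.wasted += wasted;
    s.unused += unused;
  }

  // Pattern described as the fix: zero the counters once flushed.
  void flush_and_reset(StatsSketch& s) {
    flush_without_reset(s);
    allocated = 0;
    wasted = 0;
    unused = 0;
  }
};

// Simulates `retries` slow-path attempts that each retire (and flush)
// the same buffer without managing to allocate a new one.
uint32_t allocated_after_retries(uint32_t buffer_words, uint32_t retries,
                                 bool reset) {
  StatsSketch stats;
  BufferSketch buf;
  buf.allocated = buffer_words;
  for (uint32_t i = 0; i < retries; ++i) {
    if (reset) {
      buf.flush_and_reset(stats);
    } else {
      buf.flush_without_reset(stats);
    }
  }
  return stats.allocated;
}
```

In this sketch, 2^16 flushes of a buffer holding 2^16 words wrap the 32-bit total around to exactly 0 when the counters are not reset, while the resetting variant counts the buffer once regardless of how many times the slow path is retried.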



