RFR: JDK-8078904 : CMS: Assert failed: Ctl pt invariant

Thu May 28 17:01:33 UTC 2015

Hi Thomas,
OK, after our IM yesterday and a few more experiments, I changed this to
use a fixed divisor of 2*K (currently the default MinTLABSize) for setting
up the stripes. I think this is the simplest fix that preserves the
intent of the plab_sample_minimum_size() that was there, while no longer
being susceptible to unusual MinTLABSize options.
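
As a rough illustration of the effect (a minimal sketch only, not the code
in the webrev; the constant and helper names are hypothetical), a fixed
divisor makes the number of stripe slots depend only on the survivor space
size, while a divisor taken from a very large MinTLABSize leaves far fewer
slots:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constant: 2*K bytes, i.e. the current default MinTLABSize.
constexpr std::size_t kStripeDivisorBytes = 2 * 1024;

// Hypothetical helper: number of sample slots (stripes) for a survivor
// space of the given size. The divisor is fixed, so the result does not
// change with the MinTLABSize given on the command line.
constexpr std::size_t max_plab_samples(std::size_t max_survivor_bytes) {
  return max_survivor_bytes / kStripeDivisorBytes;
}

// For contrast: the same computation with a divisor derived from a
// user-supplied MinTLABSize. Precondition: min_tlab_bytes > 0.
constexpr std::size_t max_plab_samples_with(std::size_t max_survivor_bytes,
                                            std::size_t min_tlab_bytes) {
  return max_survivor_bytes / min_tlab_bytes;
}
```

For an 8M survivor space, for example, the fixed divisor gives 4096 slots,
while a 1M MinTLABSize used as the divisor would give only 8.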

  http://cr.openjdk.java.net/~ecaspole/JDK-8078904/02/webrev/

There is still a mess in how Young/OldPLABSize interact with unusual
MinTLABSize options, but I think we should tackle that as a separate bug,
since it affects both G1 and CMS.
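
For that follow-up, the bounds check you describe in your mail quoted below
could look roughly like this. This is only a sketch under assumptions: the
struct and function names are hypothetical stand-ins for PLAB::min_size()
and PLAB::max_size(), and it treats the upper bound as inclusive, which
still needs to be checked:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the values of PLAB::min_size() and
// PLAB::max_size().
struct PlabBounds {
  std::size_t min_size;
  std::size_t max_size;
};

// The bounds are only usable if max_size >= min_size.
constexpr bool bounds_are_valid(const PlabBounds& b) {
  return b.max_size >= b.min_size;
}

// Clamp a requested size (e.g. YoungPLABSize or OldPLABSize) into
// [min_size, max_size]. Assumption: the upper bound is inclusive.
constexpr std::size_t clamp_plab_size(std::size_t requested,
                                      const PlabBounds& b) {
  return requested < b.min_size ? b.min_size
       : requested > b.max_size ? b.max_size
       : requested;
}
```

The same clamp would apply both at initialization and to any later
resizing result.
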
Thanks,
Eric

On 05/19/2015 07:30 AM, Thomas Schatzl wrote:
> Hi Eric,
>
> On Wed, 2015-05-13 at 15:17 -0400, Eric Caspole wrote:
>> Hi everybody,
>> Could I have a review for fixing JDK-8078904? That is an assert in CMS
>> where the setup of the survivor chunk array used for setting up the CMS
>> rescan did not completely scan all the per-thread chunk arrays. This
>> would happen if the TLAB size is set very large on the command line, for
>> example, because of the way the survivor chunk array structures were set
>> up. In product builds this would result in uneven distribution of
>> parallel work where the last task might get 100x as much region to scan
>> as the others.
>>
>> The size of those PLABs depends on YoungPLABSize so I changed the setup
>> code to only consider the YoungPLABSize when creating the survivor chunk
>> array. I can see with this change there is even distribution of parallel
>> work in the rescan tasks with various YoungPLABSize and ParallelGCThreads
>> values.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8078904
>>
>> http://cr.openjdk.java.net/~ecaspole/JDK-8078904/00/webrev/
>
> The problem is not the uneven distribution of the work - there is
> already enough potential in VM options to shoot yourself in the
> foot, so another one does not really hurt imo.
>
> Also, the actual PLAB size might go below YoungPLABSize due to resizing
> in both CMS and G1(!), so the size of the array will not be sufficient
> anyway, leading to the same problem, just not at startup.
>
> The issue is that YoungPLABSize and PLAB[Stats]::min_size() (e.g.
> MinTLABSize) are not synchronized.
>
> I.e. with this change, if you set YoungPLABSize to something larger than
> MinTLABSize, the arrays are not sufficient.
> Without this change, the same error occurs if YoungPLABSize is set to
> something smaller than MinTLABSize.
>
> The only solution to prevent either problem is to guarantee that at
> initialization, before the first PLABs are allocated, YoungPLABSize (and
> also OldPLABSize!) is within [PLAB::min_size(), PLAB::max_size()] (not
> sure about whether the upper bound can be inclusive, please check), and
> simply use PLAB[Stats]::min_size() as
> CMSCollector::plab_sample_minimum_size().
>
> Also, PLAB::desired_sz() should always return a value within
> [PLAB::min_size(), PLAB::max_size()].
>
> The change should also verify that PLAB::max_size() >= PLAB::min_size().
>
> Thanks,
>    Thomas
>
>