Fwd: RFC: TLAB allocation and garbage-first policy
Christine Flood
cflood at redhat.com
Wed Sep 20 16:18:31 UTC 2017
---------- Forwarded message ----------
From: Christine Flood <cflood at redhat.com>
Date: Wed, Sep 20, 2017 at 12:00 PM
Subject: Re: RFC: TLAB allocation and garbage-first policy
To: Aleksey Shipilev <shade at redhat.com>
On Wed, Sep 20, 2017 at 10:49 AM, Aleksey Shipilev <shade at redhat.com> wrote:
> On 09/20/2017 03:53 PM, Christine Flood wrote:
> > On Wed, Sep 20, 2017 at 8:58 AM, Aleksey Shipilev <shade at redhat.com> wrote:
> > The original TLABs were something like 4K if I remember correctly. Yes,
> > there is a balancing act with not making them too small. However, what's
> > the point of having TLABs if they are region-sized? Why not just assign
> > a region per thread?
>
> For the same reason adaptive TLAB sizing exists: do not waste space. With
> a region per thread, you can't have more threads than regions, even if
> most of the threads are dormant.
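
To put rough numbers on the waste argument (all sizes and counts below are
made up for illustration, not taken from any real configuration): with
region-sized TLABs every thread pins a whole region, so dormant threads stay
expensive and the thread count is capped by the region count, while
adaptively sized TLABs keep the per-thread cost proportional to what the
thread actually allocates. A minimal sketch of that arithmetic in Java:

    // Back-of-the-envelope comparison of TLAB sizing strategies.
    // All sizes and counts are assumptions for illustration only.
    public class TlabWasteSketch {
        public static void main(String[] args) {
            long regionSize = 2L << 20;            // assume 2 MB regions
            long heapSize   = 512L << 20;          // assume a 512 MB heap
            int  threads    = 400;                 // assume many mostly-dormant threads

            long regions = heapSize / regionSize;  // 256 regions: fewer than threads

            // Region per thread: every dormant thread still pins a whole region.
            long wasteRegionPerThread = (long) threads * regionSize;

            // Adaptive TLABs: a dormant thread ends up with a tiny TLAB (say ~4 KB),
            // so the worst-case retained-but-unused space stays small.
            long wasteAdaptive = (long) threads * (4L << 10);

            System.out.printf("regions: %d, threads: %d%n", regions, threads);
            System.out.printf("worst-case waste, region/thread: %d MB%n",
                              wasteRegionPerThread >> 20);
            System.out.printf("worst-case waste, adaptive TLABs: %d KB%n",
                              wasteAdaptive >> 10);
        }
    }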
>
> > Perhaps there's a middle ground of, say, 1/4 of a region, which will
> > leave you in a better situation than you are in now.
>
> Maybe, let's make it our fallback plan: if everything else fails, we can
> trim down the TLABs.
>
>
> > Another potential solution would be to treat these regions specially.
> > When a TLAB allocation fails in a region, we could fill that particular
> > region with a filler array; therefore we now have garbage. This differs
> > from your solution in that regular regions that are perfectly happy, with
> > normal-sized TLAB space available, aren't going to get prematurely
> > compacted.
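
A minimal sketch of the shape of that proposal (the Region and TlabAllocator
types and field names below are hypothetical, not the actual Shenandoah
code): when a TLAB no longer fits in a region, the unusable tail is covered
by a filler object and accounted as garbage, so the ordinary garbage-driven
cset selection will eventually pick that region up without special-casing
any other region.

    // Illustrative sketch only: hypothetical types, not the real allocation path.
    final class Region {
        long top;      // current allocation pointer within the region
        long end;      // end address of the region
        long garbage;  // bytes accounted as reclaimable garbage

        long freeBytes() { return end - top; }

        // Retire the region: cover the unusable tail with a filler array so the
        // space counts as garbage for the usual cset-selection heuristics.
        void retireWithFiller() {
            long leftover = freeBytes();
            if (leftover > 0) {
                garbage += leftover;  // stand-in for installing a real filler int[]
                top = end;
            }
        }
    }

    final class TlabAllocator {
        // Returns the start address of the new TLAB, or -1 if the caller
        // should retry in a fresh region.
        long allocateTlab(Region r, long tlabSize) {
            if (r.freeBytes() < tlabSize) {
                r.retireWithFiller();  // only *this* region is affected
                return -1L;
            }
            long start = r.top;
            r.top += tlabSize;
            return start;
        }
    }

The point of the sketch is that only the region where the TLAB request
actually failed pays the price; regions that still have TLAB-sized holes
keep their normal accounting.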
> >
> > Aha, sounds interesting. So the only thing that does seem to help is the
> > half-full regions we never tried to allocate in, right? Otherwise it is
> > the same as looking at "live" for cset selection.
> >
> > What this does is avoid confusing our metrics for the sake of expediency.
> > We agreed earlier that in the single-region case, copying the live data
> > from one region to another doesn't gain us anything. I would argue that if
> > I had to choose between compacting 10 fragmented regions or 10 regions
> > that are compacted but not large enough for a TLAB, we would be better off
> > compacting the fragmented regions, because that would leave us with more
> > contiguous free space. Your proposed metric doesn't distinguish between
> > the two.
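
As a worked example of that trade-off (made-up sizes, assuming 2 MB
regions): evacuating ten regions that are 10% live copies about 2 MB and
hands back roughly 18 MB of contiguous space, while evacuating ten regions
that are 90% live copies about 18 MB and hands back only about 2 MB. A small
sketch of the arithmetic:

    // Worked example with assumed sizes: payoff of evacuating fragmented
    // regions versus nearly-full ones. Numbers are illustrative only.
    public class EvacPayoffSketch {
        public static void main(String[] args) {
            long region = 2L << 20;                    // assume 2 MB regions
            int n = 10;                                // regions in the cset

            // Ten fragmented regions, each 10% live.
            long copyFragmented = n * region / 10;
            long freedFragmented = n * region - copyFragmented;

            // Ten dense regions, each 90% live with no TLAB-sized hole left.
            long copyDense = n * region * 9 / 10;
            long freedDense = n * region - copyDense;

            System.out.printf("fragmented cset: copy %d MB, reclaim %d MB%n",
                              copyFragmented >> 20, freedFragmented >> 20);
            System.out.printf("dense cset:      copy %d MB, reclaim %d MB%n",
                              copyDense >> 20, freedDense >> 20);
        }
    }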
> >
> > I suppose there is a place for what you want. If all I have left are
> > compacted but not quite spacious enough regions, I would prefer to add
> > them to the cset instead of falling back on a full GC. Perhaps there's a
> > heuristic that satisfies both constraints.
>
> The example of the single fragmented region, while valid in itself, loses
> sight of the bigger picture, I think. It seems to me that it *only* matters
> when the collection set contains that one fragmented region. If it contains
> more than one fragmented region, then it starts to make sense to compact
> them together and free up one of the regions. If the cset contains
> additional full regions, then the impact of the "wasteful" copy for that
> single fragmented region is very low.
>
> With that in sight, how frequent is it to have a single fragmented region
> in the collection set, compared to other cases? I would rather have static
> code that deals with 99.99% of the cases and never walks into a bad
> feedback loop than have another heuristic for 0.01% of the cases that fails
> with unforeseen feedback. Adding that heuristic feels like what you
> describe as, "<chf> I will grant you that in your particular situation your
> solution looks attractive, but in a myriad of other situations you are
> actually pessimizing GC performance in at least one metric".
>
How frequent is it that we end up with 10%-full regions which can't fit a
TLAB? You are proposing changing the metric so that it no longer
distinguishes between regions which are 100% full with 10% live data and
regions which are 10% full with 10% live data. I would argue that those two
situations should be treated differently most of the time. In fact, in the
situations where we don't hit this TLAB bogosity race condition, there is no
reason to copy that 10%-full, 10%-live data.
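
To make that distinction concrete, here is a toy comparison (illustrative
only, not the actual Shenandoah heuristic code) of two region scores: one
that only asks "how little is live?", which rates both regions identically,
and one that asks "how much garbage would evacuation reclaim?", which
separates them.

    // Toy comparison of two cset-selection scores. Illustrative only.
    public class CsetScoreSketch {
        static final long REGION = 2L << 20;   // assume 2 MB regions

        // Score A: "how little of the region is live" -- ignores how full
        // the region actually is.
        static double scoreByLive(long used, long live) {
            return 1.0 - (double) live / REGION;
        }

        // Score B: "how much garbage would evacuating this region reclaim" --
        // only counts space actually occupied by dead objects.
        static double scoreByGarbage(long used, long live) {
            return (double) (used - live) / REGION;
        }

        public static void main(String[] args) {
            long live = REGION / 10;                 // 10% live in both regions

            long usedFull   = REGION;                // 100% full, 10% live
            long usedSparse = REGION / 10;           // 10% full, 10% live

            System.out.printf("by live:    full=%.2f  sparse=%.2f%n",
                    scoreByLive(usedFull, live), scoreByLive(usedSparse, live));
            System.out.printf("by garbage: full=%.2f  sparse=%.2f%n",
                    scoreByGarbage(usedFull, live), scoreByGarbage(usedSparse, live));
        }
    }

Under score A the 10%-full region looks exactly as attractive as the
100%-full one, even though evacuating it reclaims almost nothing.
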
I suspect your proposed metric change would result in most of the current
allocation regions being included in the collection set. That means not just
unnecessary copying work, but presumably also more triggered write barriers,
because the most recently allocated objects are also the ones most likely to
be accessed in the near future.
Christine