RFR: Reduce waste in humongous allocations

Tue Dec 13 18:20:17 UTC 2016

On Tuesday, 13.12.2016, at 19:09 +0100, Aleksey Shipilev wrote:
> On 12/13/2016 06:11 PM, Roman Kennke wrote:
> > as Aleksey has shown, when repeatedly allocating humongous objects, we
> > tend to leave gaps between them. The reason is that we start looking
> > for contiguous regions starting one region after the current
> > (allocation) region, and then discard that alloc region, starting a
> > new one after the humongous object.
> > 
> > The fix is two-fold:
> > - Instead of discarding currently active allocation regions, we
> > re-append them to the free-list (together with any free regions that
> > we skipped while searching for a contiguous block). This should be
> > useful, e.g. when we have a not-totally-filled alloc region and then
> > allocate a humongous object.
> > - When searching for contiguous space, also consider the current alloc
> > region. The complication here is that we must prevent concurrent
> > allocations from it. This patch does it by pre-emptively allocating a
> > region-sized chunk, which has two effects: it blocks concurrent
> > allocations and it tells us if the region is free in a
> > concurrency-safe manner. If our search for a contiguous block fails,
> > we revert that by freeing such regions again.
> > 
> > It passes jtreg tests and SPECjvm.
> > 
> > http://cr.openjdk.java.net/~rkennke/fixhumongousalloc/webrev.00/
> 
> Ugh. The code got even more confusing than it was before... At this
> point I wonder if acquiring a lock when claiming free regions is saner
> than trying to do this in a lock-free manner. With TLAB allocations,
> this shouldn't be that painful?

It's not painful in terms of performance, but painful in terms of
implementation. We cannot easily acquire the Heap_lock on allocations
because the allocation might come out of a write barrier, and that Java
thread is not-in-VM (it calls into the VM via a cheap leaf call). We
could change that (and have been there already) to use regular calls
like, e.g., allocations do, but this opens up a whole new class of other
problems. For example, we would need oopmaps at write barriers which,
IIRC, caused us some serious optimization problems in C2 land. With
Roland's work, those might have gone away though (it seems we can live
well with control inputs to write barriers now).

We have been there, and it might be The Correct Way to do it, but it's
not trivial at all.

> Seeing mutations in ShenandoahFreeSet::is_contiguous() makes me all
> itchy, it should be called differently.
> 
> Also, does the code claim the regions one-by-one? What if we have two
> competing multi-region humongous allocations? Does it guarantee to
> allocate both (e.g. are they stepping on each other's toes, preventing
> global progress?)

I guess it could happen. How else could we do it?
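
To make the concern concrete, here is a rough sketch of the
claim-one-by-one-then-revert scheme described above. It is a simplified
model, not the code in the webrev: Region, try_claim_whole(),
release_whole() and claim_contiguous() are hypothetical names, the region
size is made up, and a region is reduced to a single atomic "used" counter.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region: it only tracks the bytes in use.
struct Region {
    static constexpr std::size_t kSize = 1024;  // assumed region size in bytes
    std::atomic<std::size_t> used{0};

    // Claim the whole region with one CAS from 0 to kSize. Success proves the
    // region was empty and blocks concurrent allocations from it.
    bool try_claim_whole() {
        std::size_t expected = 0;
        return used.compare_exchange_strong(expected, kSize);
    }

    // Undo a successful try_claim_whole().
    void release_whole() { used.store(0); }
};

// Search for `count` contiguous empty regions, claiming them one by one.
// Returns the index of the first region of the run, or -1 if none was found.
// When an attempt fails, every region claimed in that attempt is released.
long claim_contiguous(std::vector<Region>& regions, std::size_t count) {
    if (count == 0 || count > regions.size()) return -1;
    std::size_t start = 0;
    while (start + count <= regions.size()) {
        std::size_t claimed = 0;
        while (claimed < count && regions[start + claimed].try_claim_whole()) {
            ++claimed;
        }
        if (claimed == count) return static_cast<long>(start);
        // Region start + claimed is busy: revert, then resume right after it.
        for (std::size_t i = 0; i < claimed; ++i) {
            regions[start + i].release_whole();
        }
        start += claimed + 1;
    }
    return -1;
}
```

In this model, two threads running claim_contiguous() at the same time can
each grab a region the other needs. One of them can then skip past regions
that are released a moment later and give up, even though enough free
regions remain for both.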

I know this stuff is a bit nightmarish. Can we accept this as a stop-gap
solution, and revisit locked allocation with non-leaf write barriers
and all that stuff later?

Roman