Discussion: improve humongous objects handling for G1

Man Cao manc at google.com
Sat Jan 18 04:08:05 UTC 2020


Thanks for the in-depth responses!

For a sample application, I actually have a modified BigRamTester that
allocates humongous objects, and it can demonstrate some of the problems.
Will JDK-8204689 be addressed soon? If so, we could merge the BigRamTester
variants.
A possible concern is that the "humongous BigRamTester" is not
representative of the production workload's problem with humongous objects.
The humongous objects in our production workloads are more likely
short-lived, whereas they are long-lived in the "humongous BigRamTester".
Perhaps we can modify it further to make the humongous objects short-lived.
I will keep this topic on my radar and see if I can find more realistic
benchmarks.
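
As a rough illustration of the allocation pattern I have in mind, here is a
minimal sketch (hypothetical code, not the actual BigRamTester; it assumes
something like -Xmx2g -XX:+UseG1GC -XX:G1HeapRegionSize=4m so that every
array above 2 MB becomes humongous; the class and constant names are made
up):

import java.util.ArrayDeque;

public class HumongousChurn {
    // Larger than half of a 4 MB region, so each array is a humongous object.
    private static final int HUMONGOUS_BYTES = 3 * 1024 * 1024;
    // Number of arrays kept alive to model the long-lived variant.
    private static final int LONG_LIVED_SET = 128;

    public static void main(String[] args) {
        ArrayDeque<byte[]> longLived = new ArrayDeque<>();
        for (long i = 0; ; i++) {
            byte[] h = new byte[HUMONGOUS_BYTES];
            h[0] = (byte) i;  // touch the array so it is really used
            if (i % 8 == 0) {
                // A small fraction survives for a while (long-lived behavior).
                longLived.addLast(h);
                if (longLived.size() > LONG_LIVED_SET) {
                    longLived.removeFirst();
                }
            }
            // The remaining arrays become garbage immediately (short-lived
            // behavior, closer to what we see in production).
        }
    }
}

Tuning the surviving fraction and LONG_LIVED_SET would let us move between
the long-lived and short-lived extremes.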

For OOMs due to fragmentation and ideas related to full GC (JDK-8191565,
JDK-8038487), I'd like to point out that the near-OOM cases are less of a
concern for our production applications. Their heaps were sized generously
enough to keep GC overhead low with CMS in the past. After moving to G1,
they almost never trigger full GCs, even with a non-trivial number of
humongous allocations.
The problem is the high frequency of concurrent cycles and mixed
collections caused by humongous allocations. Fundamentally this is also a
fragmentation problem, but addressing only the near-OOM cases would not
solve it. Doing more active defragmentation could indeed help.
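
To put rough, hypothetical numbers on it: with a 10 GB heap, 32 MB regions
and the default InitiatingHeapOccupancyPercent of 45, a concurrent cycle can
start once old-generation occupancy (which includes humongous regions)
reaches about 4.5 GB. A 20 MB object is humongous at that region size and
takes a whole 32 MB region, and, as far as I understand, G1 also checks
whether to start a concurrent cycle at each humongous allocation. So a
steady stream of short-lived 20 MB objects both inflates occupancy (12 MB
of waste per object) and gives G1 frequent chances to trigger back-to-back
concurrent cycles, even though most of that memory dies quickly.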

It might be better to first fully explore the feasibility of those crazier
ideas. If one of them works, then we don't need to keep improving G1 here
and there. So far there are three of them, and if I understand correctly,
they can all get rid of humongous regions completely.
a. let G1 reserve a multiple of MaxHeapSize while only ever committing
MaxHeapSize (JDK-8229373)
    I like this approach the most, especially since JDK-8211425 is already
implemented. I'll think further about the issue with compressed oops.
b. break down large objects into smaller ones like J9's arraylets
    A few questions on this approach:
    We probably don't need to handle large non-array objects, right? They
should be extremely rare.
    Is this approach compliant with the JLS [1] and JVMS [2]? I read
through them but couldn't find evidence of noncompliance.
    Supporting JNI GetCritical does look tricky. Another tricky issue is
that we should preserve O(1) complexity for accesses by index (see the
sketch after this list for the kind of chunked layout I have in mind).
c. carving out an adjustable subset of regions for humongous allocs and
doing power-of-two buddy-system allocation
    I have also thought about a quite similar idea: introducing a
dynamically sized humongous space. It might be better to support multiple
dynamically sized humongous spaces. I admit I probably have not thought
this approach through as deeply as Aleksey has.
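
On the O(1) question for b., here is a minimal, purely illustrative sketch
of a chunked array at the Java library level (my own hypothetical code, not
how J9 implements arraylets inside the VM; it assumes 4 MB regions, so each
1 MB chunk stays well below the 2 MB humongous threshold). Index accesses
remain O(1) because the chunk index and offset are just a shift and a mask:

public final class ChunkedLongArray {
    // 2^17 longs = 1 MB per chunk, well below half of a 4 MB region.
    private static final int CHUNK_BITS = 17;
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final long[][] chunks;
    private final long length;

    public ChunkedLongArray(long length) {
        this.length = length;
        int numChunks = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new long[numChunks][];
        for (int i = 0; i < numChunks; i++) {
            long remaining = length - ((long) i << CHUNK_BITS);
            chunks[i] = new long[(int) Math.min(CHUNK_SIZE, remaining)];
        }
    }

    // O(1): one shift, one mask, two array loads (plus bounds checks).
    public long get(long index) {
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    public void set(long index, long value) {
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    public long length() {
        return length;
    }
}

A VM-level arraylet scheme would do the same split transparently in the
object layout, which is exactly where JNI GetCritical gets tricky: there is
no single contiguous backing array to hand out.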

[1] https://docs.oracle.com/javase/specs/jls/se13/html/jls-10.html
[2]
https://docs.oracle.com/javase/specs/jvms/se13/html/jvms-6.html#jvms-6.5.newarray

-Man

On Fri, Jan 17, 2020 at 2:01 AM Thomas Schatzl <thomas.schatzl at oracle.com>
wrote:

> Hi,
>
> On 17.01.20 01:53, Man Cao wrote:
> > Hi all,
> >
> > While migrating our workload from CMS to G1, we found many production
> > applications suffer from humongous allocations.
> > The default threshold for humongous objects is often too small for our
> > applications with heap sizes between 2GB-15GB.
> > Humongous allocations caused a noticeable increase in the frequency of
> > concurrent old-gen collections, mixed collections and CPU usage.
> > We could advise applications to increase G1HeapRegionSize. But some
> > applications still suffer with G1HeapRegionSize=32M.
> > We could also advise applications to refactor code to break down large
> > objects. But that is a high-cost effort that may not always be feasible.
> >
> > We'd like to work with the OpenJDK community together to improve G1's
> > handling of humongous objects.
> > Thomas Schatzl mentioned to me a few efforts/ideas on this front in an
> > offline chat:
> > a. Allocation into tail regions of humongous objects: JDK-8172713,
> > JDK-8031381
> > b. Commit additional virtual address space for humongous objects.
> > c. Improve the region selection heuristics (e.g., first-fit, best-fit)
> for
> > humongous objects.
> >
> > I didn't find open CRs for b. and c. Could someone give pointers?
> > Are there any other ideas/prototypes on this front?
>
> TLDR: we in the Oracle GC team have quite a few ideas that can reduce
> the issue significantly. We are happy to help with the implementation of any
> of these.
> We would appreciate a sample application.
>
> Long version:
>
> The problems with humongous object allocation in G1:
>
> - internal fragmentation: the tail end of a humongous object is wasted
> space.
>
> - external fragmentation: sometimes you can't find enough contiguous
> space for a humongous object.
>
> There are quite a few CRs related to this problem in the bug tracker; I
> just now connected them together using a "g1-humongous" label [0].
>
> Here's a rundown of our ideas, categorized a little (note that these CRs
> predate significant changes to how G1 works now, so the ideas may
> need to be adapted to the current situation):
>
> - try to get rid of humongous objects asap, i.e. improve eager reclaim
> support by allowing eager reclaim with reference arrays (JDK-8048180) or
> non-objArrays (JDK-8073288).
> I remember the main problem with that was stale remembered set entries
> after removal (and SATB marking, but you could just not do eager reclaim
> during marking).
> In the applications we had at hand at that time, reference arrays tended
> not to be eagerly reclaimable most of the time, and humongous regular
> objects were rare.
> So the benefit of looking into this might be small.
>
> - allow allocation into the tail end of humongous objects (JDK-8172713);
> there was once an internal prototype for that, but it was abandoned
> because of implementation issues (it was a hack that was never brought to
> a stable state, mainly because humongous object management was full of
> odd quirks wrt region management. This has been fixed since. Also, the
> example application benefited more from eager reclaim).
>
> While the argument from Aleksey about nepotism in the other thread is
> valid (as far as I understand it), it depends on the implementation. The
> area at the tail end could be considered as a separate evacuation
> source, i.e. evacuated independently of the humongous object (and that
> would actually improve the code to clean out HeapRegion ;)).
> (This needs more care with single-region humongous objects but does not
> seem fundamentally problematic; single-region humongous objects may
> nowadays not be a big issue to just move during GC).
>
> - external fragmentation can be approached in many ways:
>
>    - it can simply be ignored by letting G1 reserve a multiple of
> MaxHeapSize while only ever committing MaxHeapSize (JDK-8229373). The main
> drawback here is that it impacts the range of heaps where compressed oops
> can be used, and 32-bit (particularly Windows) VMs (if you still care, but
> the feature could be disabled as well).
> Compressed oops typically improve throughput significantly. Of course,
> as long as the total size of the reservation is below the threshold, it
> does not really matter.
>
> Fwiw, when using the HeterogeneousHeapRegionManager, this is already
> attempted (for other reasons).
>
>    - improve the region allocator to decrease the problem (JDK-8229373).
> The way G1 currently allocates regions is a first-fit approach which
> interferes a bit with destination region selection for old and survivor
> regions, likely creating more fragmentation than necessary. (Basically:
> it does not care at all, so go figure ;) ).
> Also, during mixed GCs one could explicitly prefer to evacuate regions
> that break long runs of free regions, weighting those regions higher
> (evacuating them earlier). This needs to be done in conjunction with the
> remembered set selection at the end of marking, before creating them.
>
> A long time ago, on a different regional collector, I started looking into
> this.
>
>    - actively defragment the heap during GC. This may either be a full GC
> (JDK-8191565), like Shenandoah does, or any young GC, assuming that G1 has
> first kept remembered sets for potential candidates (JDK-8038487).
>
> - never create humongous objects
>
>    - potentially implement one of the various ideas in the literature to
> break down large objects into smaller ones, J9's arraylets being one of
> them.
>
> There are other solutions, such as completely separate allocation of
> humongous objects as ZGC does, but that typically has the same problem
> as reserving more space (i.e. the compressed oops range, though ZGC does
> not care at this time).
>
> I think it would help potential contributors if there were some
> application available on which the impact of changes could be shown in
> some way. In the past, whenever someone had that problem, they were happy
> to just increase the heap region size - which is great for them, but does
> not fix the problem :)
>
> We would in any case help anyone taking a stab at one of these ideas (or
> others).
>
> Thanks,
>    Thomas
>
> [0]
>
> https://bugs.openjdk.java.net/browse/JDK-8237466?jql=labels%20%3D%20g1-humongous
>


