Discussion: improve humongous objects handling for G1

Sat Jan 18 10:13:31 UTC 2020

Hi,

On Fri, 2020-01-17 at 20:08 -0800, Man Cao wrote:
> Thanks for the in-depth responses!
> 
> For a sample application, I actually have a modified BigRamTester
> that allocates humongous objects, and it can demonstrate some of the
> problems.
> Would JDK-8204689 be addressed soon? Then we can merge the variants 

Given the previous track record on that, unfortunately not.

> of BigRamTester. A possible concern is that the "humongous
> BigRamTester" is not representative of the production workload's
> problem with humongous objects.
> The humongous objects in production workload are more likely short-
> lived, whereas they are long-lived in "humongous BigRamTester". 

For short-lived humongous objects eager reclaim can do miracles. If
your objects are non-objArrays, you could check for the reason why they
are not eagerly reclaimed - maybe the threshold for the amount of
remembered set entries to keep these humongous objects as eligible for
eager reclaim is too low, and increasing that one would just make it
work. Enabling gc+humongous=debug can give more information.
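
A minimal sketch of a test load for checking this, assuming a 1 MB region
size so that a 768 KB primitive array is larger than half a region and
therefore allocated as a humongous object. The class and method names are
made up for illustration; this is not the modified BigRamTester.

```java
// Hypothetical demo class; not part of BigRamTester or the JDK.
final class ShortLivedHumongous {
    // Allocates `count` byte arrays of `bytes` bytes each and drops each one
    // right away. The checksum keeps the JIT from eliding the allocations.
    static long churn(int count, int bytes) {
        long checksum = 0;
        for (int i = 0; i < count; i++) {
            byte[] block = new byte[bytes]; // primitive array, not an objArray
            block[bytes - 1] = (byte) i;
            checksum += block[bytes - 1];
        }
        return checksum;
    }

    public static void main(String[] args) {
        // Assumption: run with -XX:G1HeapRegionSize=1m, so 768 KB > half a region.
        System.out.println(churn(10_000, 768 * 1024));
    }
}
```

On JDK 11 or later this can be run directly from source, for example with
  java -XX:+UseG1GC -XX:G1HeapRegionSize=1m -Xlog:gc+humongous=debug ShortLivedHumongous.java
and the gc+humongous output then shows what G1 decided for the humongous
regions at each collection.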

Note that in JDK13 we (implicitly) increased this threshold, and in
JDK14 we removed the main reason why the threshold is as low as it is
(calculating the number of remembered set entries).

It is likely possible to increase this threshold by one or even two
orders of magnitude now, potentially increasing its effectiveness
significantly with a one-liner change. I will file a CR for that; I
thought of it but forgot about it when doing the JDK14 modification.

> Perhaps we can modify it further to make it the humongous objects
> short-lived. I will keep this topic on my radar and see if I can find
> more realistic benchmarks.
> 
> For OOMs due to fragmentation and ideas related to full GC (JDK-
> 8191565, JDK-8038487), I'd like to point out that the near-OOM cases
> are less of a concern for our production applications. Their heap
> sizes are sufficiently large in order to keep GC overhead low with
> CMS in the past. When they move to G1, they almost never trigger full
> GCs even with a non-trivial number of humongous allocations.
> The problem is the high frequency of concurrent cycles and mixed
> collections as a result of humongous allocations. Fundamentally it is

Which indicates that eager reclaim does not work in this application
for some reason.

> also due to fragmentation, but only addressing the near-OOM cases
> would not solve the problem. Doing more active defragmentation could
> indeed help.

To me, spending the effort on combating internal fragmentation (allow
allocation in tail ends) and external fragmentation by actively
defragmenting seems to be at least worth comparing to other options.

It could help with all problems except cases where you allocate a very
large number of humongous objects and you can't keep the humongous
object tails filled. This option still keeps the invariant that
humongous objects need to be allocated at a region boundary.
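
To make the tail problem concrete, here is a rough sketch of the arithmetic,
assuming only what is stated above: a humongous object starts at a region
boundary and occupies whole contiguous regions. The class name and the 1 MB
region size are illustrative assumptions.

```java
// Hypothetical helper; illustrates the region-boundary arithmetic only.
final class HumongousTailWaste {
    // Whole regions needed for an object that starts at a region boundary.
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    // Bytes left unused in the tail of the last region.
    static long tailWaste(long objectBytes, long regionBytes) {
        return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
    }

    public static void main(String[] args) {
        long region = 1L << 20;    // assumed 1 MB region size
        long object = region + 16; // just over one region
        System.out.println(regionsNeeded(object, region) + " regions, "
                + tailWaste(object, region) + " bytes unused in the tail");
    }
}
```

An object just over one region in size leaves almost a full region unused,
which is the space that allocating into tail ends would recover.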

Most of the other ideas you propose below also (seem to) retain this
property.

> It might be better to first fully explore the feasibilities of those
> crazier ideas. If one of them works, then we don't need to
> continuously improve G1 here and there. So far there are 3 of them.
> They all can get rid of humongous regions completely if I understand
> correctly.
> a. let G1 reserve a multiple of MaxHeapSize while only ever
> committing MaxHeapSize (JDK-8229373)
>     I like this approach most, especially since JDK-8211425 is
> already implemented. I'll further think about the issue with
> compressed oops.

It is simplest, but does not solve the issue with internal
fragmentation which is ultimately responsible for concurrent cycle
frequency.

Maybe it is sufficient, as "most" applications only use single or low
double-digit GB heaps at the moment, where the entire reservation still
fits below the 32 GB barrier.

If the heap is already larger than the compressed oops range, then this
solution would certainly be simplest for the external fragmentation
issue. If you are already way beyond that barrier, you might just use
ZGC instead, for other reasons too, if you are fine with any potential
throughput hit.
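
As a back-of-the-envelope check of when option (a) stays below that barrier:
32-bit compressed references scaled by the default 8-byte object alignment
can address 2^32 * 8 bytes = 32 GB. The sketch below only does this
arithmetic; the class name and the reservation multiple are hypothetical,
and the real usable limit is somewhat lower depending on where the heap
ends up in the address space.

```java
// Hypothetical helper; approximates the compressed oops limit only.
final class ReservationFit {
    // 2^32 references times 8-byte alignment = 32 GB (upper bound).
    static final long COMPRESSED_OOPS_LIMIT = (1L << 32) * 8;

    // True if reserving `reserveMultiple` times MaxHeapSize stays within
    // the approximate compressed oops range.
    static boolean fits(long maxHeapBytes, int reserveMultiple) {
        long reserved = Math.multiplyExact(maxHeapBytes, (long) reserveMultiple);
        return reserved <= COMPRESSED_OOPS_LIMIT;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        System.out.println(fits(4 * gb, 4));  // 16 GB reservation
        System.out.println(fits(16 * gb, 4)); // 64 GB reservation
    }
}
```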

> b. break down large objects into smaller ones like J9's arraylets
>     A few questions on this approach:
>     We probably don't need to handle large non-array objects, right?
> They should be extremely rare.

Arraylets do not solve that problem either.

>     Is this approach compliant with JLS [1] and JVMS [2]? I read
> about them but couldn't find evidence of noncompliance.

I do not think there is an issue, but I did not specifically read the
specs again. Given that J9 is, afaik, spec compliant when it uses
arraylets (with the balanced collector), Hotspot would be too.

>     Supporting JNI GetCritical does look tricky. Another tricky issue

You could double-map like 
https://blog.openj9.org/2019/05/01/double-map-arraylets/ does for
native access.

Btw the same text also indicates that copying seems like a non-starter
anyway, as, quoting from the text "One use case, SPECjbb2015 benchmark
is not being able to finish RT curve...".

> is that we should preserve O(1) complexity for accesses by index.

Not sure what prevents arraylets in particular from being O(1); a
particular access is slower though due to the additional indirection
with the spine.
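
A toy illustration of that point, written as plain Java rather than as VM
code: the spine is an array of fixed-size leaves, and an access is one shift,
one mask and two loads, independent of the array length. The class name and
the leaf size are assumptions for the sketch and do not reflect J9's actual
layout.

```java
// Hypothetical library-level model of an arraylet; not how a VM lays it out.
final class IntArraylet {
    private static final int LEAF_SHIFT = 10;          // assumed: 1024 ints per leaf
    private static final int LEAF_SIZE = 1 << LEAF_SHIFT;
    private static final int LEAF_MASK = LEAF_SIZE - 1;

    private final int[][] spine; // one entry per leaf
    private final int length;

    IntArraylet(int length) {
        if (length < 0) throw new NegativeArraySizeException();
        this.length = length;
        int leaves = (length + LEAF_SIZE - 1) >>> LEAF_SHIFT;
        spine = new int[leaves][];
        for (int i = 0; i < leaves; i++) {
            int remaining = length - (i << LEAF_SHIFT);
            spine[i] = new int[Math.min(LEAF_SIZE, remaining)];
        }
    }

    int length() { return length; }

    // O(1): pick the leaf via the spine, then index into the leaf.
    int get(int index) {
        checkIndex(index);
        return spine[index >>> LEAF_SHIFT][index & LEAF_MASK];
    }

    void set(int index, int value) {
        checkIndex(index);
        spine[index >>> LEAF_SHIFT][index & LEAF_MASK] = value;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) throw new ArrayIndexOutOfBoundsException(index);
    }
}
```

The extra load through the spine is the additional indirection; the
complexity stays constant, only the per-access cost goes up.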

Using the double-mapped array for JITted code may have the same problem
with compressed oops as other solutions; particularly if you do not
know the size of the processed array in advance, you need to create
extra code.

Which means that there is significant optimization work needed to make
array access "as fast" as before in jitted code.

> c. carving out the adjustable subset of regions for humongous allocs
> and doing power-of-two buddy-system allocation

The buddy system (as I understand it, maybe Aleksey could share more
details) still suffers from internal fragmentation, potentially even
more than now.
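
A sketch of why that could be, assuming a plain binary buddy allocator that
rounds every request up to a power-of-two number of regions (the actual
proposal may differ). The class and method names are hypothetical.

```java
// Hypothetical helper; models a plain binary buddy allocator's rounding only.
final class BuddyWaste {
    // Smallest power of two that is >= n (for n >= 1).
    static long roundUpToPowerOfTwo(long n) {
        if (n <= 1) return 1;
        return Long.highestOneBit(n - 1) << 1;
    }

    // Whole regions handed out beyond what contiguous exact-fit
    // allocation (as G1 does today) would use.
    static long extraRegions(long regionsNeeded) {
        return roundUpToPowerOfTwo(regionsNeeded) - regionsNeeded;
    }

    public static void main(String[] args) {
        for (long needed : new long[] {4, 5, 9}) {
            System.out.println(needed + " regions needed, "
                    + roundUpToPowerOfTwo(needed) + " allocated");
        }
    }
}
```

Under this assumption an object needing 5 regions occupies 8, so the waste
is up to whole regions on top of the unused tail of the last region.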

>     I have also thought about a quite similar idea by introducing a
> dynamic-sized humongous space. It might be better to support multiple
> dynamic-sized humongous spaces. I admit I probably have not thought
> this approach as deep as Aleksey has.

This is the approach ZGC takes, which has the associated problems with
compressed oops.
I do not think we can completely give up the compressed oops use case
at least until alternatives are explored.

> 
> [1] https://docs.oracle.com/javase/specs/jls/se13/html/jls-10.html
> [2] https://docs.oracle.com/javase/specs/jvms/se13/html/jvms-6.html#jvms-6.5.newarray
> 

Thanks,
  Thomas