Discussion: improve humongous objects handling for G1

Mon Jan 20 11:11:18 UTC 2020

Hi Liang,

On 19.01.20 08:08, Liang Mao wrote:
> Hi Guys,
> 
> We Alibaba have experienced the same problem as Man introduced.
> Some applications got frequent concurrent mark cycles and high
> cpu usage and even some to-space exhausted failures because of
> large amount of humongous object allocation even with
> G1HeapRegionSize=32m. But those applications worked fine
> with ParNew/CMS. We are working on some enhancements for better

Can you provide logs? (with gc+heap=debug,gc+humongous=debug)

> reclamation of humongous objects. Our first intention is to reduce
> the frequent concurrent cycles and possible to-space exhausted so
> the heap utility or arraylets are not taken into consideration yet.
> 
> Our solution is more like a ParNew/CMS flow and will treat a
> humongous object as young or old.
> 1. Humongous object allocation in mutator will be considered into
> eden size and won't directly trigger concurrent mark cycle. That
> will avoid the possible to-space exhausted while concurrent mark
> is working and humongous allocations are "eating" the free regions.

(I am trying to imagine situations here where this would be a problem 
since I do not have a log)

That helps when G1 is already running a marking cycle, space is tight,
and allocations are already eating into the reserve that has explicitly
been set aside for this case (G1ReservePercent - did you try increasing
that as a workaround?). Otherwise it makes young collections much more
frequent than necessary.

That is particularly the case if these humongous regions are
eager-reclaimable: today the humongous allocations would be "free",
while with that policy they would cause a young gc.

The other issue, humongous allocations causing too many concurrent
cycles, could be managed by looking into canceling the concurrent
marking if the concurrent start gc freed lots and lots of humongous
objects, e.g. getting way below the mark threshold again.

I did not think this through though; of course at some point you do need
to start the concurrent mark.
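
To make the cancellation idea a bit more concrete, here is a minimal
sketch of such a check. It is not HotSpot code: the class, the method and
the marginPercent parameter are all hypothetical, and "way below the mark
threshold" is modelled simply as "below the threshold minus a margin".

```java
// Hypothetical sketch, not HotSpot code: decides whether a marking cycle
// that was just requested could be canceled again.
final class MarkingCancelSketch {

    // Returns true if occupancy after the concurrent start gc is below the
    // mark threshold by at least marginPercent (an assumed tuning knob).
    static boolean shouldCancelMarking(long occupiedAfterGcBytes,
                                       long markThresholdBytes,
                                       int marginPercent) {
        if (marginPercent < 0 || marginPercent > 100) {
            throw new IllegalArgumentException("marginPercent must be in [0, 100]");
        }
        // Divide first to keep the intermediate value small.
        long cutoff = markThresholdBytes / 100 * (100 - marginPercent);
        return occupiedAfterGcBytes < cutoff;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // The gc freed most humongous objects: far below the threshold.
        System.out.println(shouldCancelMarking(2000 * mb, 4500 * mb, 25));
        // Still close to the threshold: keep marking.
        System.out.println(shouldCancelMarking(4000 * mb, 4500 * mb, 25));
    }
}
```

A real policy would also have to make sure marking is eventually started,
which this sketch does not attempt.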

Some (or most) of that heap pressure might have been caused by the 
internal fragmentation, so allowing allocation into the tail ends would 
very likely decrease that pressure too.
This would likely be the first thing I would be looking into if the logs 
indicate that.
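
As a rough illustration of how large that internal fragmentation can get,
the sketch below computes the unused tail of a humongous allocation. It
assumes a humongous object occupies a whole number of contiguous regions
and that the remainder of the last region is unusable. The class and
method names are made up for this example.

```java
// Hypothetical helper, not HotSpot code: estimates the space lost in the
// last region of a humongous allocation.
final class HumongousTailWaste {

    // Number of whole regions needed to hold the object (rounded up).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    // Bytes left unused at the tail end of the last region.
    static long tailWaste(long objectBytes, long regionBytes) {
        return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long region = 32 * mb;   // G1HeapRegionSize=32m
        long object = 33 * mb;   // just over one region
        System.out.println("regions: " + regionsNeeded(object, region));
        System.out.println("wasted MB: " + tailWaste(object, region) / mb);
    }
}
```

With 32m regions, an object just over one region in size leaves almost a
full region unused, which is the pressure that allocating into the tail
ends would relieve.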

> 2. Enhance the reclamation of short-live humongous object by
> covering object array that current eager reclaim only supports
> primitive type for now. This part looks same to JDK-8048180 and
> JDK-8073288 Thomas mentioned. The evacuation flow will iterate
> the humongous object array as a regular object if the humongous
> object is "young" which can be distinguished by the "age" field
> in markoop.
>
> The patch is being tested. We will share it once it proves to
> work fine with our applications. I don't know if any similar
> approach has been already tried and any advices?

The problem with treating humongous reference arrays as young is that 
this heuristic significantly increases the garbage collection time if 
that object survives the collection.
I.e. the collector needs to iterate over all young objects, and while 
you do save the time to copy the object by in-place aging, scanning the 
references tends to take more time than copying.

In that "different regional collector" I referenced in the other email 
exactly this had been implemented with the above issues. That collector 
also had configurable regions down to 64k (well, basically even less, 
but anything below that was just for experimentation, and 64k had been 
very debatable too), so the humongous object problem had been a lot 
larger. It might not be the case with G1's "giant" humongous objects.

Treating them as old like they are now within G1 allows you to be a lot 
more selective about what you take in for garbage collection. Now the 
policy isn't particularly smart (just take humongous objects of a 
particular type with less than a low, fixed threshold of remembered set 
entries), but that could be improved.

I.e. G1 has a measure of how long scanning a remembered set entry 
approximately takes, so that could be made dependent on available time.
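
A minimal sketch of what "dependent on available time" could look like,
assuming the per-entry scan cost and the time budget are given as inputs.
None of these names exist in G1; they are made up for illustration, and
the real policy would get its cost estimate from G1's own predictions.

```java
// Hypothetical sketch, not HotSpot code: derives the remembered set entry
// limit for eager-reclaim candidates from a time budget instead of using
// a fixed threshold.
final class EagerReclaimBudgetSketch {

    // Largest number of remembered set entries scannable within the budget.
    static long maxEntriesWithinBudget(long budgetNanos, long costPerEntryNanos) {
        if (budgetNanos <= 0 || costPerEntryNanos <= 0) {
            return 0; // no budget or no usable cost estimate: take nothing
        }
        return budgetNanos / costPerEntryNanos;
    }

    // A humongous object is a candidate if scanning its entries fits the budget.
    static boolean isCandidate(long remSetEntries, long budgetNanos, long costPerEntryNanos) {
        return remSetEntries <= maxEntriesWithinBudget(budgetNanos, costPerEntryNanos);
    }

    public static void main(String[] args) {
        long budget = 2_000_000; // 2 ms set aside for these scans (assumed)
        long cost = 1_000;       // 1 us per entry (assumed estimate)
        System.out.println(maxEntriesWithinBudget(budget, cost));
        System.out.println(isCandidate(1_500, budget, cost));
        System.out.println(isCandidate(5_000, budget, cost));
    }
}
```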

Thanks,
   Thomas