RFR (S) 8240556 and RFR (S) 8248783 for G1 humongous objects

Thomas Schatzl thomas.schatzl at oracle.com
Fri Aug 7 09:08:00 UTC 2020


Hi Liang,

   first, apologies for the late answer; the jdk15 release came in between, 
then vacation time...

On 03.07.20 10:35, Liang Mao wrote:
> Hi Man and G1 team,
> 
> Previously we were discussing the abort of concurrent 
> mark (JDK-8240556). I
> made a concurrent cleaning bitmap version instead of the original STW one.
> But there are still some other problems in the real world. Recently I revisited
> the implementation and made more optimizations, and the result seems very 
> good.
> Bug: https://bugs.openjdk.java.net/browse/JDK-8240556
> Webrev:
> 
> http://cr.openjdk.java.net/~ddong/liangmao/8240556/webrev.00/
> (This may be related to https://bugs.openjdk.java.net/browse/JDK-8247928)
I looked at it a bit, but need to go over my notes again.

Potentially we might want to first refactor the concurrent mark setup 
code in G1CollectedHeap a bit to simplify it. Basically moving it 
closer together and removing a few assumptions in some of the code 
that equate a concurrent cycle with a full marking.

I think from a functionality pov it is good (I have a recollection that 
there may be one or two things missing, but I may be wrong as it 
completely reuses the concurrent mark cycle).

More in a different thread.

> The most severe problems in our workloads are as below:
> 1) Humongous object allocation will invade the space of the young generation
> and lead to long STW pauses because of to-space exhaustion, and even Full GCs
> 2) Frequent concurrent marking will cost significant CPU resources (which
> could be resolved by aborting concurrent mark, JDK-8240556 
> <https://bugs.openjdk.java.net/browse/JDK-8240556>)
> 3) Frequent GCs because humongous allocation can reach the IHOP very quickly,
> with or without concurrent mark abort.
> 
> Neither 1) nor 3) has a solution yet. I made another change to share 
> the young space between humongous allocation and eden allocation. An initial mark 
> will be triggered if the reserve space is invaded. My colleague helped to create a 
> new bug ID: https://bugs.openjdk.java.net/browse/JDK-8248783
> With this change and the concurrent mark abort, our problematic workloads run
> very smoothly without any GC issues. I have tested the 2 changes in several
> different applications, like web services, databases, etc.
> 
> Our real testing was done on 8u, and I have created JDK15 webrevs with the same
> logic. The code passes the jtreg gc/g1 tests and jbb2015. Could you 
> please take a look and run some tests?
> 

I spent a lot of time on that patch and the test (which is broken as is, 
see the CR), running it on multiple machines.

Let me recap the results and findings:

- all collectors (including CMS and others) on all jdk versions run into 
full gcs in that test. This is an indication to me that the 
application is a very extreme edge case.

- minor re-tuning of options (like the suggested heap region sizing, or, 
as suggested in the CR, increasing the number of concurrent gc threads; 
see the example flags right after this list) allows all collectors I 
tried to pass.

- as far as I understand the problem, the main cause of the full gcs 
(for G1 at least, where I looked in detail) is that the concurrent gcs are 
not able to keep up with reclamation because:

   a) these humongous objects take more space than expected due to 
internal fragmentation
   b) evacuation failures in G1, at every GC, use up hundreds of 
additional regions to keep the failed regions alive, filling up the heap 
quickly.
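
For illustration, the re-tuning I mean is something along the lines of 
(these particular values are just my assumption for this mail, not the 
exact settings suggested in the CR):

   java -XX:+UseG1GC -XX:G1HeapRegionSize=32m -XX:ConcGCThreads=4 ...

i.e. a region size large enough that fewer of these objects end up 
humongous, and/or more concurrent gc threads so that marking cycles 
finish earlier.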

The change basically increases the number of young garbage collections 
by (artificially) reducing eden size to add intermediate reclamations of 
short-lived humongous objects.
(You can see this in e.g. gc+heap logs in lines like

[95.370s][info][gc,heap        ] GC(5908) Eden regions: 191->0(545)

I.e. eden was cut short at 191 regions out of a projected 545.)
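
To make the allocation pattern concrete, here is a hypothetical 
reproducer sketch (it is not the test from the CR; heap size, array size 
and iteration count are assumptions on my part). Each array is larger 
than half a region, so G1 allocates it as a humongous object, and with 
the patch applied you would see eden being cut short in -Xlog:gc+heap 
output like the line above:

// HumongousChurn.java - hypothetical sketch, not the test from the CR.
// Run with e.g.: java -XX:+UseG1GC -Xmx2g -Xlog:gc+heap HumongousChurn
public class HumongousChurn {
    public static void main(String[] args) {
        // With a 2g heap G1 ergonomically picks 1m regions, so a 4m array
        // is well above the half-region humongous threshold (adjust the
        // size to your region size if needed).
        final int humongousBytes = 4 * 1024 * 1024;
        for (int i = 0; i < 100_000; i++) {
            byte[] h = new byte[humongousBytes]; // short-lived humongous object
            h[0] = (byte) i;                     // touch it so it is actually used
        }
    }
}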

Unfortunately, the current heuristic is not good enough: you still get 
full gcs, obviously depending on your environment (I ran it on three 
different machines, a highly threaded workstation (Ivy Bridge E), a 
desktop (Ryzen 3xxx), and a laptop (Kaby Lake), without significant 
differences). Let me show you some numbers on jdk/jdk:

(use fixed width font to see the table)

variant    / #evacuation failures [1] / #full gcs [1] / time taken [s]
baseline             9796                   653              207
+8240556             1140                   114              113
+8248783                0                    96               95

(patches are inclusive, i.e. the last change also included 8240556)

I.e. you still get full gcs. Taking a step back and looking at what this 
does: the heuristic to do a young gc when a humongous object reaches into 
the heap reserve is very arbitrary, and it shows. It may work for you in 
your environment for your application, but it is not generally applicable 
at all.

Now you could make the time when this extra gc happens configurable, but 
then you already need user intervention, and the CR already gives at 
least one option to avoid this issue. There are more options btw, like 
manually limiting eden size (-XX:MaxNewSize(Percent) et al).
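
For example (the values are just an assumption for illustration, not a 
recommendation for this particular test):

   -XX:MaxNewSize=512m
   -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=30

Both cap how large eden may grow, which makes young gcs, and with them 
the intermediate reclamation of short-lived humongous objects, happen 
more often.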

This is not the only problem I see with this heuristic: it does not fix 
the actual problems I stated above, all of which are fixable in one way 
or another:

1) fix the internal fragmentation; listing some of the options already 
written about in the bug tracker:
   - "8172713: Allow allocation into the tail end of humongous objects"
   - "8229373: Reserve more than MaxHeapSize for the heap to fight 
humongous object fragmentation in G1" in conjunction with "8251263: G1: 
Only commit actually used space for humongous objects"

2) improve evacuation failure
   - "8140326: Consider putting regions where evacuation failed into 
collection set"; that would reduce the overhead of evacuation failure 
from #failed-gcs * avg-failing-regions to just avg-failing-regions 
(discounting regular object survival)

3) make young gen sizing aware of extreme short-lived humongous object 
allocation; "8245511: G1 adaptive IHOP does not account for reclamation 
of humongous objects by young GC" goes into that direction (for adaptive 
IHOP), but I'll file a new CR for young gen sizing.

4) this new heuristic, short-cutting regular young gen sizing, may also 
negatively impact achieving pause time goals, as "8072920: G1's policy 
to interrupt regular operation with an initial mark gc after humongous 
object allocation may cause missed pause time goals" already describes, 
for applications that do not have these problems.

Taking all this into account, my current position is to not add this to 
jdk/jdk. Rather, I would like to ask you to look into one of these 
issues that fix the causes mentioned above instead of adding a new 
heuristic.

Note that I am planning to invest time in looking into internal 
fragmentation in the near future (i.e. 1) above), so in some future 
release this might not be an issue any more.

Looking over the suggestions here, I think option 3) would be the 
closest fix (the more I think about this, the more I think it is 
exactly the fix) to what you were trying to accomplish.
*If* you go down that route *please* first talk to me as the recent 
young gen sizing changes still need to be pushed (I was and still am 
looking into some crashes that were uncovered in unrelated components 
because of this, which are also the cause for this delay, e.g. 
JDK-8249192) and you probably do not want to base your work on buggy 
behavior.

The situation may be different for older releases (like jdk8 not having 
fast evacuation failure handling and "fast" full gcs), but it's not me 
who decides whether to take this in there. Note that some of the same 
reservations apply, and given other goals of these projects there are 
additional arguments against taking this in.

> Bug: https://bugs.openjdk.java.net/browse/JDK-8248783
> Webrev: http://cr.openjdk.java.net/~ddong/liangmao/8248783/webrev.00/
> 

Thanks,
   Thomas


