RFR (S) 8240556 and RFR (S) 8248783 for G1 humongous objects
Thomas Schatzl
thomas.schatzl at oracle.com
Fri Aug 7 09:08:00 UTC 2020
Hi Liang,
first, apologies for the late answer; the jdk15 release came in between,
then vacation time...
On 03.07.20 10:35, Liang Mao wrote:
> Hi Man and G1 team,
>
> Previously we were discussing the abort of concurrent mark (JDK-8240556).
> I made a concurrent cleaning bitmap version instead of the original STW one.
> But there are still some other problems in the real world. Recently I revisited
> the implementation and made more optimizations, and the results seem very
> good.
> Bug: https://bugs.openjdk.java.net/browse/JDK-8240556
> Webrev: http://cr.openjdk.java.net/~ddong/liangmao/8240556/webrev.00/
> (This may be related to https://bugs.openjdk.java.net/browse/JDK-8247928)
I looked at it a bit, but need to go over my notes again.
Potentially we might want to first refactor the concurrent mark setup
code in G1CollectedHeap a bit to simplify it. Basically moving it
together a bit, and clearing out a few assumptions in some of the code
that mean that concurrent cycle == full marking.
I think from a functionality pov it is good (I have a recollection that
there may be one or two things missing, but I may be wrong as it
completely reuses the concurrent mark cycle).
More in a different thread.
> The most severe problems in our workloads are as below:
> 1) Humongous object allocation will invade the space of the young generation
> and lead to long STW pauses because of to-space exhaustion and even Full GC
> 2) Frequent concurrent marking will cost significant CPU resources (which
> could be resolved by aborting concurrent mark, JDK-8240556
> <https://bugs.openjdk.java.net/browse/JDK-8240556>)
> 3) Frequent GCs, because humongous allocation can reach the IHOP very quickly,
> with or without aborting concurrent mark.
>
> Neither 1) nor 3) has a solution yet. I made another change to share
> the young space between humongous allocation and eden allocation. Initial-mark
> will be triggered if the reserve space is invaded. My colleague helped to create a
> new bug ID: https://bugs.openjdk.java.net/browse/JDK-8248783
> With this change and the abort of concurrent mark, our problematic workloads run
> very smoothly without any GC issues. I have tested the two changes in several
> different applications, like web services, databases, etc.
>
> Our real test was done with 8u, and I have created JDK 15 webrevs with the same
> logic. The code passes the jtreg gc/g1 tests and jbb2015. Could you
> please take a look and run some tests?
>
I spent a lot of time on that patch and the test (which is broken as is,
see the CR), running it on multiple machines. Let me recap the results
and findings:
- all collectors (including CMS and others) on all jdk versions run into
full gcs in that test. This is an indication to me that the
application is a very extreme edge case.
- minor re-tuning of options (like the suggested heap region sizing, or,
as suggested in the CR, increasing concurrent gc threads) allows all
collectors I tried to pass.
- as far as I understand the problem, the main cause of the full gcs
(for G1 at least, where I looked in detail) is the concurrent gcs not
being able to keep up with reclamation because:
a) these humongous objects take more space than expected due to
internal fragmentation (see the sketch right after this list)
b) evacuation failures in G1 use up, for every GC, hundreds of
additional regions to keep the failed regions alive, filling up the heap quickly.
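To make a) a bit more concrete, here is a minimal sketch (plain Java,
not HotSpot code; the 4 MB region size and the object sizes are just
assumptions for the example) of how a humongous object, which always
occupies a contiguous sequence of whole regions, loses the tail of its
last region to internal fragmentation:

    // Minimal sketch of the internal fragmentation mentioned in a).
    // Not HotSpot code; the 4 MB region size is only an assumption here.
    public class HumongousWasteSketch {
        // A humongous object occupies ceil(objectBytes / regionBytes) whole
        // regions; the unused tail of the last region cannot be used for
        // other allocations.
        static long wastedBytes(long objectBytes, long regionBytes) {
            long regions = (objectBytes + regionBytes - 1) / regionBytes;
            return regions * regionBytes - objectBytes;
        }

        public static void main(String[] args) {
            long region = 4L * 1024 * 1024; // assume 4 MB G1 regions
            // Just over half a region is already humongous and wastes
            // almost half a region:
            System.out.println(wastedBytes(region / 2 + 1024, region)); // ~2 MB
            // Just over four regions wastes almost a whole region:
            System.out.println(wastedBytes(4 * region + 1024, region)); // ~4 MB
        }
    }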
The change basically increases the number of young garbage collections
by (artificially) reducing eden size to add intermediate reclamations of
short-lived humongous objects.
(You can see this in e.g. gc+heap logs in lines like
[95.370s][info][gc,heap ] GC(5908) Eden regions: 191->0(545)
I.e. a projected 545 eden regions were cut short at 191 regions)
Unfortunately, the current heuristic is not good enough: you still get
full gcs, obviously depending on your environment (I ran it on three
different machines, a highly threaded workstation (Ivy Bridge E), a
desktop (Ryzen 3xxx), and a laptop (Kaby Lake), without significant
differences). Let me show you some numbers on jdk/jdk:
(use fixed width font to see the table)
variant      #evacuation failures [1]   #full gcs [1]   time taken [s]
baseline     9796                       653             207
+8240556     1140                       114             113
+8248783     0                          96              95
(patches are inclusive, i.e. the last change also included 8240556)
I.e. you still get full gcs. Taking a step back to look at what this does:
the heuristic to do a young gc when a humongous object reaches into the heap
reserve is very arbitrary, and it shows. It may work for you in your
environment for your application, but it is not generally applicable at all.
Now you could make the time when this extra gc happens configurable, but
then you already need user intervention, and the CR already gives at
least one option to avoid this issue. There are more options btw, like
manually limiting eden size (-XX:MaxNewSize(Percent) et al).
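Purely as an illustration of that option (the values and the app.jar name
are made up; suitable numbers depend on heap size, allocation rate and
pause time goals), limiting the young generation could look like:

    java -Xmx8g -Xms8g -XX:MaxNewSize=512m -jar app.jar

or, using the G1-specific percentage variant (experimental, so it needs
unlocking):

    java -Xmx8g -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=10 -jar app.jar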
This is not the only problem I see with this heuristic: it does not fix
the actual problems I stated above, all of which are fixable in some way
or another:
1) fix the internal fragmentation; listing some of the options already
written about in the bug tracker:
- "8172713: Allow allocation into the tail end of humongous objects"
- "8229373: Reserve more than MaxHeapSize for the heap to fight
humongous object fragmentation in G1" in conjunction with "8251263: G1:
Only commit actually used space for humongous objects"
2) improve evacuation failure
- "8140326: Consider putting regions where evacuation failed into
collection set"; that would reduce the overhead of evacuation failure
from #failed-gcs * avg-failing-regions to just avg-failing-regions
(discounting regular object survival); see the worked example after this list
3) make young gen sizing aware of extreme short-lived humongous object
allocation; "8245511: G1 adaptive IHOP does not account for reclamation
of humongous objects by young GC" goes in that direction (for adaptive
IHOP), but I'll file a new CR for young gen sizing.
4) this new heuristic, short-cutting regular young gen sizing, may also
negatively impact achieving pause time goals, as "8072920: G1's policy
to interrupt regular operation with an initial mark gc after humongous
object allocation may cause missed pause time goals" already describes,
for applications that do not have these problems.
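To put rough, purely made-up numbers on the formula in 2): with, say, 100
young GCs hitting evacuation failure between two marking cycles and around
100 regions failing per GC, that is

    100 failed GCs * ~100 failing regions  =  ~10,000 region retentions

accumulated over that period, versus only the ~100 regions of the most
recent failure if failed regions go straight into the next collection set
(again discounting regular object survival).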
Taking all this into account, my current position is to not add this to
jdk/jdk. Rather, I would like to ask you to look into one of these issues,
fixing the causes mentioned above, instead of adding a new heuristic.
Note that I am planning to invest time in looking into internal
fragmentation in the near future (i.e. 1) above), so in some future
release this might already not be an issue any more.
Looking over the suggestions here, I think option 3) would be the
closest fix (the more I think about this, the more I think it is
exactly the fix) to what you were trying to accomplish.
*If* you go down that route *please* first talk to me as the recent
young gen sizing changes still need to be pushed (I was and still am
looking into some crashes that were uncovered in unrelated components
because of this, which are also the cause for this delay, e.g.
JDK-8249192) and you probably do not want to base your work on buggy
behavior.
The situation may be different for older releases (like jdk8 not having
fast evacuation failures and "fast" full gcs), but it's not me who decides
to take this in. Note that some of the same reservations apply. There are
also new arguments not to take this in, given other goals of these
projects.
> Bug: https://bugs.openjdk.java.net/browse/JDK-8248783
> Webrev: http://cr.openjdk.java.net/~ddong/liangmao/8248783/webrev.00/
>
Thanks,
Thomas