[G1GC] Evacuation failures with bursts of humongous object allocations
Liang Mao
maoliang.ml at alibaba-inc.com
Thu Nov 12 06:26:39 UTC 2020
Hi Charlie,
You might know that we had the same issue as well; see JDK-8248783 and the previous
discussion:
https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-July/030295.html
So far we have found that the simplest and most efficient way to resolve this problem is
increasing G1HeapRegionSize, which covers the majority of cases. That turns those
humongous objects into normal ones allocated in the eden space, and then the problem
is gone.
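For example, G1 treats an object as humongous once it is at least half a region, so
doubling the region size pushes such objects back under that threshold. A minimal
illustration (the values below are hypothetical, not a tested recommendation):

    # with 32 MB regions, objects up to just under 16 MB are normal eden allocations
    java -Xmx10g -XX:+UseG1GC -XX:G1HeapRegionSize=32m ...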
Another approach, which you know of, is JDK-8248783. It takes the same action as your
solution, triggering a GC in advance before the free regions are exhausted, but uses
different heuristics to decide when to trigger the GC.
Personally I would rather not tune G1ReservePercent, because increasing it can
significantly reduce how much of the heap is actually usable. I'm very interested in your
idea of obsoleting G1ReservePercent with precise survivor size prediction/calculation if possible.
Thanks,
Liang
> -----Original Message-----
> From: hotspot-gc-dev [mailto:hotspot-gc-dev-retn at openjdk.java.net] On
> Behalf Of Charlie Gracie
> Sent: November 6, 2020 6:50
> To: hotspot-gc-dev at openjdk.java.net
> Subject: [G1GC] Evacuation failures with bursts of humongous object allocations
>
> Hi,
>
> We have been investigating an issue with G1GC and bursts of short-lived
> humongous object allocations. Normally, the application allocates about 1 humongous
> region between GCs. Occasionally, the humongous allocation rate climbs to 600 or
> more regions between 2 GC cycles and consumes 100% of the free regions. The
> subsequent GC then has no free regions for to-space, so not even a single object can
> be evacuated. Since to-space is exhausted immediately, the GC becomes extremely long
> because it has to deal with evacuation failures. The workload runs on JDK 11, but we
> have been able to reproduce the issue on JDK 16 builds.
> About 1 in 40 GCs is impacted by these bursts of humongous allocations.
>
> [3] is an example of a GC running on JDK 11 when the burst of humongous
> allocations happens. [4] is an example of the rest of the GCs.
>
> It seems like -XX:G1ReservePercent is the recommended way to tune for
> humongous object allocations. Is this correct? We could tune around this
> behaviour by increasing G1ReservePercent and the heap size, but since this
> happens rarely the JVM would be over-provisioned most of the time. This is an
> acceptable workaround, but I am hoping we can make G1GC more resilient to bursts of
> humongous object allocations.
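> To make the shape of that workaround concrete (the values below are hypothetical;
> the reserve has to be sized for the worst-case burst, which is what makes the heap
> over-provisioned most of the time):
>
>   # grow the heap and keep a large slice of it free so a burst of humongous
>   # allocations is less likely to consume the regions the next GC needs
>   java -Xmx20g -XX:+UseG1GC -XX:G1ReservePercent=40 ...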
>
> What we are experiencing seems related to JDK-8248783 [1] and I have been
> prototyping changes that may resolve one of their issues as well. My approach is to
> force a GC during the slow allocation path if the number of free regions is about to
> drop below what is needed to complete the next GC cycle. The check is inserted into
> the slow path for both regular objects and humongous objects.
> In my current prototype [2] the G1 slow allocation path will only allow a free
> region to be consumed:
>
> if (((ERC / SR) + ((SRC * TSR) / 100)) <= (FRC - CR))
>
> ERC - eden region count
> SR - SurvivorRatio
> SRC - survivor region count
> TSR - TargetSurvivorRatio
> FRC - free region count
> CR - number of free regions required for allocation
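> Spelled out as code, the condition looks roughly like the following (a minimal
> sketch with hypothetical names, not the actual prototype code; assumes a non-zero
> SurvivorRatio and that all counts are in regions):
>
>   #include <cstdint>
>
>   // Hypothetical sketch of the proposed guard in the slow allocation path.
>   // survivor_ratio and target_survivor_ratio correspond to -XX:SurvivorRatio
>   // and -XX:TargetSurvivorRatio.
>   static bool allow_free_region_allocation(uint32_t eden_regions,
>                                            uint32_t survivor_regions,
>                                            uint32_t free_regions,
>                                            uint32_t regions_requested,
>                                            uint32_t survivor_ratio,
>                                            uint32_t target_survivor_ratio) {
>     if (regions_requested > free_regions) {
>       return false; // not even enough free regions for the allocation itself
>     }
>     // Estimate of the regions the next GC will need for to-space, derived from
>     // the current eden and survivor occupancy: (ERC / SR) + (SRC * TSR) / 100.
>     uint32_t expected_to_space = (eden_regions / survivor_ratio) +
>                                  (survivor_regions * target_survivor_ratio) / 100;
>     // Allow the allocation only if that estimate still fits in the free regions
>     // that would remain afterwards: expected <= FRC - CR.
>     return expected_to_space <= free_regions - regions_requested;
>   }
>
> When the check fails, the slow path would trigger a GC before handing out the
> regions, as described above.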
>
> Using this algorithm significantly improves G1GC's handling of bursts of
> humongous object allocations. I have not measured any degradations in the
> "normal" workloads we run, but that may not be a representative set. In theory,
> this should only impact workloads that consume more humongous regions than
> G1ReservePercent between GC cycles.
>
> I am curious about what other people think of the behaviour we are seeing and
> the solution I am experimenting with. Any feedback would be greatly
> appreciated.
>
> Thanks,
> Charlie
>
> [1] - https://bugs.openjdk.java.net/browse/JDK-8248783
> [2] - https://github.com/charliegracie/jdk/tree/humongous_regions
>
> [3] - Example of a bad GC during a burst of humongous object allocations
> GC(468) Pause Young (Prepare Mixed) (G1 Humongous Allocation)
> GC(468) Age table with threshold 15 (max threshold 15)
> GC(468) To-space exhausted
> GC(468) Pre Evacuate Collection Set: 0.2ms
> GC(468) Prepare TLABs: 0.2ms
> GC(468) Choose Collection Set: 0.0ms
> GC(468) Humongous Register: 0.2ms
> GC(468) Evacuate Collection Set: 30.1ms
> GC(468) Post Evacuate Collection Set: 253.3ms
> GC(468) Evacuation Failure: 249.1ms
> GC(468) Eden regions: 404->0(64)
> GC(468) Survivor regions: 8->0(69)
> GC(468) Old regions: 182->594
> GC(468) Humongous regions: 686->2
> GC(468) Pause Young (Prepare Mixed) (G1 Humongous Allocation)
> 10225M->4755M(10240M) 285.057ms
>
> [4] Regular GC from the same log for comparison.
> GC(465) Pause Young (Normal) (G1 Evacuation Pause)
> GC(465) Age table with threshold 15 (max threshold 15)
> GC(465) - age 1: 21586848 bytes, 21586848 total
> GC(465) - age 2: 7962712 bytes, 29549560 total
> GC(465) - age 3: 1033216 bytes, 30582776 total
> GC(465) - age 4: 4710920 bytes, 35293696 total
> GC(465) - age 5: 716064 bytes, 36009760 total
> GC(465) - age 6: 2387064 bytes, 38396824 total
> GC(465) - age 7: 2331208 bytes, 40728032 total
> GC(465) - age 8: 321680 bytes, 41049712 total
> GC(465) - age 9: 4974056 bytes, 46023768 total
> GC(465) - age 10: 106488 bytes, 46130256 total
> GC(465) Pre Evacuate Collection Set: 0.0ms
> GC(465) Evacuate Collection Set: 16.0ms
> GC(465) Post Evacuate Collection Set: 1.2ms
> GC(465) Other: 1.3ms
> GC(465) Eden regions: 494->0(537)
> GC(465) Survivor regions: 5->7(63)
> GC(465) Old regions: 182->182
> GC(465) Humongous regions: 1->1
> GC(465) Pause Young (Normal) (G1 Evacuation Pause) 5454M->1512M(10240M)
> 18.704ms
>