OOM error caused by large array allocation in G1

Lijie Xu csxulijie at gmail.com
Tue Nov 21 13:48:44 UTC 2017


Hi Thomas,

Sorry for the late reply, and thanks for your detailed explanation.
I added some comments and questions inline.

Thanks,

Lijie

On Sat, Nov 18, 2017 at 11:37 PM, Thomas Schatzl <thomas.schatzl at oracle.com>
wrote:

> Hi,
>
> On Sat, 2017-11-18 at 20:53 +0800, Lijie Xu wrote:
> > Hi All,
> > I recently encountered an OOM error in a Spark application using G1
> > collector. This application launches multiple JVM instances to
> > process the large data. Each JVM has 6.5GB heap size and uses G1
> > collector. A JVM instance throws an OOM error during allocating a
> > large (570MB) array. However, this JVM has about 3GB free heap space
> > at that time. After analyzing the application logic, heap usage, and
> > GC log, I guess the root cause may be the lack of consecutive space
> > for holding this large array in G1. I want to know whether my guess
> > is right ...
>
> Very likely. This is a long-standing issue (actually I have once
> investigated about it like 10 years ago on a different regional
> collector), and given your findings it is very likely you are correct.
> The issue also has an extra section in the tuning guide [0].
>

*==> This reference is very helpful for me. A follow-up question: do the
Parallel and CMS collectors have this defect too?*

>
> > ... and why G1 has this defect.
>
> Nobody fixed it yet. :)
>
> Reasons:
> - workaround easy and typically "just works".
> - no "real world" test setups where fixes could be tested available.
> People tend to disappear after getting to know the workaround.
> Unfortunately, Apache SPARK is probably one of the more frequent
> environments this happens in, but SPARK does not yet run on jdk9/10
> (and soon 11), which is where development happens.
> - it's not very interesting work for many. Not sure why, probably
> because it involves implementing and evaluating longer term strategies
> in the collector to minimize impact of fragmentation which is a complex
> topic (at least if you are not satisfied with the last-ditch brute
> force approach).
> - there are more problematic issues to deal with that affect more
> installations, have test setups, and no or no good workaround.
>
> Actually I have been discussing this with colleagues just last week
> again in context of work for students/interns. :)
>
> If you want to look into this there are a bunch of CRs open that you
> might want to start with (e.g. [1][2][3]) to get an idea of
> possibilities - these CRs do not even mention the one brute force
> solution other VMs probably apply in that situation: have the full gc
> move large arrays too.
>
> Feel free to start a discussion about this topic either here or
> preferably in the hotspot-gc-dev mailing list.
>
> > In the following sections, I will detail the JVM info, application,
> > OOM phase, and heap usage. Any suggestions will be appreciated.
>
> Simply either increase the heap size or increase region size via
> -XX:HeapRegionSize. I think 16m regions will fix the issue in your case
> without any other performance impact, and reduce the amount of
> humongous objects significantly.
>

*==> Your guess is quite right. I changed the region size to 8m, 16m,
and 32m.*
*The application still throws an OOM error with 8m regions, but finishes
successfully with 16m and 32m.*
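
*The numbers above line up with G1's humongous-allocation rule: an object
larger than half a region is treated as humongous and must be placed in
physically contiguous regions. A small back-of-the-envelope sketch (my own
illustration, not code from the G1 sources) for the 570MB array:*

```java
public class HumongousMath {
    // In G1, an object larger than half a region is "humongous".
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // A humongous object occupies this many *contiguous* regions
    // (ceiling division).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long array = 570L * 1024 * 1024; // the 570MB array from the OOM
        for (long mb : new long[] {8, 16, 32}) {
            long region = mb * 1024 * 1024;
            System.out.printf("region=%dm humongous=%b contiguousRegions=%d%n",
                    mb, isHumongous(array, region), regionsNeeded(array, region));
        }
    }
}
```

*The array is humongous at every feasible region size, so it always needs a
contiguous run of free regions (72 with 8m regions, 36 with 16m, 18 with
32m). Larger regions both shorten the required run and raise the humongous
threshold, so fewer of the application's other objects are allocated as
humongous and fragment the region map, which presumably is why 16m and 32m
succeed here.*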


> > [JVM info]
> > java version "1.8.0_121"
> > Oracle Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
>
> While it won't impact this issue, I recommend updating at least to the
> latest 8u release. Not suggesting jdk 9 here because we know that SPARK
> does not work there yet.
>
> Thanks,
>   Thomas
>
> [0] https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector-tuning.htm#GUID-2428DA90-B93D-48E6-B336-A849ADF1C552
> [1] https://bugs.openjdk.java.net/browse/JDK-8172713
> [2] https://bugs.openjdk.java.net/browse/JDK-8038487
> [3] https://bugs.openjdk.java.net/browse/JDK-8173627
>
>


More information about the hotspot-gc-use mailing list