HDFS Namenode with large heap size

Fengnan Li lfengnan at uber.com
Sat Feb 9 09:02:33 UTC 2019


I didn’t use `-XX:+ExplicitGCInvokesConcurrent`, so unless it is on by default it is not in effect for our GC.
Indeed I seldom see full GCs, but I wouldn’t call it running well given the large memory overhead. (I’m not sure whether that overhead is tunable at this point.)
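
To double-check, `jinfo -flag ExplicitGCInvokesConcurrent <pid>` against the running NameNode should print the effective value. A minimal sketch of reading the same thing from inside a JVM via the HotSpotDiagnostic MXBean (the class name is just a placeholder, and against the live NameNode I’d query the MXBean over JMX instead):

    import com.sun.management.HotSpotDiagnosticMXBean;
    import com.sun.management.VMOption;
    import java.lang.management.ManagementFactory;

    public class FlagCheck {
        public static void main(String[] args) {
            // Reports the flag value of the JVM this code runs in; for the
            // live NameNode the same MXBean can be queried over JMX.
            HotSpotDiagnosticMXBean diag =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            VMOption opt = diag.getVMOption("ExplicitGCInvokesConcurrent");
            System.out.println(opt.getName() + " = " + opt.getValue()
                    + " (origin: " + opt.getOrigin() + ")");
        }
    }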

I ran `jmap -histo:live` twice, which triggered full GCs. Some of the GC log output:

[Full GC (Heap Inspection Initiated GC)  151G->143G(300G), 756.5877774 secs]
   [Eden: 8800.0M(119.7G)->0.0B(119.7G) Survivors: 0.0B->0.0B Heap: 151.7G(300.0G)->143.1G(300.0G)], [Metaspace: 44549K->44549K(45056K)]
 [Times: user=2895.66 sys=2.13, real=756.58 secs]

During that period I didn’t see the overall memory consumption change… And it is pretty horrible that a single GC takes almost 13 minutes, which is not OK in production.

I searched quite a bit for material on analyzing JVM non-heap memory but couldn’t find much…

Let me know how I can figure out the Internal size.
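
One way I can think of to cross-check the DirectByteBuffer portion of Internal is the java.nio buffer pool MXBeans; a rough sketch (the class name is just a placeholder), which would have to run in-process or query the NameNode over a JMX connection:

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;

    public class DirectBufferProbe {
        public static void main(String[] args) {
            // Lists the "direct" and "mapped" NIO buffer pools: number of
            // buffers, bytes in use and total requested capacity.
            // DirectByteBuffers show up under "direct"; note this covers
            // only NIO buffers, not every allocation NMT files under Internal.
            for (BufferPoolMXBean pool :
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.printf("%-7s count=%d used=%d MB capacity=%d MB%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed() >> 20, pool.getTotalCapacity() >> 20);
            }
        }
    }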

Thanks,
Fengnan

> On Feb 9, 2019, at 12:17 AM, Krystal Mok <rednaxelafx at gmail.com> wrote:
> 
> Interesting. Was G1 working so well for your application that it hardly ever performed Full GCs? Mostly just Young GCs and Mixed GCs? You probably also turned on "-XX:+ExplicitGCInvokesConcurrent", right?
> 
> The "Internal" component can be from a lot of things, but the most significant source would be DirectByteBuffers as you pointed out.
> 
> I asked about (the lack of) full GCs because I was curious whether you're seeing a lot of garbage DBBs in the Java heap that haven't been processed by G1 in the mixed GCs, potentially because they're in regions that didn't make it into the CSet of those mixed GCs.
> 
> If you have a running G1 version of that Java application (is it just NameNode?), could you please try to get it into the state when it consumes 450GB+ memory, and then do two consecutive "jmap -histo:live" on that process to force two full GCs and see what happens? Yes it'll pause for a while but if your environment is okay with experiments like this, it'd be helpful.
> 
> - Kris
> 
> On Fri, Feb 8, 2019 at 4:37 PM Fengnan Li <lfengnan at uber.com> wrote:
> Hi Kris,
> 
> Thanks very much for the response!
> 
> I didn’t expect the RSS of the Java process to be only 300GB when I set the heap size to 300GB, since from many other use cases I know that G1 has a higher memory overhead than CMS.
> I got the 450G number from our internal metrics, which essentially read /proc/meminfo for the memory footprint. There is no other process on the machine using a significant amount of memory (more than 1% of total memory, i.e. single-digit GBs).
> 
> I turned on the NMT option and printed out the JVM native memory summary:
> 
> Native Memory Tracking:
> 
> Total: reserved=477GB, committed=476GB
> -                 Java Heap (reserved=300GB, committed=300GB)
>                             (mmap: reserved=300GB, committed=300GB)
> 
> -                    Thread (reserved=1GB, committed=1GB)
>                             (thread #723)
>                             (stack: reserved=1GB, committed=1GB)
> 
> -                        GC (reserved=23GB, committed=23GB)
>                             (malloc=12GB #20497380)
>                             (mmap: reserved=11GB, committed=11GB)
> 
> -                  Internal (reserved=152GB, committed=152GB)
>                             (malloc=152GB #19364496)
> 
> -    Native Memory Tracking (reserved=1GB, committed=1GB)
>                             (tracking overhead=1GB)
> 
> -                   Unknown (reserved=1GB, committed=0GB)
>                             (mmap: reserved=1GB, committed=0GB)
> 
> Internal (direct byte buffers) takes a lot of the space. GC overhead looks OK in this case.
> 
> This is pretty weird: I run the same app with CMS on the machine, but there is no comparable Internal component in its NMT output. Do you know why this happens?
> 
> Thanks,
> Fengnan
> 
>> On Feb 8, 2019, at 2:07 PM, Krystal Mok <rednaxelafx at gmail.com> wrote:
>> 
>> Hi Fengnan,
>> 
>> This is Kris Mok currently working at Databricks. I used to work on HotSpot and Zing JVMs.
>> Just curious how you got to the conclusion that G1 is taking 450GB of memory. Did you start the VM with -Xmx300g expecting that the RSS of that Java process would be close to 300GB? If that's the case, that's an unreasonable expectation to begin with.
>> 
>> It's very common for G1 itself to have a high memory overhead due to the design of its Remembered Sets (RSets), and that can be tuned to use less memory by making them more coarse-grained, with the tradeoff that the root scanning pause can take longer because a larger portion of the heap has to be scanned.
>> 
>> But I doubt that's what you're actually seeing. To confirm where the memory went within the HotSpot JVM, please turn on NMT (Native Memory Tracking) and see how much memory each component within the JVM is using.
>> 
>> https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html
>> https://docs.oracle.com/javase/8/docs/technotes/guides/vm/nmt-8.html
>> 
>> - Kris
>> 
>> On Fri, Feb 8, 2019 at 1:51 PM Fengnan Li <lfengnan at uber.com> wrote:
>> Hi All,
>> 
>> We are trying G1 for our HDFS NameNode to see whether it delivers better GC behavior overall than the CMS we currently use. However, with a 200G heap size, the NameNode won’t even start under G1 with the production image and gets killed for running out of memory after about an hour (while loading the initial data). With the same heap size, CMS works properly with around 98% throughput and an average pause of 120ms.
>> 
>> We use pretty much the basic options and have tried a little tuning, without much progress. Is there a way to lower G1’s overall memory footprint?
>> 
>> We managed to start the application with a 300G heap, but overall G1 consumes about 450G of memory, which is problematic.
>> 
>> Thanks,
>> Fengnan
> 
