GC overhead limit exceeded
A. Sundararajan
sundararajan.athijegannathan at oracle.com
Sun Jan 12 17:45:35 PST 2014
Hi,
The heap dump doesn't contain much info. When I tried to open it with the
'jhat' tool, I saw only basic JDK core classes (Class, ClassLoader, etc.)
and nothing else. jmap with the -F flag uses the HotSpot serviceability
agent to dump the heap, i.e., the heap is read from outside the process.
Such a dump is taken without VM cooperation (the VM bringing itself to a
safepoint to dump) and so is likely to have very incomplete info at times.
Starting the VM with -XX:+HeapDumpOnOutOfMemoryError is likely to produce
a better dump.
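
For example, adding these two switches to the set you already pass for GC
logging should be enough (the HeapDumpPath part is optional; I am assuming
a writable logs directory on your side):

   -XX:+HeapDumpOnOutOfMemoryError \
   -XX:HeapDumpPath=/full/path/to/logs/ \

The VM then writes a java_pid<pid>.hprof file into that directory at the
moment the OutOfMemoryError is thrown, while the VM itself is doing the
dumping, so the file should open cleanly in jhat or VisualVM.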
-Sundar
On Saturday 11 January 2014 08:58 PM, Tal Liron wrote:
> Well, it happened again. I only caught it a few hours afterwards, when
> CPU levels went back to normal.
>
> I couldn't connect with VisualVM (my fault: I did not enable remote JMX
> support when I started it -- you know, the thing with the RMI port
> usually being random).
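>
> (For next time: the random RMI port can be pinned. Something along these
> lines should let VisualVM connect over a single, tunnel-friendly port;
> 9010 is an arbitrary pick, and turning off authentication and SSL is only
> reasonable over an SSH tunnel or on a trusted network:
>
> -Dcom.sun.management.jmxremote \
> -Dcom.sun.management.jmxremote.port=9010 \
> -Dcom.sun.management.jmxremote.rmi.port=9010 \
> -Dcom.sun.management.jmxremote.authenticate=false \
> -Dcom.sun.management.jmxremote.ssl=false \
>
> With both ports fixed to the same value, one ssh -L 9010:localhost:9010
> tunnel is all VisualVM needs.)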
>
> But here's the dump I got from jmap, the gc.log, and other dumps that
> went to stdout:
>
> https://dl.dropboxusercontent.com/u/122806/jvm8_gc.zip
>
> I'll avoid restarting that JVM for now, in case you guys have other
> things you want me to check.
>
> By the way, jmap did not run without problems. I had to use -F, and I
> also got this exception:
>
> jmap -dump:live,format=b,file=jmap.dump -F 26920
> Attaching to process ID 26920, please wait...
> Debugger attached successfully.
> Server compiler detected.
> JVM version is 25.0-b63
> Dumping heap to jmap.dump ...
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at sun.tools.jmap.JMap.runTool(JMap.java:201)
> at sun.tools.jmap.JMap.main(JMap.java:130)
> Caused by: sun.jvm.hotspot.debugger.UnmappedAddressException: 328
> at sun.jvm.hotspot.debugger.PageCache.checkPage(PageCache.java:208)
> at sun.jvm.hotspot.debugger.PageCache.getData(PageCache.java:63)
> at sun.jvm.hotspot.debugger.DebuggerBase.readBytes(DebuggerBase.java:225)
> at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCInteger(LinuxDebuggerLocal.java:498)
> at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:462)
> at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readAddress(LinuxDebuggerLocal.java:433)
> at sun.jvm.hotspot.debugger.linux.LinuxAddress.getAddressAt(LinuxAddress.java:74)
> at sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:296)
> at sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:102)
> at sun.jvm.hotspot.oops.Metadata.instantiateWrapperFor(Metadata.java:68)
> at sun.jvm.hotspot.oops.Oop.getKlassForOopHandle(Oop.java:211)
> at sun.jvm.hotspot.oops.ObjectHeap.newOop(ObjectHeap.java:251)
> at sun.jvm.hotspot.classfile.ClassLoaderData.getClassLoader(ClassLoaderData.java:64)
> at sun.jvm.hotspot.memory.DictionaryEntry.loader(DictionaryEntry.java:63)
> at sun.jvm.hotspot.memory.Dictionary.classesDo(Dictionary.java:67)
> at sun.jvm.hotspot.memory.SystemDictionary.classesDo(SystemDictionary.java:190)
> at sun.jvm.hotspot.memory.SystemDictionary.allClassesDo(SystemDictionary.java:183)
> at sun.jvm.hotspot.utilities.HeapHprofBinWriter.writeClassDumpRecords(HeapHprofBinWriter.java:520)
> at sun.jvm.hotspot.utilities.HeapHprofBinWriter.write(HeapHprofBinWriter.java:430)
> at sun.jvm.hotspot.tools.HeapDumper.run(HeapDumper.java:62)
> at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
> at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
> at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
> at sun.jvm.hotspot.tools.HeapDumper.main(HeapDumper.java:83)
> ... 6 more
>
> On 01/10/2014 03:36 AM, Vladimir Ivanov wrote:
>> Heap dumps enable post-mortem analysis of OOMs.
>>
>> Pass -XX:+HeapDumpOnOutOfMemoryError to the VM and it'll dump the heap
>> before exiting, or use jmap (-dump:live,format=b,file=<name> <pid>) or
>> VisualVM to take a snapshot of a running process.
>>
>> There are a number of tools to browse the contents.
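>> For example, with the jhat tool that ships with the JDK (the port and
>> heap size below are arbitrary; use whatever file name you gave to
>> file=<name>):
>>
>> jhat -J-Xmx2g -port 7000 heap.hprof
>>
>> and then point a browser at http://localhost:7000 to walk the object
>> graph. VisualVM and Eclipse MAT can open the same file and give a
>> richer view.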
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 1/9/14 11:01 PM, Tal Liron wrote:
>>> Indeed, scripts are reused in this case, though I can't guarantee that
>>> there isn't a bug somewhere on my end.
>>>
>>> I'm wondering if it might be triggered by another issue: Prudence
>>> supports an internal crontab-like feature (based on cron4j), and these
>>> are again Nashorn scripts being run, once a minute. You can see them in
>>> the log of the example application you downloaded. Then again, the exact
>>> same feature is also used with Rhino.
>>>
>>> Another way I might be able to help: when the errors occur again on my
>>> server, I will be happy to give you SSH access so you can snoop
>>> around. We can also run VisualVM via an SSH tunnel; it should be able to
>>> show us exactly which classes are not being GCed. If you think this
>>> would be helpful, please email me directly and we can set this up.
>>> However, in my attempts to debug this locally the heap seems to be
>>> behaving well enough.
>>>
>>> On 01/10/2014 01:38 AM, Hannes Wallnoefer wrote:
>>>> Tal,
>>>>
>>>> I've been throwing requests at the Prudence test app for the last 20
>>>> minutes or so. I do see that it uses a lot of metaspace, close to 50M
>>>> in my case. The test app seems to load/unload 2 classes per request
>>>> with Rhino compared to 4 classes per request with Nashorn, which is
>>>> probably due to differences in bytecode generation between the two
>>>> engines.
>>>>
>>>> I don't yet see metaspace usage growing beyond that limit, or
>>>> generating GC warnings. Maybe I haven't been running it long enough.
>>>>
>>>> I'm wondering if maybe metaspace is tight from the very beginning, and
>>>> the GC problems are caused by spikes in load (e.g. concurrent
>>>> requests)?
>>>>
>>>> Also, are you aware of new classes being generated for each request?
>>>> Are you evaluating script files for each request? It would be more
>>>> efficient to evaluate the script just once and then reuse it for
>>>> subsequent requests.
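>>>>
>>>> A minimal sketch of what I mean, using the plain javax.script API (the
>>>> script body and the way you would wire this into Prudence are just
>>>> placeholders):
>>>>
>>>> import javax.script.*;
>>>>
>>>> public class ReuseCompiledScript {
>>>>     public static void main(String[] args) throws Exception {
>>>>         ScriptEngine engine =
>>>>             new ScriptEngineManager().getEngineByName("nashorn");
>>>>
>>>>         // Compile once, e.g. at startup or on the first request.
>>>>         CompiledScript compiled = ((Compilable) engine).compile(
>>>>             "function handle(name) { return 'hello ' + name; }");
>>>>         compiled.eval(); // defines handle() in the engine's scope
>>>>
>>>>         // Reuse the compiled function for every later request; the
>>>>         // script is not reparsed and recompiled on each call.
>>>>         Invocable invocable = (Invocable) engine;
>>>>         for (int i = 0; i < 3; i++) {
>>>>             System.out.println(
>>>>                 invocable.invokeFunction("handle", "request " + i));
>>>>         }
>>>>     }
>>>> }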
>>>>
>>>> Hannes
>>>>
>>>> On 2014-01-09 17:21, Tal Liron wrote:
>>>>> You may download the latest release of Prudence, run it and bombard
>>>>> it with hits (use ab or a similar tool):
>>>>>
>>>>> http://threecrickets.com/prudence/download/
>>>>>
>>>>> To get the GC logs, start it like so:
>>>>>
>>>>> JVM_SWITCHES=\
>>>>> -Xloggc:/full/path/to/logs/gc.log \
>>>>> -XX:+PrintGCDetails \
>>>>> -XX:+PrintTenuringDistribution \
>>>>> sincerity start prudence
>>>>>
>>>>> To bombard it:
>>>>>
>>>>> ab -n 50000 -c 10 "http://localhost:8080/prudence-example/"
>>>>>
>>>>> Of course, you may also want to restrict the JVM heap size so the
>>>>> problem happens sooner. At least I think so. I actually don't
>>>>> understand JDK 8 GC at all, but you guys do, so have a go. All I can
>>>>> tell you is that I have a server running live on the Internet, which
>>>>> I have to restart every 3 days due to this issue.
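>>>>>
>>>>> For example, I'd guess that adding something like this to the
>>>>> JVM_SWITCHES above would force the problem to show up earlier (the
>>>>> numbers are just a guess on my part):
>>>>>
>>>>> -Xmx256m \
>>>>> -XX:MaxMetaspaceSize=64m \
>>>>>
>>>>> -Xmx caps the regular heap, and -XX:MaxMetaspaceSize caps the class
>>>>> metadata area that seems to be filling up.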
>>>>>
>>>>> Unfortunately, I don't have an easy way to isolate the problem to
>>>>> something smaller. However, I would think there's probably an
>>>>> advantage in using something as big as possible -- you can probably
>>>>> get very rich dumps of what is polluting the heap.
>>>>>
>>>>>
>>>>> On 01/10/2014 12:00 AM, Marcus Lagergren wrote:
>>>>>> Tal - The GC people 10 meters behind me want to know if you have a
>>>>>> repro of your full-GC-to-death problem that they can look at.
>>>>>> They're interested.
>>>>>>
>>>>>> /M
>>>>>>
>>>>>> On 09 Jan 2014, at 16:29, Kirk Pepperdine <kirk at kodewerk.com> wrote:
>>>>>>
>>>>>>> Hi Marcus,
>>>>>>>
>>>>>>> Looks like some of the details have been chopped off. Is there a GC
>>>>>>> log available? If there is a problem with MethodHandle, a workaround
>>>>>>> might be as simple as expanding perm... but wait, this is metaspace
>>>>>>> now, and it should grow as long as your system has memory to give to
>>>>>>> the process. The only thing I can suggest is that the space that
>>>>>>> holds compressed class pointers is a fixed size, so if Nashorn is
>>>>>>> loading a lot of classes, you might consider making that space
>>>>>>> larger. Full disclosure: this isn't something that I've had a chance
>>>>>>> to dabble with, but I think there is a flag to control the size of
>>>>>>> that space. Maybe Colleen can offer better insight.
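>>>>>>>
>>>>>>> If I remember right, the flag is -XX:CompressedClassSpaceSize, so
>>>>>>> something along these lines (the value is only an illustration, and
>>>>>>> please verify the flag against your build):
>>>>>>>
>>>>>>> -XX:CompressedClassSpaceSize=256m
>>>>>>>
>>>>>>> It only matters when compressed class pointers are in use, which is
>>>>>>> the default on a 64-bit VM with compressed oops.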
>>>>>>>
>>>>>>> Regards,
>>>>>>> Kirk
>>>>>>>
>>>>>>> On Jan 9, 2014, at 10:02 AM, Marcus Lagergren
>>>>>>> <marcus.lagergren at oracle.com> wrote:
>>>>>>>
>>>>>>>> This almost certainly stems from MethodHandle combinators being
>>>>>>>> implemented as lambda forms in anonymous Java classes. One of the
>>>>>>>> things being done for 8u20 is to drastically reduce the number of
>>>>>>>> lambda forms created. I don't know of any workaround at the moment.
>>>>>>>> CC:ing hotspot-compiler-dev, so the people there can elaborate a bit.
>>>>>>>>
>>>>>>>> /M
>>>>>>>>
>>>>>>>> On 06 Jan 2014, at 06:57, Benjamin Sieffert
>>>>>>>> <benjamin.sieffert at metrigo.de> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> We have been observing similar symptoms from 7u40 onwards (using
>>>>>>>>> nashorn-backport with j7 -- j8 has the same problems as 7u40 and
>>>>>>>>> 7u45... 7u25 is the last version that works fine) and suspect the
>>>>>>>>> cause to be the JSR-292 changes that took place there. IIRC I
>>>>>>>>> already asked over on their mailing list. Here's the link:
>>>>>>>>> http://mail.openjdk.java.net/pipermail/mlvm-dev/2013-December/005586.html
>>>>>>>>>
>>>>>>>>> The fault might just as well lie with Nashorn, though. It's
>>>>>>>>> certainly worth investigating.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014/1/4 Tal Liron <tal.liron at threecrickets.com>
>>>>>>>>>
>>>>>>>>>> Thanks! I didn't know of these. I'm not sure how to read the
>>>>>>>>>> log, but this
>>>>>>>>>> doesn't look so good. I get a lot of "allocation failures" that
>>>>>>>>>> look like
>>>>>>>>>> this:
>>>>>>>>>>
>>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM (25.0-b63) for linux-amd64 JRE (1.8.0-ea-b121), built on Dec 19 2013 17:29:18 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
>>>>>>>>>> Memory: 4k page, physical 2039276k(849688k free), swap 262140k(256280k free)
>>>>>>>>>> CommandLine flags: -XX:InitialHeapSize=32628416 -XX:MaxHeapSize=522054656 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
>>>>>>>>>> 0.108: [GC (Allocation Failure)
>>>>>>>>>> Desired survivor size 524288 bytes, new threshold 7 (max 15)
>>>>>>>>>> [PSYoungGen: 512K->496K(1024K)] 512K->496K(32256K), 0.0013194 secs]
>>>>>>>>>> [Times: user=0.01 sys=0.00, real=0.00 secs]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 01/04/2014 10:02 PM, Ben Evans wrote:
>>>>>>>>>>
>>>>>>>>>>> -Xloggc:<pathtofile> -XX:+PrintGCDetails
>>>>>>>>>>> -XX:+PrintTenuringDistribution
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Benjamin Sieffert
>>>>>>>>> metrigo GmbH
>>>>>>>>> Sternstr. 106
>>>>>>>>> 20357 Hamburg
>>>>>>>>>
>>>>>>>>> Managing directors: Christian Müller, Tobias Schlottke, Philipp
>>>>>>>>> Westermeyer
>>>>>>>>> The company is registered with the Hamburg commercial register
>>>>>>>>> under no. HRB 120447.
>>>>>
>>>>
>>>
>