GC overhead limit exceeded
Tal Liron
tal.liron at threecrickets.com
Sat Jan 11 07:28:17 PST 2014
Well, it happened again: I only caught it a few hours later, when CPU
levels went back to normal.
Couldn't connect with VisualVM (my fault, I did not enable remote JMX
support when I started it -- you know, the thing with the RMI port
usually being random).
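(Next time I'll start it with the JMX/RMI ports pinned so VisualVM can
attach remotely -- something like this, I believe, with 9010 just a
placeholder:
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.rmi.port=9010
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false)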
But here's the dump I got from jmap, the gc.log, and the other dumps
that went to stdout:
https://dl.dropboxusercontent.com/u/122806/jvm8_gc.zip
I'll avoid restarting that JVM for now, in case you guys have other
things you want me to check.
By the way, jmap did not run without problems. I had to use -F, and I
also got this exception:
jmap -dump:live,format=b,file=jmap.dump -F 26920
Attaching to process ID 26920, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.0-b63
Dumping heap to jmap.dump ...
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at sun.tools.jmap.JMap.runTool(JMap.java:201)
    at sun.tools.jmap.JMap.main(JMap.java:130)
Caused by: sun.jvm.hotspot.debugger.UnmappedAddressException: 328
    at sun.jvm.hotspot.debugger.PageCache.checkPage(PageCache.java:208)
    at sun.jvm.hotspot.debugger.PageCache.getData(PageCache.java:63)
    at sun.jvm.hotspot.debugger.DebuggerBase.readBytes(DebuggerBase.java:225)
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCInteger(LinuxDebuggerLocal.java:498)
    at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:462)
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readAddress(LinuxDebuggerLocal.java:433)
    at sun.jvm.hotspot.debugger.linux.LinuxAddress.getAddressAt(LinuxAddress.java:74)
    at sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:296)
    at sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:102)
    at sun.jvm.hotspot.oops.Metadata.instantiateWrapperFor(Metadata.java:68)
    at sun.jvm.hotspot.oops.Oop.getKlassForOopHandle(Oop.java:211)
    at sun.jvm.hotspot.oops.ObjectHeap.newOop(ObjectHeap.java:251)
    at sun.jvm.hotspot.classfile.ClassLoaderData.getClassLoader(ClassLoaderData.java:64)
    at sun.jvm.hotspot.memory.DictionaryEntry.loader(DictionaryEntry.java:63)
    at sun.jvm.hotspot.memory.Dictionary.classesDo(Dictionary.java:67)
    at sun.jvm.hotspot.memory.SystemDictionary.classesDo(SystemDictionary.java:190)
    at sun.jvm.hotspot.memory.SystemDictionary.allClassesDo(SystemDictionary.java:183)
    at sun.jvm.hotspot.utilities.HeapHprofBinWriter.writeClassDumpRecords(HeapHprofBinWriter.java:520)
    at sun.jvm.hotspot.utilities.HeapHprofBinWriter.write(HeapHprofBinWriter.java:430)
    at sun.jvm.hotspot.tools.HeapDumper.run(HeapDumper.java:62)
    at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
    at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
    at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
    at sun.jvm.hotspot.tools.HeapDumper.main(HeapDumper.java:83)
    ... 6 more
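For next time, jcmd might be another way to get a dump without the
serviceability agent that -F uses (which is where the exception above
comes from) -- something like this, though I haven't tried it on this
process:

jcmd 26920 GC.heap_dump /full/path/to/heap.hprof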
On 01/10/2014 03:36 AM, Vladimir Ivanov wrote:
> Heap dumps enable post-mortem analysis of OOMs.
>
> Pass -XX:+HeapDumpOnOutOfMemoryError to the VM and it'll dump the heap
> before exiting, or use jmap (-dump:live,format=b,file=<name> <pid>) or
> VisualVM to take a snapshot from a running process.
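> For example (just a sketch -- the dump path and pid are placeholders):
>
>   java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps <usual switches and main class>
>
> or, against an already running process:
>
>   jmap -dump:live,format=b,file=heap.hprof <pid>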
>
> There are a number of tools to browse the contents.
>
> Best regards,
> Vladimir Ivanov
>
> On 1/9/14 11:01 PM, Tal Liron wrote:
>> Indeed, scripts are reused in this case, though I can't guarantee that
>> there isn't a bug somewhere on my end.
>>
>> I'm wondering if it might be triggered by another issue: Prudence
>> supports an internal crontab-like feature (based on cron4j), and these
>> are again Nashorn scripts being run, once a minute. You can see them in
>> the log of the example application you downloaded. Then again, the same
>> exact feature is leveraged with Rhino.
>>
>> Another way I might be able to help: when the errors occur again on my
>> server, I will be happy to provide you with SSH access to it so you can
>> snoop around. We can also run VisualVM via an SSH tunnel; it should be
>> able to show us exactly which classes are not being GCed. If you think
>> this would be helpful, please email me directly and we can set this up.
>> However, in my attempts to debug this locally, the heap seems to be
>> behaving well enough.
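>> (For the tunnel I have in mind something like "ssh -L 9010:localhost:9010
>> me@server", with the JVM's JMX port pinned to 9010 -- just a placeholder
>> -- and VisualVM's JMX connection pointed at localhost:9010.)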
>>
>> On 01/10/2014 01:38 AM, Hannes Wallnoefer wrote:
>>> Tal,
>>>
>>> I've been throwing requests at the Prudence test app for the last 20
>>> minutes or so. I do see that it uses a lot of metaspace, close to 50M
>>> in my case. The test app seems to load/unload 2 classes per request
>>> with Rhino compared to 4 classes per request with Nashorn, which is
>>> probably due to differences in bytecode generation between the two
>>> engines.
>>>
>>> I don't yet see metaspace usage growing beyond that limit, or
>>> generating GC warnings. Maybe I haven't been running it long enough.
>>>
>>> I'm wondering if maybe metaspace is tight from the very beginning, and
>>> the GC problems are caused by spikes in load (e.g. concurrent
>>> requests)?
>>>
>>> Also, are you aware of new classes being generated for each request?
>>> Are you evaluating script files for each request? It would be more
>>> efficient to evaluate the script just once and then reuse it for
>>> subsequent requests.
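>>> For instance, a minimal sketch with the javax.script API (the variable
>>> names are only illustrative):
>>>
>>>   import javax.script.*;
>>>
>>>   ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
>>>   CompiledScript compiled = ((Compilable) engine).compile(scriptSource); // once, at startup
>>>   // ...then, for every request:
>>>   compiled.eval(requestBindings);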
>>>
>>> Hannes
>>>
>>> On 2014-01-09 17:21, Tal Liron wrote:
>>>> You may download the latest release of Prudence, run it and bombard
>>>> it with hits (use ab or a similar tool):
>>>>
>>>> http://threecrickets.com/prudence/download/
>>>>
>>>> To get the GC logs, start it like so:
>>>>
>>>> JVM_SWITCHES=\
>>>> -Xloggc:/full/path/to/logs/gc.log \
>>>> -XX:+PrintGCDetails \
>>>> -XX:+PrintTenuringDistribution \
>>>> sincerity start prudence
>>>>
>>>> To bombard it:
>>>>
>>>> ab -n 50000 -c 10 "http://localhost:8080/prudence-example/"
>>>>
>>>> Of course, you may also want to restrict the JVM heap size so it will
>>>> happen sooner. I think. I actually don't understand JVM 8 GC at all,
>>>> but you guys do, so have a go. All I can tell you is that I have a
>>>> server running live on the Internet, which I have to restart every 3
>>>> days due to this issue.
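>>>> For restricting the sizes, I'd guess something in the same style as
>>>> the switches above would do, with the values small enough to show the
>>>> problem quickly (the numbers here are only a guess on my part):
>>>>
>>>> JVM_SWITCHES=\
>>>> -Xmx256m \
>>>> -XX:MaxMetaspaceSize=64m \
>>>> sincerity start prudence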
>>>>
>>>> Unfortunately, I don't have an easy way to isolate the problem to
>>>> something smaller. However, I would think there's probably an
>>>> advantage in using something as big as possible -- you can probably
>>>> get very rich dumps of what is polluting the heap.
>>>>
>>>>
>>>> On 01/10/2014 12:00 AM, Marcus Lagergren wrote:
>>>>> Tal - The GC people 10 meters behind me want to know if you have a
>>>>> repro of your full-GC-to-death problem that they can look at.
>>>>> They’re interested.
>>>>>
>>>>> /M
>>>>>
>>>>> On 09 Jan 2014, at 16:29, Kirk Pepperdine <kirk at kodewerk.com> wrote:
>>>>>
>>>>>> Hi Marcus,
>>>>>>
>>>>>> Looks like some of the details have been chopped off. Is there a GC
>>>>>> log available? If there is a problem with MethodHandle, a workaround
>>>>>> might be as simple as expanding perm... but wait, this is metaspace
>>>>>> now, and it should grow as long as your system has memory to give to
>>>>>> the process. The only thing I can suggest is that the space holding
>>>>>> compressed class pointers is a fixed size, and that if Nashorn is
>>>>>> loading a lot of classes you consider making that space larger. Full
>>>>>> disclosure: this isn’t something that I’ve had a chance to dabble
>>>>>> with, but I think there is a flag to control the size of that space.
>>>>>> Maybe Colleen can offer better insight.
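>>>>>> (If it helps, I believe the switch in question is
>>>>>> -XX:CompressedClassSpaceSize, e.g. -XX:CompressedClassSpaceSize=2g,
>>>>>> but please double-check -- as I said, I haven't dabbled with it.)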
>>>>>>
>>>>>> Regards,
>>>>>> Kirk
>>>>>>
>>>>>> On Jan 9, 2014, at 10:02 AM, Marcus Lagergren
>>>>>> <marcus.lagergren at oracle.com> wrote:
>>>>>>
>>>>>>> This almost certainly stems from MethodHandle combinators being
>>>>>>> implemented as lambda forms in anonymous Java classes. One of the
>>>>>>> things being done for 8u20 is to drastically reduce the number of
>>>>>>> lambda forms created. I don’t know of any workaround at the moment.
>>>>>>> CC:ing hotspot-compiler-dev, so the people there can elaborate a bit.
>>>>>>>
>>>>>>> /M
>>>>>>>
>>>>>>> On 06 Jan 2014, at 06:57, Benjamin Sieffert
>>>>>>> <benjamin.sieffert at metrigo.de> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> we have been observing similar symptoms from 7u40 onwards (using
>>>>>>>> nashorn-backport with j7 -- j8 has the same problems as 7u40 and
>>>>>>>> 7u45...
>>>>>>>> 7u25 is the last version that works fine) and suspect the cause
>>>>>>>> to be the
>>>>>>>> JSR-292 changes that took place there. IIRC I already asked over
>>>>>>>> on their
>>>>>>>> mailing list. Here's the link:
>>>>>>>> http://mail.openjdk.java.net/pipermail/mlvm-dev/2013-December/005586.html
>>>>>>>>
>>>>>>>>
>>>>>>>> The fault might as well lie with nashorn, though. It's certainly
>>>>>>>> worth
>>>>>>>> investigating.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014/1/4 Tal Liron <tal.liron at threecrickets.com>
>>>>>>>>
>>>>>>>>> Thanks! I didn't know of these. I'm not sure how to read the
>>>>>>>>> log, but this
>>>>>>>>> doesn't look so good. I get a lot of "allocation failures" that
>>>>>>>>> look like
>>>>>>>>> this:
>>>>>>>>>
>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM (25.0-b63) for linux-amd64 JRE (1.8.0-ea-b121), built on Dec 19 2013 17:29:18 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
>>>>>>>>> Memory: 4k page, physical 2039276k(849688k free), swap 262140k(256280k free)
>>>>>>>>> CommandLine flags: -XX:InitialHeapSize=32628416 -XX:MaxHeapSize=522054656 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
>>>>>>>>> 0.108: [GC (Allocation Failure)
>>>>>>>>> Desired survivor size 524288 bytes, new threshold 7 (max 15)
>>>>>>>>> [PSYoungGen: 512K->496K(1024K)] 512K->496K(32256K), 0.0013194 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 01/04/2014 10:02 PM, Ben Evans wrote:
>>>>>>>>>
>>>>>>>>>> -Xloggc:<pathtofile> -XX:+PrintGCDetails
>>>>>>>>>> -XX:+PrintTenuringDistribution
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Benjamin Sieffert
>>>>>>>> metrigo GmbH
>>>>>>>> Sternstr. 106
>>>>>>>> 20357 Hamburg
>>>>>>>>
>>>>>>>> Managing directors: Christian Müller, Tobias Schlottke, Philipp
>>>>>>>> Westermeyer
>>>>>>>> The company is registered with the Hamburg commercial register,
>>>>>>>> no. HRB 120447.
>>>>
>>>
>>