GC overhead limit exceeded

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Thu Jan 9 11:36:00 PST 2014


Heap dumps enable post-mortem analysis of OOMs.

Pass -XX:+HeapDumpOnOutOfMemoryError to the VM and it will dump the heap 
before exiting, or use jmap (-dump:live,format=b,file=<name> <pid>) or 
VisualVM to take a snapshot of a running process.
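
For example (the paths, class name and process id below are only 
placeholders):

    java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps MyApp
    jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>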

There are a number of tools to browse the contents (e.g. VisualVM, 
Eclipse MAT, or jhat).

Best regards,
Vladimir Ivanov

On 1/9/14 11:01 PM, Tal Liron wrote:
> Indeed, scripts are reused in this case, though I can't guarantee that
> there isn't a bug somewhere on my end.
>
> I'm wondering if it might be triggered by another issue: Prudence
> supports an internal crontab-like feature (based on cron4j), and these
> are again Nashorn scripts, run once a minute. You can see them in the
> log of the example application you downloaded. Then again, the exact
> same feature is also used with Rhino.
>
> Another way I might be able to help: when the errors occur again on my
> server, I will be happy to provide you with SSH access to it so you can
> snoop around. We can also run VisualVM via an SSH tunnel; it should be
> able to show us exactly which classes are not being GCed. If you think
> this would be helpful, please email me directly and we can set this up.
> However, in my attempts to debug this locally, the heap seems to be
> behaving well enough.
>
> On 01/10/2014 01:38 AM, Hannes Wallnoefer wrote:
>> Tal,
>>
>> I've been throwing requests at the Prudence test app for the last 20
>> minutes or so. I do see that it uses a lot of metaspace, close to 50M
>> in my case. The test app seems to load/unload 2 classes per request
>> with Rhino compared to 4 classes per request with Nashorn, which is
>> probably due to differences in bytecode generation between the two
>> engines.
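>>
>> For what it's worth, class loading/unloading and metaspace growth can
>> be watched from the outside with jstat, or by starting the VM with
>> -verbose:class, e.g.:
>>
>>     jstat -class <pid> 1000
>>     jstat -gcmetacapacity <pid> 1000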
>>
>> I don't yet see metaspace usage growing beyond that level, or
>> generating GC warnings. Maybe I haven't been running it long enough.
>>
>> I'm wondering if maybe metaspace is tight from the very beginning, and
>> the GC problems are caused by spikes in load (e.g. concurrent requests)?
>>
>> Also, are you aware of new classes being generated for each request?
>> Are you evaluating script files for each request? It would be more
>> efficient to evaluate the script just once and then reuse it for
>> subsequent requests.
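>>
>> A minimal sketch of what I mean, using the plain javax.script API (the
>> names here are just for illustration; Prudence's own wrapping will of
>> course look different):
>>
>>     import javax.script.*;
>>
>>     public class ReusableScript {
>>         private final CompiledScript compiled;
>>
>>         public ReusableScript(ScriptEngine engine, String source)
>>                 throws ScriptException {
>>             // Compile once, up front.
>>             this.compiled = ((Compilable) engine).compile(source);
>>         }
>>
>>         // Re-evaluate the already compiled script for each request
>>         // instead of calling engine.eval(source) every time.
>>         public Object run(Bindings perRequestBindings) throws ScriptException {
>>             return compiled.eval(perRequestBindings);
>>         }
>>     }
>>
>>     // Usage:
>>     // ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
>>     // ReusableScript script = new ReusableScript(engine, source);
>>     // ...then, per request: script.run(bindings);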
>>
>> Hannes
>>
>> On 2014-01-09 17:21, Tal Liron wrote:
>>> You may download the latest release of Prudence, run it and bombard
>>> it with hits (use ab or a similar tool):
>>>
>>> http://threecrickets.com/prudence/download/
>>>
>>> To get the GC logs, start it like so:
>>>
>>> JVM_SWITCHES="\
>>>     -Xloggc:/full/path/to/logs/gc.log \
>>>     -XX:+PrintGCDetails \
>>>     -XX:+PrintTenuringDistribution" \
>>>     sincerity start prudence
>>>
>>> To bombard it:
>>>
>>> ab -n 50000 -c 10 "http://localhost:8080/prudence-example/"
>>>
>>> Of course, you may also want to restrict the JVM heap size so the
>>> problem happens sooner, I think (see the example below). I actually
>>> don't understand JVM 8 GC at all, but you guys do, so have a go. All I
>>> can tell you is that I have a server running live on the Internet,
>>> which I have to restart every 3 days due to this issue.
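>>>
>>> For example (these values are arbitrary, just small enough to make the
>>> problem show up earlier):
>>>
>>> JVM_SWITCHES="-Xmx256m -XX:MaxMetaspaceSize=64m" \
>>>     sincerity start prudence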
>>>
>>> Unfortunately, I don't have an easy way to isolate the problem to
>>> something smaller. However, I would think there's probably an
>>> advantage in using something as big as possible -- you can probably
>>> get very rich dumps of what is polluting the heap.
>>>
>>>
>>> On 01/10/2014 12:00 AM, Marcus Lagergren wrote:
>>>> Tal - The GC people 10 meters behind me want to know if you have a
>>>> repro of your full-GC-to-death problem that they can look at.
>>>> They’re interested.
>>>>
>>>> /M
>>>>
>>>> On 09 Jan 2014, at 16:29, Kirk Pepperdine <kirk at kodewerk.com> wrote:
>>>>
>>>>> Hi Marcus,
>>>>>
>>>>> Looks like some of the details have been chopped off. Is there a GC
>>>>> log available? If there is a problem with MethodHandles, a
>>>>> workaround might be as simple as expanding perm... but wait, this is
>>>>> metaspace now, and it should grow as long as your system has memory
>>>>> to give to the process. The only thing I can suggest is that the
>>>>> space holding compressed class pointers is a fixed size, so if
>>>>> Nashorn is loading a lot of classes you might consider making that
>>>>> space larger. Full disclosure, this isn’t something that I’ve had a
>>>>> chance to dabble with, but I think there is a flag to control the
>>>>> size of that space. Maybe Colleen can offer better insight.
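>>>>>
>>>>> If I remember correctly, the flag is -XX:CompressedClassSpaceSize
>>>>> (it only matters while -XX:+UseCompressedClassPointers is on), so
>>>>> something along these lines might be worth a try:
>>>>>
>>>>>     java -XX:CompressedClassSpaceSize=2g ...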
>>>>>
>>>>> Regards,
>>>>> Kirk
>>>>>
>>>>> On Jan 9, 2014, at 10:02 AM, Marcus Lagergren
>>>>> <marcus.lagergren at oracle.com> wrote:
>>>>>
>>>>>> This almost certainly stems from MethodHandle combinators being
>>>>>> implemented as lambda forms, which are compiled into anonymous Java
>>>>>> classes. One of the things being done for 8u20 is to drastically
>>>>>> reduce the number of lambda forms created. I don’t know of any
>>>>>> workaround at the moment. CC:ing hotspot-compiler-dev, so the
>>>>>> people there can elaborate a bit.
>>>>>>
>>>>>> /M
>>>>>>
>>>>>> On 06 Jan 2014, at 06:57, Benjamin Sieffert
>>>>>> <benjamin.sieffert at metrigo.de> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> we have been observing similar symptoms from 7u40 onwards (using
>>>>>>> nashorn-backport with j7 -- j8 has the same problems as 7u40 and
>>>>>>> 7u45... 7u25 is the last version that works fine) and suspect the
>>>>>>> cause to be the JSR-292 changes that took place there. IIRC I
>>>>>>> already asked over on their mailing list. Here's the link:
>>>>>>> http://mail.openjdk.java.net/pipermail/mlvm-dev/2013-December/005586.html
>>>>>>>
>>>>>>> The fault might just as well lie with Nashorn, though. It's
>>>>>>> certainly worth investigating.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>>
>>>>>>> 2014/1/4 Tal Liron <tal.liron at threecrickets.com>
>>>>>>>
>>>>>>>> Thanks! I didn't know about these. I'm not sure how to read the
>>>>>>>> log, but this doesn't look so good. I get a lot of "allocation
>>>>>>>> failures" that look like this:
>>>>>>>>
>>>>>>>> Java HotSpot(TM) 64-Bit Server VM (25.0-b63) for linux-amd64 JRE (1.8.0-ea-b121), built on Dec 19 2013 17:29:18 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
>>>>>>>> Memory: 4k page, physical 2039276k(849688k free), swap 262140k(256280k free)
>>>>>>>> CommandLine flags: -XX:InitialHeapSize=32628416 -XX:MaxHeapSize=522054656 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
>>>>>>>> 0.108: [GC (Allocation Failure)
>>>>>>>> Desired survivor size 524288 bytes, new threshold 7 (max 15)
>>>>>>>> [PSYoungGen: 512K->496K(1024K)] 512K->496K(32256K), 0.0013194 secs]
>>>>>>>> [Times: user=0.01 sys=0.00, real=0.00 secs]
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01/04/2014 10:02 PM, Ben Evans wrote:
>>>>>>>>
>>>>>>>>> -Xloggc:<pathtofile> -XX:+PrintGCDetails
>>>>>>>>> -XX:+PrintTenuringDistribution
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Benjamin Sieffert
>>>>>>> metrigo GmbH
>>>>>>> Sternstr. 106
>>>>>>> 20357 Hamburg
>>>>>>>
>>>>>>> Managing directors: Christian Müller, Tobias Schlottke, Philipp
>>>>>>> Westermeyer
>>>>>>> The company is registered with the commercial register of the
>>>>>>> Hamburg court, no. HRB 120447.
>>>
>>
>

