GC overhead limit exceeded

A. Sundararajan sundararajan.athijegannathan at oracle.com
Thu Jan 9 19:23:06 PST 2014


If you do get such a heap dump, please make a .tar.gz of it available 
to us. We could then debug at our end as well.

Thanks
-Sundar


On Friday 10 January 2014 01:06 AM, Vladimir Ivanov wrote:
> Heap dumps enable post-mortem analysis of OOMs.
>
> Pass -XX:+HeapDumpOnOutOfMemoryError to the VM and it will dump the heap 
> before exiting, or use jmap (-dump:live,format=b,file=<name> <pid>) or 
> VisualVM to take a snapshot of a running process.
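>
> For example, the full jmap command would look roughly like this (the pid 
> and output path are just placeholders):
>
>     jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>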
>
> There are a number of tools to browse the contents.
>
> Best regards,
> Vladimir Ivanov
>
> On 1/9/14 11:01 PM, Tal Liron wrote:
>> Indeed, scripts are reused in this case, though I can't guarantee that
>> there isn't a bug somewhere on my end.
>>
>> I'm wondering if it might be triggered by another issue: Prudence
>> supports an internal crontab-like feature (based on cron4j), and these
>> are again Nashorn scripts being run, once a minute. You can see them in
>> the log of the example application you downloaded. Then again, the
>> exact same feature is leveraged with Rhino.
>>
>> Another way I may be able to help: when the errors occur again on my
>> server, I will be happy to provide you with SSH access so you can snoop
>> around. We can also run VisualVM via an SSH tunnel; it should be able to
>> show us exactly which classes are not being GCed. If you think this
>> would be helpful, please email me directly and we can set this up.
>> However, in my attempts to debug this locally, the heap seems to be
>> behaving well enough.
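>>
>> For concreteness, here is a rough sketch of the tunnel idea (the port,
>> user and host are placeholders, and it assumes JMX can be enabled on the
>> server and that the port is firewalled from the outside):
>>
>>     # on the server, add to the JVM switches:
>>     -Dcom.sun.management.jmxremote.port=9010 \
>>     -Dcom.sun.management.jmxremote.rmi.port=9010 \
>>     -Dcom.sun.management.jmxremote.authenticate=false \
>>     -Dcom.sun.management.jmxremote.ssl=false \
>>     -Djava.rmi.server.hostname=localhost
>>
>>     # on the workstation, forward the port, then add a JMX connection
>>     # to localhost:9010 in VisualVM:
>>     ssh -L 9010:localhost:9010 user@server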
>>
>> On 01/10/2014 01:38 AM, Hannes Wallnoefer wrote:
>>> Tal,
>>>
>>> I've been throwing requests at the Prudence test app for the last 20
>>> minutes or so. I do see that it uses a lot of metaspace, close to 50M
>>> in my case. The test app seems to load/unload 2 classes per request
>>> with Rhino compared to 4 classes per request with Nashorn, which is
>>> probably due to differences in bytecode generation between the two
>>> engines.
>>>
>>> I don't yet see metaspace usage growing beyond that limit, or
>>> generating GC warnings. Maybe I haven't been running it long enough.
>>>
>>> I'm wondering if maybe metaspace is tight from the very beginning, and
>>> the GC problems are caused by spikes in load (e.g. concurrent 
>>> requests)?
>>>
>>> Also, are you aware of new classes being generated for each request?
>>> Are you evaluating script files for each request? It would be more
>>> efficient to evaluate the script just once and then reuse it for
>>> subsequent requests.
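>>>
>>> For illustration only (this is not Prudence's code, just a minimal
>>> javax.script sketch), compiling once and reusing the compiled script
>>> across requests looks roughly like this:
>>>
>>>     import javax.script.*;
>>>
>>>     public class ReuseScript {
>>>         public static void main(String[] args) throws Exception {
>>>             ScriptEngine engine =
>>>                 new ScriptEngineManager().getEngineByName("nashorn");
>>>             // Compile the script source once, up front...
>>>             CompiledScript compiled = ((Compilable) engine)
>>>                 .compile("'handled request ' + requestId");
>>>             // ...then reuse it for every request; only the bindings change.
>>>             for (int i = 0; i < 3; i++) {
>>>                 Bindings bindings = engine.createBindings();
>>>                 bindings.put("requestId", i);
>>>                 System.out.println(compiled.eval(bindings));
>>>             }
>>>         }
>>>     }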
>>>
>>> Hannes
>>>
>>> Am 2014-01-09 17:21, schrieb Tal Liron:
>>>> You may download the latest release of Prudence, run it and bombard
>>>> it with hits (use ab or a similar tool):
>>>>
>>>> http://threecrickets.com/prudence/download/
>>>>
>>>> To get the GC logs, start it like so:
>>>>
>>>> JVM_SWITCHES=\
>>>>     -Xloggc:/full/path/to/logs/gc.log \
>>>>     -XX:+PrintGCDetails \
>>>>     -XX:+PrintTenuringDistribution \
>>>>     sincerity start prudence
>>>>
>>>> To bombard it:
>>>>
>>>> ab -n 50000 -c 10 "http://localhost:8080/prudence-example/"
>>>>
>>>> Of course, you may also want to restrict the JVM heap size so it will
>>>> happen sooner. I think. I actually don't understand JVM 8 GC at all,
>>>> but you guys do, so have a go. All I can tell you is that I have a
>>>> server running live on the Internet, which I have to restart every 3
>>>> days due to this issue.
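>>>>
>>>> To make it reproduce sooner, the heap and metaspace could be capped,
>>>> for example (the values here are just guesses, not a recommendation):
>>>>
>>>> JVM_SWITCHES=\
>>>>     -Xmx256m \
>>>>     -XX:MaxMetaspaceSize=64m \
>>>>     sincerity start prudence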
>>>>
>>>> Unfortunately, I don't have an easy way to isolate the problem to
>>>> something smaller. However, I would think there's probably an
>>>> advantage in using something as big as possible -- you can probably
>>>> get very rich dumps of what is polluting the heap.
>>>>
>>>>
>>>> On 01/10/2014 12:00 AM, Marcus Lagergren wrote:
>>>>> Tal - The GC people 10 meters behind me want to know if you have a
>>>>> repro of your full-GC-to-death problem that they can look at.
>>>>> They’re interested.
>>>>>
>>>>> /M
>>>>>
>>>>> On 09 Jan 2014, at 16:29, Kirk Pepperdine <kirk at kodewerk.com> wrote:
>>>>>
>>>>>> Hi Marcus,
>>>>>>
>>>>>> Looks like some of the details have been chopped off. Is there a GC
>>>>>> log available? If there is a problem with MethodHandle, a workaround
>>>>>> might be as simple as expanding perm... but wait, this is metaspace
>>>>>> now, and it should grow as long as your system has memory to give to
>>>>>> the process. The only thing I can suggest is that the space holding
>>>>>> compressed class pointers is a fixed size, and that if Nashorn is
>>>>>> loading a lot of classes you might consider making that space
>>>>>> larger. Full disclosure, this isn’t something I’ve had a chance to
>>>>>> dabble with, but I think there is a flag to control the size of that
>>>>>> space. Maybe Colleen can offer better insight.
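>>>>>>
>>>>>> If it is the flag I have in mind, it would be something like the
>>>>>> following (the value is only an example; the compressed class space
>>>>>> defaults to 1G in JDK 8):
>>>>>>
>>>>>>     -XX:CompressedClassSpaceSize=2g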
>>>>>>
>>>>>> Regards,
>>>>>> Kirk
>>>>>>
>>>>>> On Jan 9, 2014, at 10:02 AM, Marcus Lagergren
>>>>>> <marcus.lagergren at oracle.com> wrote:
>>>>>>
>>>>>>> This almost certainly stems from MethodHandle combinators being
>>>>>>> implemented as lambda forms, which are generated as anonymous Java
>>>>>>> classes. One of the things being done for 8u20 is to drastically
>>>>>>> reduce the number of lambda forms created. I don’t know of any
>>>>>>> workaround at the moment. CC:ing hotspot-compiler-dev, so the
>>>>>>> people there can elaborate a bit.
>>>>>>>
>>>>>>> /M
>>>>>>>
>>>>>>> On 06 Jan 2014, at 06:57, Benjamin Sieffert
>>>>>>> <benjamin.sieffert at metrigo.de> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> we have been observing similar symptoms from 7u40 onwards (using
>>>>>>>> nashorn-backport with j7 -- j8 has the same problems as 7u40 and
>>>>>>>> 7u45... 7u25 is the last version that works fine) and suspect the
>>>>>>>> cause to be the JSR-292 changes that took place there. IIRC I
>>>>>>>> already asked over on their mailing list. Here's the link:
>>>>>>>> http://mail.openjdk.java.net/pipermail/mlvm-dev/2013-December/005586.html
>>>>>>>>
>>>>>>>> The fault might just as well lie with Nashorn, though. It's
>>>>>>>> certainly worth investigating.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014/1/4 Tal Liron <tal.liron at threecrickets.com>
>>>>>>>>
>>>>>>>>> Thanks! I didn't know of these. I'm not sure how to read the
>>>>>>>>> log, but this
>>>>>>>>> doesn't look so good. I get a lot of "allocation failures" that
>>>>>>>>> look like
>>>>>>>>> this:
>>>>>>>>>
>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM (25.0-b63) for linux-amd64 JRE
>>>>>>>>> (1.8.0-ea-b121), built on Dec 19 2013 17:29:18 by "java_re" with
>>>>>>>>> gcc 4.3.0
>>>>>>>>> 20080428 (Red Hat 4.3.0-8)
>>>>>>>>> Memory: 4k page, physical 2039276k(849688k free), swap
>>>>>>>>> 262140k(256280k
>>>>>>>>> free)
>>>>>>>>> CommandLine flags: -XX:InitialHeapSize=32628416
>>>>>>>>> -XX:MaxHeapSize=522054656
>>>>>>>>> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>>>>>>>>> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers
>>>>>>>>> -XX:+UseCompressedOops -XX:+UseParallelGC
>>>>>>>>> 0.108: [GC (Allocation Failure)
>>>>>>>>> Desired survivor size 524288 bytes, new threshold 7 (max 15)
>>>>>>>>> [PSYoungGen: 512K->496K(1024K)] 512K->496K(32256K), 0.0013194 
>>>>>>>>> secs]
>>>>>>>>> [Times: user=0.01 sys=0.00, real=0.00 secs]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 01/04/2014 10:02 PM, Ben Evans wrote:
>>>>>>>>>
>>>>>>>>>> -Xloggc:<pathtofile> -XX:+PrintGCDetails
>>>>>>>>>> -XX:+PrintTenuringDistribution
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Benjamin Sieffert
>>>>>>>> metrigo GmbH
>>>>>>>> Sternstr. 106
>>>>>>>> 20357 Hamburg
>>>>>>>>
>>>>>>>> Managing directors: Christian Müller, Tobias Schlottke, Philipp
>>>>>>>> Westermeyer
>>>>>>>> The company is registered with the commercial register court of
>>>>>>>> Hamburg, No. HRB 120447.
>>>>
>>>
>>


