GC overhead limit exceeded

Thu Jan 9 11:01:03 PST 2014

Indeed, scripts are reused in this case, though I can't guarantee that 
there isn't a bug somewhere on my end.

I'm wondering if it might be triggered by another issue: Prudence 
supports an internal crontab-life feature (based on cron4j), and these 
are again Nashorn scripts being run, once a minute. You can see them in 
the log of the example application you downloaded. Then again, the same 
exact feature is leveraged with Rhino.

Another idea by which I may help: when the errors occur again on my 
server, I will be happy to provide you with SSH access to it to snoop 
around. We can also run VisualVM via an SSH tunnel, it should be able to 
show us exactly which classes are not being GCed. If you think this 
would be helpful, please email my directly and we can set this up. 
However, in attempts to debug this locally the heap seems to be behaving 
well enough.

On 01/10/2014 01:38 AM, Hannes Wallnoefer wrote:
> Tal,
>
> I've been thowing requests at the Prudence test app for the last 20 
> minutes or so. I do see that it uses a lot of metaspace, close to 50M 
> in my case. The test app seems to load/unload 2 classes per request 
> with Rhino compared to 4 classes per request with Nashorn, which is 
> probably due to differences in bytecode generation between the two 
> engines.
>
> I don't yet see metaspace usage growing beyond that limit, or 
> generating GC warnings. Maybe I haven't been running it long enough.
>
> I'm wondering if maybe metaspace is tight from the very beginning, and 
> the GC problems are caused by spikes in load (e.g. concurrent requests)?
>
> Also, are you aware of new classes being generated for each request? 
> Are you evaluating script files for each request? It would be more 
> efficient to evaluate the script just once and then reuse it for 
> subsequent requests.
>
> Hannes
>
> Am 2014-01-09 17:21, schrieb Tal Liron:
>> You may download the latest release of Prudence, run it and bombard 
>> it with hits (use ab or a similar tool):
>>
>> http://threecrickets.com/prudence/download/
>>
>> To get the GC logs, start it like so:
>>
>> JVM_SWITCHES=\
>>     -Xloggc:/full/path/to/logs/gc.log \
>>     -XX:+PrintGCDetails \
>>     -XX:+PrintTenuringDistribution \
>>     sincerity start prudence
>>
>> To bombard it:
>>
>> ab -n 50000 -c 10 "http://localhost:8080/prudence-example/"
>>
>> Of course, you may also want to restrict the JVM heap size so it will 
>> happen sooner. I think. I actually don't understand JVM 8 GC at all, 
>> but you guys do, so have a go. All I can tell you is that I have a 
>> server running live on the Internet, which I have to restart every 3 
>> days due to this issue.
>>
>> Unfortunately, I don't have an easy way to isolate the problem to 
>> something smaller. However, I would think there's probably an 
>> advantage in using something as big as possible -- you can probably 
>> get very rich dumps of what is polluting the heap.
>>
>>
>> On 01/10/2014 12:00 AM, Marcus Lagergren wrote:
>>> Tal - The GC people 10 meters behind me want to know if you have a 
>>> repro of your full GC to death problem that they can look at? 
>>> They’re interested.
>>>
>>> /M
>>>
>>> On 09 Jan 2014, at 16:29, Kirk Pepperdine <kirk at kodewerk.com> wrote:
>>>
>>>> Hi Marcus,
>>>>
>>>> Looks like some of the details have been chopped off. Is there a GC 
>>>> log available? If there is a problem with MethodHandle a work 
>>>> around might be a simple as expanding perm.. but wait, this is meta 
>>>> space now and it should grow as long as your system has memory to 
>>>> give to the process. The only thing I can suggest is that the space 
>>>> to hold compressed class pointers is a fixed size and that if 
>>>> Nashorn is loading a lot of classes is that you consider making 
>>>> that space larger. Full disclosure, this isn’t something that I’ve 
>>>> had a chance to dabble with but I think there is a flag to control 
>>>> the size of that space. Maybe Colleen can offer better insight.
>>>>
>>>> Regards,
>>>> Kirk
>>>>
>>>> On Jan 9, 2014, at 10:02 AM, Marcus Lagergren 
>>>> <marcus.lagergren at oracle.com> wrote:
>>>>
>>>>> This almost certainly stems from the implementation from 
>>>>> MethodHandle combinators being implemented as lambda forms as 
>>>>> anonymous java classes. One of the things that is being done for  
>>>>> 8u20 is to drastically reduce the number of lambda forms created.  
>>>>> I don’t know of any workaround at the moment. CC:ing 
>>>>> hotspot-compiler-dev, so the people there can elaborate a bit.
>>>>>
>>>>> /M
>>>>>
>>>>> On 06 Jan 2014, at 06:57, Benjamin Sieffert 
>>>>> <benjamin.sieffert at metrigo.de> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> we have been observing similar symptoms from 7u40 onwards (using
>>>>>> nashorn-backport with j7 -- j8 has the same problems as 7u40 and 
>>>>>> 7u45...
>>>>>> 7u25 is the last version that works fine) and suspect the cause 
>>>>>> to be the
>>>>>> JSR-292 changes that took place there. Iirc I already asked over 
>>>>>> on their
>>>>>> mailing list. Here's the link:
>>>>>> http://mail.openjdk.java.net/pipermail/mlvm-dev/2013-December/005586.html 
>>>>>>
>>>>>> The fault might as well lie with nashorn, though. It's certainly 
>>>>>> worth
>>>>>> investigating.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>>
>>>>>> 2014/1/4 Tal Liron <tal.liron at threecrickets.com>
>>>>>>
>>>>>>> Thanks! I didn't know of these. I'm not sure how to read the 
>>>>>>> log, but this
>>>>>>> doesn't look so good. I get a lot of "allocation failures" that 
>>>>>>> look like
>>>>>>> this:
>>>>>>>
>>>>>>> Java HotSpot(TM) 64-Bit Server VM (25.0-b63) for linux-amd64 JRE
>>>>>>> (1.8.0-ea-b121), built on Dec 19 2013 17:29:18 by "java_re" with 
>>>>>>> gcc 4.3.0
>>>>>>> 20080428 (Red Hat 4.3.0-8)
>>>>>>> Memory: 4k page, physical 2039276k(849688k free), swap 
>>>>>>> 262140k(256280k
>>>>>>> free)
>>>>>>> CommandLine flags: -XX:InitialHeapSize=32628416 
>>>>>>> -XX:MaxHeapSize=522054656
>>>>>>> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>>>>>>> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers
>>>>>>> -XX:+UseCompressedOops -XX:+UseParallelGC
>>>>>>> 0.108: [GC (Allocation Failure)
>>>>>>> Desired survivor size 524288 bytes, new threshold 7 (max 15)
>>>>>>> [PSYoungGen: 512K->496K(1024K)] 512K->496K(32256K), 0.0013194 secs]
>>>>>>> [Times: user=0.01 sys=0.00, real=0.00 secs]
>>>>>>>
>>>>>>>
>>>>>>> On 01/04/2014 10:02 PM, Ben Evans wrote:
>>>>>>>
>>>>>>>> -Xloggc:<pathtofile> -XX:+PrintGCDetails 
>>>>>>>> -XX:+PrintTenuringDistribution
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> Benjamin Sieffert
>>>>>> metrigo GmbH
>>>>>> Sternstr. 106
>>>>>> 20357 Hamburg
>>>>>>
>>>>>> Geschäftsführer: Christian Müller, Tobias Schlottke, Philipp 
>>>>>> Westermeyer
>>>>>> Die Gesellschaft ist eingetragen beim Registergericht Hamburg
>>>>>> Nr. HRB 120447.
>>
>