Calls to native methods setCallSiteTargetNormal and defineAnonymousClass taking very long

Fri Sep 20 02:38:22 PDT 2013

Well the thing is, our maximum size (2g) for PermGen / Metaspace is never
even approached. In fact, if we don't specify an initial size, constant gc
will keep them under 200-300m (at the cost of performance). If we set an
initial size of, say, 700m, that boundary won't be grown when the app is
running, because gc is always able to take a huge chunk. The only thing
left that comes to mind is CodeCache, which we indeed had to increase a few
days ago. I will now increase it again (this time by a huge amount) and
we'll see if that has any impact.

Remi: The eval-calls are mainly nashorn-internal, the point where we call
into it from our own code is not shown in the stack traces as I don't
believe it matters. Also, what are you gonna do on a ScriptEngine if not
call eval() on it? ;)

Regards

2013/9/20 Remi Forax <forax at univ-mlv.fr>

> I think you have your answer in the stack traces.
> The problem arise when you (or nashorn) call eval().
>
> I see two solutions, either ask Nashorn guys why they create callsites
> too often when you use eval() in javascript
> or stop using eval() in your js script ! (if eval() is something you use
> and not something nashorn uses internally).
>
> Rémi
>
> On 09/20/2013 09:42 AM, Benjamin Sieffert wrote:
> > (sending this again because I wasn't subscribed the 1st time and hence
> > it probably got spamfiltered)
> >
> > Hello everyone,
> >
> > I have been sent over her from the nashorn-dev mailing list. We're
> > having trouble with our application getting (rather randomly) very
> > busy with calls to the methods mentioned in the topic. On jdk8, that
> > is. On jdk7, we're exclusively seeing setCallSiteTargetNormal. This
> > probably has to do with LambdaForms not being in use on jdk7. Anyway,
> > we're running roughly the following setup:
> >
> > - application mantle in JRuby
> > - application core in Java
> > - at the VERY core: highly repetitive javascript run via java
> > (switched from rhino to nashorn recently there)
> >
> > It's a kind of webserver handling about 2-3k reqeusts per second. Each
> > request will result in a few dozen calls to our javascript. Since we
> > need our response times to be as low as possible, we'll not do the
> > javascript-calls serially, but commit them all to an executor running
> > on a number of threads equal to the number of available cores (usually
> > 16 or 24 on our machines, 64bit-linux), so that ideally we'll use all
> > cores for every single request.
> >
> > In summary: High level of concurrency, huge number of classes and
> > nmethods generated at runtime (both by jruby and nashorn).
> > (Some numbers from 65+ hrs uptime. jdk8: 140.000 classes live, 1M
> > unloaded. 800mb metaspace exhausted; jdk7: 14.000 live, 2.000
> > unloaded. 500mb permgen exhausted)
> >
> > As for our actual problem, like I said, it's that sometimes all our
> > worker threads will start to be 100% busy and, when looking at the
> > stack traces, will all be in one of those two:
> >
> > 1.
> > at java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal(Native
> > Method)
> > at java.lang.invoke.CallSite.setTargetNormal(CallSite.java:245)
> > at java.lang.invoke.MutableCallSite.setTarget(MutableCallSite.java:154)
> > at
> >
> jdk.internal.dynalink.support.AbstractRelinkableCallSite.initialize(AbstractRelinkableCallSite.java:115)
> > at jdk.internal.dynalink.DynamicLinker.link(DynamicLinker.java:184)
> > at
> >
> jdk.nashorn.internal.runtime.linker.Bootstrap.bootstrap(Bootstrap.java:126)
> > at
> >
> java.lang.invoke.LambdaForm$DMH/272755642.invokeStatic_LLLI_L(LambdaForm$DMH)
> > ... (omitting a few invocations of LambdaForm methods)
> > at java.lang.invoke.LambdaForm$MH/839784351.invoke_MT(LambdaForm$MH)
> > at java.lang.invoke.CallSite.makeSite(CallSite.java:283)
> > at
> >
> java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:294)
> > at jdk.nashorn.internal.scripts.Script$\^eval\_._L196(<eval>:231)
> > ...
> >
> > 2.
> > at sun.misc.Unsafe.defineAnonymousClass(Native Method)
> > at
> >
> java.lang.invoke.InvokerBytecodeGenerator.loadAndInitializeInvokerClass(InvokerBytecodeGenerator.java:262)
> > at
> >
> java.lang.invoke.InvokerBytecodeGenerator.loadMethod(InvokerBytecodeGenerator.java:250)
> > at
> >
> java.lang.invoke.InvokerBytecodeGenerator.generateCustomizedCode(InvokerBytecodeGenerator.java:509)
> > at java.lang.invoke.LambdaForm.compileToBytecode(LambdaForm.java:454)
> > at
> java.lang.invoke.LambdaForm.checkInvocationCounter(LambdaForm.java:634)
> > ... (omitting a few invocations of LambdaForm methods)
> > at java.lang.invoke.LambdaForm$LFI/1682860686.interpret_L(LambdaForm$LFI)
> > at
> java.lang.invoke.LambdaForm$MH/1997512181.linkToCallSite(LambdaForm$MH)
> > at jdk.nashorn.internal.scripts.Script$\^eval\_._L196(<eval>:232)
> > ...
> >
> >
> > Obviously, the application is near unresponsive under these conditions
> > (i.e. doesn't respond to actual requests from the outside, though you
> > can connect with jconsole / visualvm / jprofiler just fine and observe
> > it). CPU will be near 100% bound by these calls, too.
> >
> > Now there are a few interesting things to note that might help with
> > diagnosis.
> > Maybe most importantly, a full gc (we're using CMS, but
> > CMSClassUnloadingEnabled does not need to be set) will stop the
> > misbehaviour and will return the application to its normal state.
> > Obviously, we'd rather not run full gcs all the time, but it's maybe
> > also interesting to note that this can indeed be triggered by setting
> > (on jdk7) MaxPermSize low enough or not at all. Apparently the garbage
> > collector will always find enough space to clean up on PermGen that it
> > doesn't really need to grow above some threshold -- though the graph
> > showing our PermGen consumption will then look like the one showing
> > our Heap consumption (i.e. up and down all the time) and we don't
> > really have good response times, understandably. So this is not an
> > option to fix the problem, but might be a clue to what's going on.
> > A kind of additional problem in this regard is that exhausted PermGen
> > / MetaSpace will nearly stop increasing under the problematic
> > condition. Which, in turn, makes a full gc less likely to take place.
> >
> > Another clue is that we have been able to make the problem happen less
> > often by dialing back on the amount of reference-replacement we do on
> > our nashorn-javascript-engines. I.e. those engines have engine-wide
> > bindings (as according to the javax.script api) you can replace just
> > by calling .put(bindingName, object) on the engine, even while other
> > threads are involved in calls inside it. Now apparently this causes
> > much relinking of callsites to be done and thus fosters our problem.
> >
> > Still, it's hard to tell what's going on for sure, as we have no stack
> > traces for what's happening inside the native code. I hope I could
> > give you some idea.
> >
> > Best regards
> > Benjamin
> >
> >
> > _______________________________________________
> > mlvm-dev mailing list
> > mlvm-dev at openjdk.java.net
> > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>

-- 
Benjamin Sieffert
metrigo GmbH
Sternstr. 106
20357 Hamburg

Geschäftsführer: Christian Müller, Tobias Schlottke, Philipp Westermeyer
Die Gesellschaft ist eingetragen beim Registergericht Hamburg
Nr. HRB 120447.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20130920/c27635bd/attachment.html