Calls to native methods setCallSiteTargetNormal and defineAnonymousClass taking very long

Fri Sep 20 03:01:39 PDT 2013

On 09/20/2013 11:38 AM, Benjamin Sieffert wrote:
> Well the thing is, our maximum size (2g) for PermGen / Metaspace is 
> never even approached. In fact, if we don't specify an initial size, 
> constant gc will keep them under 200-300m (at the cost of 
> performance). If we set an initial size of, say, 700m, that boundary 
> won't be grown when the app is running, because gc is always able to 
> take a huge chunk. The only thing left that comes to mind is 
> CodeCache, which we indeed had to increase a few days ago. I will now 
> increase it again (this time by a huge amount) and we'll see if that 
> has any impact.
>
> Remi: The eval-calls are mainly nashorn-internal, the point where we 
> call into it from our own code is not shown in the stack traces as I 
> don't believe it matters. Also, what are you gonna do on a 
> ScriptEngine if not call eval() on it? ;)

You mean eval on the script or eval on the compiled script ?
http://docs.oracle.com/javase/6/docs/api/javax/script/CompiledScript.html

I don't know how Nashorn works but when I had to implements the 
javax.script API for some obscure scripting language :)
I've used two entry points in the compiler that work that way:
  - if you call eval on a script, binding are compiled as constant.
  - if you call eval on a compiled script, the binding are not constant 
because you want to reuse the script.

The relinking you see may be because nashorn think that your bindings 
are constant.

>
> Regards

Rémi

>
>
> 2013/9/20 Remi Forax <forax at univ-mlv.fr <mailto:forax at univ-mlv.fr>>
>
>     I think you have your answer in the stack traces.
>     The problem arise when you (or nashorn) call eval().
>
>     I see two solutions, either ask Nashorn guys why they create callsites
>     too often when you use eval() in javascript
>     or stop using eval() in your js script ! (if eval() is something
>     you use
>     and not something nashorn uses internally).
>
>     Rémi
>
>     On 09/20/2013 09:42 AM, Benjamin Sieffert wrote:
>     > (sending this again because I wasn't subscribed the 1st time and
>     hence
>     > it probably got spamfiltered)
>     >
>     > Hello everyone,
>     >
>     > I have been sent over her from the nashorn-dev mailing list. We're
>     > having trouble with our application getting (rather randomly) very
>     > busy with calls to the methods mentioned in the topic. On jdk8, that
>     > is. On jdk7, we're exclusively seeing setCallSiteTargetNormal. This
>     > probably has to do with LambdaForms not being in use on jdk7.
>     Anyway,
>     > we're running roughly the following setup:
>     >
>     > - application mantle in JRuby
>     > - application core in Java
>     > - at the VERY core: highly repetitive javascript run via java
>     > (switched from rhino to nashorn recently there)
>     >
>     > It's a kind of webserver handling about 2-3k reqeusts per
>     second. Each
>     > request will result in a few dozen calls to our javascript. Since we
>     > need our response times to be as low as possible, we'll not do the
>     > javascript-calls serially, but commit them all to an executor
>     running
>     > on a number of threads equal to the number of available cores
>     (usually
>     > 16 or 24 on our machines, 64bit-linux), so that ideally we'll
>     use all
>     > cores for every single request.
>     >
>     > In summary: High level of concurrency, huge number of classes and
>     > nmethods generated at runtime (both by jruby and nashorn).
>     > (Some numbers from 65+ hrs uptime. jdk8: 140.000 classes live, 1M
>     > unloaded. 800mb metaspace exhausted; jdk7: 14.000 live, 2.000
>     > unloaded. 500mb permgen exhausted)
>     >
>     > As for our actual problem, like I said, it's that sometimes all our
>     > worker threads will start to be 100% busy and, when looking at the
>     > stack traces, will all be in one of those two:
>     >
>     > 1.
>     > at
>     java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal(Native
>     > Method)
>     > at java.lang.invoke.CallSite.setTargetNormal(CallSite.java:245)
>     > at
>     java.lang.invoke.MutableCallSite.setTarget(MutableCallSite.java:154)
>     > at
>     >
>     jdk.internal.dynalink.support.AbstractRelinkableCallSite.initialize(AbstractRelinkableCallSite.java:115)
>     > at jdk.internal.dynalink.DynamicLinker.link(DynamicLinker.java:184)
>     > at
>     >
>     jdk.nashorn.internal.runtime.linker.Bootstrap.bootstrap(Bootstrap.java:126)
>     > at
>     >
>     java.lang.invoke.LambdaForm$DMH/272755642.invokeStatic_LLLI_L(LambdaForm$DMH)
>     > ... (omitting a few invocations of LambdaForm methods)
>     > at java.lang.invoke.LambdaForm$MH/839784351.invoke_MT(LambdaForm$MH)
>     > at java.lang.invoke.CallSite.makeSite(CallSite.java:283)
>     > at
>     >
>     java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:294)
>     > at jdk.nashorn.internal.scripts.Script$\^eval\_._L196(<eval>:231)
>     > ...
>     >
>     > 2.
>     > at sun.misc.Unsafe.defineAnonymousClass(Native Method)
>     > at
>     >
>     java.lang.invoke.InvokerBytecodeGenerator.loadAndInitializeInvokerClass(InvokerBytecodeGenerator.java:262)
>     > at
>     >
>     java.lang.invoke.InvokerBytecodeGenerator.loadMethod(InvokerBytecodeGenerator.java:250)
>     > at
>     >
>     java.lang.invoke.InvokerBytecodeGenerator.generateCustomizedCode(InvokerBytecodeGenerator.java:509)
>     > at
>     java.lang.invoke.LambdaForm.compileToBytecode(LambdaForm.java:454)
>     > at
>     java.lang.invoke.LambdaForm.checkInvocationCounter(LambdaForm.java:634)
>     > ... (omitting a few invocations of LambdaForm methods)
>     > at
>     java.lang.invoke.LambdaForm$LFI/1682860686.interpret_L(LambdaForm$LFI)
>     > at
>     java.lang.invoke.LambdaForm$MH/1997512181.linkToCallSite(LambdaForm$MH)
>     > at jdk.nashorn.internal.scripts.Script$\^eval\_._L196(<eval>:232)
>     > ...
>     >
>     >
>     > Obviously, the application is near unresponsive under these
>     conditions
>     > (i.e. doesn't respond to actual requests from the outside,
>     though you
>     > can connect with jconsole / visualvm / jprofiler just fine and
>     observe
>     > it). CPU will be near 100% bound by these calls, too.
>     >
>     > Now there are a few interesting things to note that might help with
>     > diagnosis.
>     > Maybe most importantly, a full gc (we're using CMS, but
>     > CMSClassUnloadingEnabled does not need to be set) will stop the
>     > misbehaviour and will return the application to its normal state.
>     > Obviously, we'd rather not run full gcs all the time, but it's maybe
>     > also interesting to note that this can indeed be triggered by
>     setting
>     > (on jdk7) MaxPermSize low enough or not at all. Apparently the
>     garbage
>     > collector will always find enough space to clean up on PermGen
>     that it
>     > doesn't really need to grow above some threshold -- though the graph
>     > showing our PermGen consumption will then look like the one showing
>     > our Heap consumption (i.e. up and down all the time) and we don't
>     > really have good response times, understandably. So this is not an
>     > option to fix the problem, but might be a clue to what's going on.
>     > A kind of additional problem in this regard is that exhausted
>     PermGen
>     > / MetaSpace will nearly stop increasing under the problematic
>     > condition. Which, in turn, makes a full gc less likely to take
>     place.
>     >
>     > Another clue is that we have been able to make the problem
>     happen less
>     > often by dialing back on the amount of reference-replacement we
>     do on
>     > our nashorn-javascript-engines. I.e. those engines have engine-wide
>     > bindings (as according to the javax.script api) you can replace just
>     > by calling .put(bindingName, object) on the engine, even while other
>     > threads are involved in calls inside it. Now apparently this causes
>     > much relinking of callsites to be done and thus fosters our problem.
>     >
>     > Still, it's hard to tell what's going on for sure, as we have no
>     stack
>     > traces for what's happening inside the native code. I hope I could
>     > give you some idea.
>     >
>     > Best regards
>     > Benjamin
>     >
>     >
>     > _______________________________________________
>     > mlvm-dev mailing list
>     > mlvm-dev at openjdk.java.net <mailto:mlvm-dev at openjdk.java.net>
>     > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
>     _______________________________________________
>     mlvm-dev mailing list
>     mlvm-dev at openjdk.java.net <mailto:mlvm-dev at openjdk.java.net>
>     http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
>
>
>
> -- 
> Benjamin Sieffert
> metrigo GmbH
> Sternstr. 106
> 20357 Hamburg
>
> Geschäftsführer: Christian Müller, Tobias Schlottke, Philipp Westermeyer
> Die Gesellschaft ist eingetragen beim Registergericht Hamburg
> Nr. HRB 120447.
>
>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev