Calls to native methods setCallSiteTargetNormal and defineAnonymousClass taking very long

Jochen Theodorou blackdrag at gmx.org
Fri Sep 20 01:12:44 PDT 2013


On 20.09.2013 09:42, Benjamin Sieffert wrote:
[...]
> - at the VERY core: highly repetitive javascript run via java (switched
> from rhino to nashorn recently there)
[...]
> In summary: High level of concurrency, huge number of classes and
> nmethods generated at runtime (both by jruby and nashorn).
> (Some numbers from 65+ hrs uptime. jdk8: 140,000 classes live, 1M
> unloaded, 800 MB Metaspace exhausted; jdk7: 14,000 live, 2,000 unloaded,
> 500 MB PermGen exhausted.)
[...]
> 1.
> at java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal(Native
> Method)
> at java.lang.invoke.CallSite.setTargetNormal(CallSite.java:245)
> at java.lang.invoke.MutableCallSite.setTarget(MutableCallSite.java:154)
> at
> jdk.internal.dynalink.support.AbstractRelinkableCallSite.initialize(AbstractRelinkableCallSite.java:115)
> at jdk.internal.dynalink.DynamicLinker.link(DynamicLinker.java:184)
> at
> jdk.nashorn.internal.runtime.linker.Bootstrap.bootstrap(Bootstrap.java:126)
> at
> java.lang.invoke.LambdaForm$DMH/272755642.invokeStatic_LLLI_L(LambdaForm$DMH)
> ... (omitting a few invocations of LambdaForm methods)
> at java.lang.invoke.LambdaForm$MH/839784351.invoke_MT(LambdaForm$MH)
> at java.lang.invoke.CallSite.makeSite(CallSite.java:283)
> at
> java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:294)
> at jdk.nashorn.internal.scripts.Script$^eval_._L196(<eval>:231)
> ...

I don't know Nashorn that well, but producing a new call site every time 
does look problematic to me.
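
To spell out what trace 1 is doing, here is a minimal sketch in plain 
MethodHandle code -- not Nashorn's actual implementation, and the class and 
method names are made up -- just the same shape: the bootstrap creates a 
relinkable call site and installs its first target via setTarget(), which is 
the native setCallSiteTargetNormal frame at the top of the trace. If fresh 
call sites keep being produced for the same logical site (for example because 
the script gets evaluated again and again), this whole path runs every time 
instead of once.

import java.lang.invoke.*;

// Rough stand-in for Bootstrap.bootstrap -> DynamicLinker.link ->
// AbstractRelinkableCallSite.initialize from trace 1.
public class LinkPathSketch {

    static Object greet(Object name) { return "hello " + name; }

    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MutableCallSite cs = new MutableCallSite(type);
        MethodHandle target = lookup.findStatic(LinkPathSketch.class, "greet", type);
        cs.setTarget(target);   // ends up in MethodHandleNatives.setCallSiteTargetNormal
        return cs;
    }

    public static void main(String[] args) throws Throwable {
        MethodType type = MethodType.methodType(Object.class, Object.class);
        CallSite cs = bootstrap(MethodHandles.lookup(), "greet", type);
        System.out.println(cs.dynamicInvoker().invoke("mlvm"));
    }
}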

> 2.
> at sun.misc.Unsafe.defineAnonymousClass(Native Method)
> at
> java.lang.invoke.InvokerBytecodeGenerator.loadAndInitializeInvokerClass(InvokerBytecodeGenerator.java:262)
> at
> java.lang.invoke.InvokerBytecodeGenerator.loadMethod(InvokerBytecodeGenerator.java:250)
> at
> java.lang.invoke.InvokerBytecodeGenerator.generateCustomizedCode(InvokerBytecodeGenerator.java:509)
> at java.lang.invoke.LambdaForm.compileToBytecode(LambdaForm.java:454)
> at java.lang.invoke.LambdaForm.checkInvocationCounter(LambdaForm.java:634)
> ... (omitting a few invocations of LambdaForm methods)
> at java.lang.invoke.LambdaForm$LFI/1682860686.interpret_L(LambdaForm$LFI)
> at java.lang.invoke.LambdaForm$MH/1997512181.linkToCallSite(LambdaForm$MH)
> at jdk.nashorn.internal.scripts.Script$^eval_._L196(<eval>:232)
> ...

I think this is related to the first.
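
If I read trace 2 right, it is the downstream cost of the first: every method 
handle the relinking produces is backed by LambdaForms, which start out 
interpreted and, once invoked often enough (that is the checkInvocationCounter 
frame), get compiled to bytecode and loaded through Unsafe.defineAnonymousClass. 
A small sketch of the triggering side -- whether a given chain really produces 
a fresh class depends on how much LambdaForm sharing the JDK build manages, so 
take it as an illustration of the mechanism, not a reproducer:

import java.lang.invoke.*;

public class FormCompileSketch {

    static Object id(Object x) { return x; }

    public static void main(String[] args) throws Throwable {
        MethodHandle base = MethodHandles.lookup().findStatic(FormCompileSketch.class,
                "id", MethodType.methodType(Object.class, Object.class));

        // A combinator chain like this executes through LambdaForms. Invoking it
        // repeatedly is what drives LambdaForm.checkInvocationCounter and, past
        // the threshold, compileToBytecode -> defineAnonymousClass (trace 2).
        MethodHandle chain = MethodHandles.foldArguments(
                MethodHandles.dropArguments(base, 1, Object.class), base);

        for (int i = 0; i < 10_000; i++) {
            chain.invoke(i);
        }
    }
}

With continuous relinking there is a steady supply of fresh handles and forms 
hitting that path, which would fit the class counts you quoted.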

> Obviously, the application is near unresponsive under these conditions
> (i.e. doesn't respond to actual requests from the outside, though you
> can connect with jconsole / visualvm / jprofiler just fine and observe
> it). CPU will be near 100% bound by these calls, too.

Continuously setting new call site targets is very bad for performance.
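
The expensive part is not only the native call itself: every setTarget() forces 
the JVM to discard compiled code that inlined the old target. A sketch of the 
difference, with made-up method names -- the first loop relinks on every type 
change, the second links once with a guardWithTest "inline cache" and never 
touches the target again:

import java.lang.invoke.*;

public class RelinkCostSketch {

    static Object viaString(Object x) { return "str:" + x; }
    static Object viaNumber(Object x) { return "num:" + x; }
    static boolean isString(Object x) { return x instanceof String; }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodType t = MethodType.methodType(Object.class, Object.class);
        MethodHandle forString = l.findStatic(RelinkCostSketch.class, "viaString", t);
        MethodHandle forNumber = l.findStatic(RelinkCostSketch.class, "viaNumber", t);
        MethodHandle stringTest = l.findStatic(RelinkCostSketch.class, "isString",
                MethodType.methodType(boolean.class, Object.class));

        // Anti-pattern: relink whenever the argument type changes. Every setTarget
        // is the same native setCallSiteTargetNormal call as in trace 1 and
        // invalidates compiled code that depended on the old target.
        MutableCallSite churning = new MutableCallSite(t);
        MethodHandle churningInvoker = churning.dynamicInvoker();
        for (int i = 0; i < 100_000; i++) {
            churning.setTarget((i & 1) == 0 ? forString : forNumber);
            churningInvoker.invoke(i);
        }

        // Stable alternative: dispatch inside the handle via guardWithTest; the
        // call site target is installed once and never replaced afterwards.
        MutableCallSite stable = new MutableCallSite(t);
        stable.setTarget(MethodHandles.guardWithTest(stringTest, forString, forNumber));
        MethodHandle stableInvoker = stable.dynamicInvoker();
        Object[] samples = { "s", 42 };   // alternate a String and an Integer
        for (int i = 0; i < 100_000; i++) {
            stableInvoker.invoke(samples[i & 1]);
        }
    }
}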

> Now there are a few interesting things to note that might help with
> diagnosis.
> Maybe most importantly, a full gc (we're using CMS, but
> CMSClassUnloadingEnabled does not need to be set) will stop the
> misbehaviour and will return the application to its normal state.
> Obviously, we'd rather not run full gcs all the time, but it's maybe
> also interesting to note that this can indeed be triggered by setting
> (on jdk7) MaxPermSize low enough or not at all. Apparently the garbage
> collector will always find enough space to clean up on PermGen that it
> doesn't really need to grow above some threshold -- though the graph
> showing our PermGen consumption will then look like the one showing our
> Heap consumption (i.e. up and down all the time) and we don't really
> have good response times, understandably. So this is not an option to
> fix the problem, but might be a clue to what's going on.
> A kind of additional problem in this regard is that exhausted PermGen /
> MetaSpace will nearly stop increasing under the problematic condition.
> Which, in turn, makes a full gc less likely to take place.

So the condition happens due to low memory. My guess is that something is 
soft-referenced (or maybe weakly referenced) and, for reasons unknown to me, 
gets released only during a full GC run. The expected characteristic would be 
that after a while the application starts misbehaving again.
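
To see that behaviour in isolation, here is a minimal sketch -- nothing to do 
with Nashorn's internals, just showing that softly reachable objects survive 
ordinary collections and are cleared only when memory gets tight, which in 
practice tends to coincide with a full GC (run it with a small -Xmx, e.g. 
-Xmx64m):

import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

public class SoftRefSketch {
    public static void main(String[] args) {
        // Softly reachable: kept as long as memory allows, guaranteed to be
        // cleared before the VM would otherwise throw OutOfMemoryError.
        SoftReference<byte[]> cache = new SoftReference<>(new byte[16 * 1024 * 1024]);

        List<byte[]> pressure = new ArrayList<>();
        try {
            while (cache.get() != null) {
                pressure.add(new byte[1024 * 1024]);   // build up heap pressure
            }
            System.out.println("soft reference cleared under pressure");
        } catch (OutOfMemoryError e) {
            pressure.clear();
            System.out.println("heap exhausted; soft ref cleared: " + (cache.get() == null));
        }
    }
}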

> Another clue is that we have been able to make the problem happen less
> often by dialing back on the amount of reference-replacement we do on
> our nashorn-javascript-engines. I.e. those engines have engine-wide
> bindings (as according to the javax.script api) you can replace just by
> calling .put(bindingName, object) on the engine, even while other
> threads are involved in calls inside it. Now apparently this causes much
> relinking of callsites to be done and thus fosters our problem.
>
> Still, it's hard to tell what's going on for sure, as we have no stack
> traces for what's happening inside the native code. I hope I could give
> you some idea.

I think the problem is not the native code itself, but rather what Nashorn 
does with the call sites. But I am not really familiar with Nashorn.
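
For the bindings part: I do not know what your code looks like, so the 
following is only a sketch of the two API shapes involved (class name and the 
"config" binding are made up, and it assumes a JDK that ships Nashorn). The 
first form rewrites the engine-wide ENGINE_SCOPE that all running scripts 
share -- the pattern you describe as triggering the relinking -- while the 
second keeps per-call values in a separate Bindings object and leaves the 
shared globals alone. Whether the second form is actually cheaper will depend 
on how Nashorn scopes its globals, so treat it as a starting point, not a fix.

import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class BindingsSketch {
    public static void main(String[] args) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");

        // Pattern from the report: replacing an engine-wide binding while the
        // engine is in use. Every put() mutates the ENGINE_SCOPE bindings that
        // all scripts on this engine see.
        engine.put("config", "engine-wide value");
        engine.eval("config");

        // Alternative shape: per-call ENGINE_SCOPE bindings passed to eval(),
        // so the shared bindings are never rewritten for request-scoped values.
        Bindings perCall = engine.createBindings();
        perCall.put("config", "per-call value");
        engine.eval("config", perCall);
    }
}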

bye Jochen

-- 
Jochen "blackdrag" Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org


