A certain (type of?) callsite seems to always require relinking

Wed Nov 19 16:25:32 UTC 2014

Hi Attila,
I've had the very same issue with a runtime of a proprietary language.

What I've proposed in the JSR 292 cookbook for this case is to use a 
dispatch table
https://code.google.com/p/jsr292-cookbook/source/browse/trunk/bimorphic-cache/src/jsr292/cookbook/bicache/RT.java
https://code.google.com/p/jsr292-cookbook/source/browse/trunk/bimorphic-cache/src/jsr292/cookbook/bicache/DispatchMap.java

It's a hash table which associate a Class to a method handle,
so the code avoid a linear check that you have with a tree of GWT and 
can share method handle trees between subclasses.
Obviously, there is no inlining from the callsite to a method handle chain
(the JIT can not see through the hash table).

I also think Groovy use something similar because Jochen found a bug in 
the dispatch table code last year.

cheers,
Rémi

On 11/19/2014 03:41 PM, Attila Szegedi wrote:
> Hi Benjamin,
>
> I've been thinking about this, and I believe I know what the issue might be. Unfortunately, I don't currently have a good solution for it (although I'll be thinking some more about it).
>
> Basically, call sites are linked with method handles that are guarded with a test for exact receiver type (basically obj.getClass() == X.class). Call sites further can have up to 8 methods linked into them (in a LRU fashion) in a waterfall cascade of guard-with-tests. If your call site sees more than 8 receiver types (this number is fixed right now), it'll keep relinking as it'll only remember the most recent 8.
>
> Even if you don't override the method in subclasses, we can't use a more generic guard because we can't prove that there won't ever be a new subclass that won't overload the method. Note I said overload, not override: that's not a mistake. Here's a scenario:
>
> public class A {
>      public void foo(Object o) { ... }
> }
>
> public class B extends A {
> }
>
> Now imagine a script call site "a.foo('Hello')". When it's hit with an instance of B, we'll use "a.getClass() == B.class" as the guard. Now, if you have a bunch of subclasses B1…B12 all extending A, you'll end up with 12 linkages to the same method, but all guarded with a different "a.getClass() == Bn.class" guard. Actually, as I said above, you'll end up with a call site incorporating the 8 most recently used ones, and force relinking when the 9th comes along.
>
> You could ask "what'd be the harm in linking to "A.foo(Object)" method just once with "a instanceof A" guard? The harm becomes apparent if we now define
>
> public class B extends A {
>      public void foo(String s) { ... }
> }
>
> With instanceof linkage, invocation at the call site with an instance of C would pass the guard, and invoke A.foo(Object), which is incorrect as it'd be expected to invoke C.foo(String) instead. As you can see, this is not a matter of a subclass overriding foo(Object), but rather it's a matter of the subclass overloading the "foo" name with a new signature.
>
> The only strategy we have for avoiding this is at the moment is almost always linking with exact receiver class guards :-(
>
> On a sidenote, I said "almost always" above as there's a special class of methods we can, in fact, link with "instanceof" guards on the most generic declaring superclass: methods taking zero arguments (e.g. all property getters). Since overload choice is actually per-arity, zero-argument methods can't effectively be overloaded, so for them we actually use "instanceof" guards. But, sadly, we can't use them for any other methods.
>
> One strategy to cope with the issue would be to check during linking if none of the currently known subclasses add new overloads to the method (or even, not overload it in a manner where a different method would be chosen for the static type at the call site), and if they don't, then link with a switch point representing this invariant. Then, whenever a subclass is loaded into the VM that invalidates the assumption, invalidate the switch points. Unfortunately, this strategy requires whole-VM knowledge of loaded classes, and we could only do it if we added a java.lang.instrument agent as a mandatory component in Nashorn.
>
> (Another sidenote: we're trying very hard to keep Nashorn from relying on any implementation-specific or undocumented platform or VM features; so far we have always managed to rely solely on public Java APIs also because we'd like to prove that they're sufficient for a dynamic language implementation on the JVM.)
>
> Alternatively, we could also try to prove a weaker assumption that the chosen method would always be the one invoked at the call site (e.g. the method type of the call site guarantees that there can't ever be a more specific method to invoke), but in reality this'd be quite hard and since Nashorn internally mostly only uses boolean, int, long, double, and Object as the call site signatures, it probably also wouldn't be effective (e.g. we could nearly never prove this invariant). So that's probably not worth it.
>
> As a yet another solution, we might give you a system property or other configuration means of allowing link chains longer than 8 (in your case, if you have 12 subclasses, then 12 should be enough).
>
> Sorry for not having a better answer…
>    Attila.
>
> On Nov 19, 2014, at 10:40 AM, Benjamin Sieffert <benjamin.sieffert at metrigo.de> wrote:
>
>> Hello everyone,
>>
>> it started with a peculiar obversion about our nashorn-utilising
>> application, that I made: It continues to load around a hundred new
>> anonymous classes *per second*, even without new scripts being introduced –
>> i.e. we are just running the same javascripts over and over again, with
>> different arguments.
>> So I ran the application with -tcs=miss and from what I see, eventually
>> there will be only a single call left that is producing all the output and
>> therefor, I believe, all the memory load. (Am I correct in this assumption?)
>>
>> What I can say about the call is the following:
>>
>> - return type is an array of differing length (but always of the same type)
>> - there are two arguments, of which the first one will always exactly match
>> the declaration, the second one is a subclass of the one used in the
>> declaration – but always the same subclass
>> - method is implemented in an abstract class
>> - receiver is one of about a dozen classes that inherit from this abstract
>> class
>> - none of the receivers overwrite the original implementation or overload
>> the method
>>
>> When I look into the trace output, there's often a bunch of
>> "TAG MISS library:212 dyn:getMethod|getProp|getElem:<methodname> …"
>> in a row, then a whole lot of
>> "TAG MISS library:212
>> dyn:call([jdk.internal.dynalink.beans.SimpleDynamicMethod …"
>> with a bit of the first one inbetween.
>>
>> Is this a known issue? Is there something I can do to alleviate the
>> problem? As it is, I might just end up implementing the whole chunk in Java
>> and be done with it, but I thought this might be worthy of some discussion.
>> If there's some important information that I have left out, I'll be glad to
>> follow up with it.
>>
>> Regards,
>> Benjamin
>>
>> -- 
>> Benjamin Sieffert
>> metrigo GmbH
>> Sternstr. 106
>> 20357 Hamburg
>>
>> Geschäftsführer: Christian Müller, Tobias Schlottke, Philipp Westermeyer,
>> Martin Rieß
>> Die Gesellschaft ist eingetragen beim Registergericht Hamburg
>> Nr. HRB 120447.