Optimizing Method Lookup in Truffle

Sat Aug 10 15:52:46 PDT 2013

Mark,

This is not at all limited to Java's concept of a "class". It is not even limited to a simple object comparison. In fact, anything expressible as a boolean condition in Java can serve as a guard in Truffle. Whenever you write somewhere in your node's execute methods "if (!condition) { CompilerDirectives.transferToInterpreter(); }" the condition is interpreted as a guard that causes deoptimization when false.

The only constraint is that the code following the call to CompilerDirectives.transferToInterpreter() [1] should make sure that subsequent execution does not periodically hit the if branch. One possible way to do that is via replacing the node with a different node that no longer relies on the condition being true and instead performs a more generic version of the operation. The node replacement can be done via the Node.replace(Node) method [2]. Because a node replacement always implies deoptimization, it is in the above example actually enough to write "if (!condition) { this.replace(createMoreGenericNode(this)); }" - this is a common pattern in Truffle interpreters and also used for the generated code from the Truffle DSL [3].

- thomas

[1] http://lafo.ssw.uni-linz.ac.at/javadoc/graalvm/com.oracle.truffle.api/javadoc/com/oracle/truffle/api/CompilerDirectives.html#transferToInterpreter()
[2] http://lafo.ssw.uni-linz.ac.at/javadoc/graalvm/com.oracle.truffle.api/javadoc/com/oracle/truffle/api/nodes/Node.html#replace(T)
[3] http://lafo.ssw.uni-linz.ac.at/javadoc/graalvm/com.oracle.truffle.api.dsl/javadoc/com/oracle/truffle/api/dsl/package-summary.html

On Aug 10, 2013, at 10:17 PM, Mark Roos <mroos at roos.com> wrote:

> As long as by 'class' we are not limited to java's concept.  In my case the pic uses the head of the lookup chain to determine a match.  This is not equal to the class.  This is held in a field of the 'this' object which is the method target.  So i currently extract the lookup and then apply the guard to it.
> 
> -Mark
> 
> On Aug 10, 2013, at 3:54 AM, "Thomas Wuerthinger" <thomas.wuerthinger at oracle.com> wrote:
> 
> > Stefan,
> > 
> > One of the first steps towards speeding up the implementation is to cache information in the nodes that are useful for subsequent execution. The first candidate for this would be caching the receiverClass and the looked up Invokable instance in the MessageNode class. For this, you need two types of guards:
> > 
> > a) A dynamic guard that checks for subsequent executions that the receiverClass is still the same (the first uninitialised message send node would only get the receiverClass and then rewrite itself to a specialized message send node with the dynamic guard). If it is not (i.e., it is a polymorphic invocation at this position), you can rewrite the node to a generic MessageNode object.
> > 
> > b) A global assumption guard that checks that the Invokable instance cache of the given receiver contains still the same value for the given method symbol. You can implement this via a global Assumption [1] that you can create via Truffle.getRuntime().createAssumption() [2]. You can use Assumption.check() [3] to dynamically check the assumption during execution of the node (rewriting to a generic MessageNode object if the assumption fails) and you can use Assumption.invalidate() [4] when the assumption is no longer correct.
> > 
> > These changes will save one hash map lookup for each monomorphic message send and is the first step towards inlining of those message sends.
> > 
> > - thomas
> > 
> > [1] http://lafo.ssw.uni-linz.ac.at/javadoc/graalvm/com.oracle.truffle.api/javadoc/com/oracle/truffle/api/Assumption.html
> > [2] http://lafo.ssw.uni-linz.ac.at/javadoc/graalvm/com.oracle.truffle.api/javadoc/com/oracle/truffle/api/TruffleRuntime.html#createAssumption()
> > [3] http://lafo.ssw.uni-linz.ac.at/javadoc/graalvm/com.oracle.truffle.api/javadoc/com/oracle/truffle/api/Assumption.html#check()
> > [4] http://lafo.ssw.uni-linz.ac.at/javadoc/graalvm/com.oracle.truffle.api/javadoc/com/oracle/truffle/api/Assumption.html#invalidate()
> > 
> > On Aug 8, 2013, at 11:39 AM, Stefan Marr <java at stefan-marr.de> wrote:
> > 
> > > Hi Andreas:
> > > 
> > > On 07 Aug 2013, at 23:51, Andreas Woess <andreas.woess at jku.at> wrote:
> > > 
> > >> I've found the bug and I think I have a workaround for you until this is
> > >> fixed.
> > >> In ReturnNonLocalNode you have try {... throw new ReturnException ...}
> > >> catch (FrameSlotTypeException) {...}. Avoid this try-catch block around
> > >> any throw of a ControlFlowException. I suggest factoring out
> > >> "try-getObject-catch-shouldNotHappen" into a separate method -- you use
> > >> this pattern in several places.
> > > 
> > > Again, spot on. Thanks a lot.
> > > 
> > > Refactoring ReturnNonLocalNode to use methods that encapsulate the "try-getObject-catch-shouldNotHappen" solves the issue immediately.
> > > Now I get performance that is roughly on par with the 'interpreted' version that is just running on HotSpot.
> > > 
> > > Applying the pattern for all these getObject cases doesn't seem to make a significant difference. Guess for most cases the compiler is already doing the correct thing.
> > > 
> > > Some numbers below.
> > > 
> > > Thanks
> > > Stefan
> > > 
> > > With Graal
> > > ----------
> > > 
> > > $ ./mx.sh --vm server vm -G:-TraceTruffleCompilation -Xbootclasspath/a:../som/build/classes som.vm.Universe -cp ../som/Smalltalk ../som/Examples/Benchmarks/All.som
> > > Starting Fibonacci benchmark ...
> > >   Iterations: 30 (elapsed time 15393 ms)
> > >   RESULT: 59 ms
> > > 
> > > Starting Dispatch benchmark ...
> > >   Iterations: 66 (elapsed time 3005 ms)
> > >   RESULT: 32 ms
> > > 
> > > Starting Bounce benchmark ...
> > >   Iterations: 30 (elapsed time 17745 ms)
> > >   RESULT: 51 ms
> > > 
> > > Starting Loop benchmark ...
> > >   Iterations: 30 (elapsed time 3015 ms)
> > >   RESULT: 37 ms
> > > 
> > > Starting Permute benchmark ...
> > >   Iterations: 30 (elapsed time 5028 ms)
> > >   RESULT: 41 ms
> > > 
> > > Starting Queens benchmark ...
> > >   Iterations: 30 (elapsed time 9221 ms)
> > >   RESULT: 48 ms
> > > 
> > > Starting List benchmark ...
> > >   Iterations: 30 (elapsed time 6592 ms)
> > >   RESULT: 60 ms
> > > 
> > > Starting Recurse benchmark ...
> > >   Iterations: 108 (elapsed time 3008 ms)
> > >   RESULT: 20 ms
> > > 
> > > Starting Storage benchmark ...
> > >   Iterations: 30 (elapsed time 3497 ms)
> > >   RESULT: 56 ms
> > > 
> > > Starting Sieve benchmark ...
> > >   Iterations: 30 (elapsed time 3108 ms)
> > >   RESULT: 43 ms
> > > 
> > > Starting BubbleSort benchmark ...
> > >   Iterations: 30 (elapsed time 4726 ms)
> > >   RESULT: 36 ms
> > > 
> > > Starting QuickSort benchmark ...
> > >   Iterations: 30 (elapsed time 9070 ms)
> > >   RESULT: 41 ms
> > > 
> > > Starting Sum benchmark ...
> > >   Iterations: 79 (elapsed time 3000 ms)
> > >   RESULT: 35 ms
> > > 
> > > Starting Towers benchmark ...
> > >   Iterations: 30 (elapsed time 7996 ms)
> > >   RESULT: 59 ms
> > > 
> > > Starting TreeSort benchmark ...
> > >   Iterations: 30 (elapsed time 8416 ms)
> > >   RESULT: 69 ms
> > > 
> > > Starting IntegerLoop benchmark ...
> > >   Iterations: 40 (elapsed time 3041 ms)
> > >   RESULT: 69 ms
> > > 
> > > All tests completed (756 ms)
> > > 
> > > 
> > > Without Graal
> > > -------------
> > > 
> > > $ ./som.sh -cp Smalltalk Examples/Benchmarks/All.som                                                         ✹ ✭getObject-catch-idiom
> > > Starting Fibonacci benchmark ...
> > >   Iterations: 38 (elapsed time 3002 ms)
> > >   RESULT: 66 ms
> > > 
> > > Starting Dispatch benchmark ...
> > >   Iterations: 81 (elapsed time 3027 ms)
> > >   RESULT: 35 ms
> > > 
> > > Starting Bounce benchmark ...
> > >   Iterations: 78 (elapsed time 3028 ms)
> > >   RESULT: 36 ms
> > > 
> > > Starting Loop benchmark ...
> > >   Iterations: 82 (elapsed time 3033 ms)
> > >   RESULT: 35 ms
> > > 
> > > Starting Permute benchmark ...
> > >   Iterations: 77 (elapsed time 3019 ms)
> > >   RESULT: 38 ms
> > > 
> > > Starting Queens benchmark ...
> > >   Iterations: 69 (elapsed time 3005 ms)
> > >   RESULT: 41 ms
> > > 
> > > Starting List benchmark ...
> > >   Iterations: 49 (elapsed time 3025 ms)
> > >   RESULT: 57 ms
> > > 
> > > Starting Recurse benchmark ...
> > >   Iterations: 108 (elapsed time 3013 ms)
> > >   RESULT: 27 ms
> > > 
> > > Starting Storage benchmark ...
> > >   Iterations: 60 (elapsed time 3009 ms)
> > >   RESULT: 48 ms
> > > 
> > > Starting Sieve benchmark ...
> > >   Iterations: 69 (elapsed time 3018 ms)
> > >   RESULT: 41 ms
> > > 
> > > Starting BubbleSort benchmark ...
> > >   Iterations: 93 (elapsed time 3012 ms)
> > >   RESULT: 31 ms
> > > 
> > > Starting QuickSort benchmark ...
> > >   Iterations: 77 (elapsed time 3031 ms)
> > >   RESULT: 37 ms
> > > 
> > > Starting Sum benchmark ...
> > >   Iterations: 86 (elapsed time 3033 ms)
> > >   RESULT: 34 ms
> > > 
> > > Starting Towers benchmark ...
> > >   Iterations: 51 (elapsed time 3019 ms)
> > >   RESULT: 57 ms
> > > 
> > > Starting TreeSort benchmark ...
> > >   Iterations: 40 (elapsed time 3049 ms)
> > >   RESULT: 73 ms
> > > 
> > > Starting IntegerLoop benchmark ...
> > >   Iterations: 44 (elapsed time 3049 ms)
> > >   RESULT: 67 ms
> > > 
> > > All tests completed (723 ms)
> > > 
> > > 
> > > -- 
> > > Stefan Marr
> > > Software Languages Lab
> > > Vrije Universiteit Brussel
> > > Pleinlaan 2 / B-1050 Brussels / Belgium
> > > http://soft.vub.ac.be/~smarr
> > > Phone: +32 2 629 2974
> > > Fax:   +32 2 629 3525
> > > 
> > 
>