Truffle performance problems

Thomas Wuerthinger thomas.wuerthinger at oracle.com
Thu Dec 12 15:08:46 PST 2013


Dain,

We are not surprised by the performance you are seeing. Truffle’s use case is the execution of expression trees made of multiple smaller nodes (which capture profiling feedback), not a single node wrapping a complex Java method (which does not capture any profiling feedback). There is no expected performance gain from doing the latter; on the contrary, the manual specification of the inlining boundaries and the absence of Java profiling feedback can lead to performance losses. We will nevertheless investigate whether there is anything specifically wrong with Truffle’s compiler graph in your example.
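To make the distinction concrete, here is a plain-Java sketch of the query expressed as a tree of small nodes. It has no Truffle dependency, and all class names are hypothetical; Truffle's actual `Node`/`RootNode` API and specialization machinery differ. Each node does one thing, and the filter node records a simple branch profile: the kind of per-node feedback a partial evaluator could exploit, and that a single opaque node wrapping the whole loop never collects.

```java
// Plain-Java sketch (no Truffle): the filter-and-sum query as a tree of small nodes.
// Class names are hypothetical; this only mimics the shape of a Truffle AST.
abstract class ExprNode {
    abstract double execute(double[] row);
}

// Reads one column of the current row.
final class ColumnNode extends ExprNode {
    private final int index;
    ColumnNode(int index) { this.index = index; }
    @Override double execute(double[] row) { return row[index]; }
}

// Multiplies the results of two child nodes.
final class MulNode extends ExprNode {
    private final ExprNode left, right;
    MulNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
    @Override double execute(double[] row) { return left.execute(row) * right.execute(row); }
}

// Evaluates `body` only for rows whose condition is non-zero, and records which
// branches were actually taken. Nothing here consumes the profile; a partial
// evaluator could use it to compile only the branches that have been seen.
final class FilterNode extends ExprNode {
    private final ExprNode condition;
    private final ExprNode body;
    boolean seenTrue, seenFalse; // simple branch profile

    FilterNode(ExprNode condition, ExprNode body) { this.condition = condition; this.body = body; }

    @Override double execute(double[] row) {
        if (condition.execute(row) != 0.0) {
            seenTrue = true;
            return body.execute(row);
        }
        seenFalse = true;
        return 0.0; // filtered-out rows contribute nothing to the sum
    }
}

// Drives the per-row tree over all rows and accumulates the result.
final class SumQuery {
    static double run(ExprNode perRow, double[][] rows) {
        double sum = 0;
        for (double[] row : rows) {
            sum += perRow.execute(row);
        }
        return sum;
    }
}
```

In this sketch a row is `{extendedprice, discount, passesFilter}` with the last column holding 1.0 or 0.0, an assumption made only to keep the example short.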

- thomas


On 12 Dec 2013, at 23:16, Dain Sundstrom <dain at iq80.com> wrote:

> Hi all,
> 
> I have been experimenting with Truffle in Presto for a day now and am confused by the performance I am seeing.  
> 
> My high-level goal for this experiment is to figure out how I should structure data flow in my Truffle language. Since I am writing both the language and its only user, I have a lot of options available to me. Specifically, I'd like to figure out whether I should take a vectorized approach, a row-at-a-time approach, or some combination of both. Whichever solution is fastest, I'll make work in the code base.
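For comparison, a minimal plain-Java sketch of the two options applied to the filter-and-sum query. It assumes columnar `double[]` inputs and a hypothetical filter `discount >= minDiscount`; the real Presto filter is not shown in this thread. The row-at-a-time version evaluates filter and projection per row; the vectorized version runs each operator over a batch, passing a selection vector between them.

```java
// Two hypothetical evaluation strategies for the same filter-and-sum query.
// Columns are plain arrays; the filter "discount >= minDiscount" is an assumption.
final class QueryStrategies {
    // Row-at-a-time: evaluate the filter and the projection for each row in turn.
    static double rowAtATime(double[] price, double[] discount, double minDiscount) {
        double sum = 0;
        for (int i = 0; i < price.length; i++) {
            if (discount[i] >= minDiscount) {
                sum += price[i] * discount[i];
            }
        }
        return sum;
    }

    // Vectorized: process fixed-size batches, running each operator over the whole
    // batch (filter -> selection vector, then multiply-and-sum over selected rows).
    static double vectorized(double[] price, double[] discount, double minDiscount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int[] selected = new int[batchSize];
        double sum = 0;
        for (int start = 0; start < price.length; start += batchSize) {
            int end = Math.min(start + batchSize, price.length);
            // Filter operator: record the positions of rows that pass.
            int count = 0;
            for (int i = start; i < end; i++) {
                if (discount[i] >= minDiscount) {
                    selected[count++] = i;
                }
            }
            // Projection/aggregation operator over the selected rows only.
            for (int k = 0; k < count; k++) {
                int i = selected[k];
                sum += price[i] * discount[i];
            }
        }
        return sum;
    }
}
```

Both versions add terms in the same row order, so they return identical results; which one is faster depends on the JIT and the batch size, which is what the experiment is meant to measure.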
> 
> To this end, I decided to take a top-down approach to Truffle (mainly because I am confident the bottom expression bits will be fast). I started with a very simple query hand-coded in Java:
> 
> double sum = 0;
> for (Row row : source) {
>   if (passesFilter(row)) {  // hypothetical predicate standing in for the query's filter
>     sum += row.extendedprice * row.discount;
>   }
> }
> return sum;
> 
> When I run that on my laptop over 5M rows of input (all in memory), it takes ~165 ms on the Graal VM (1.7.0_45) with the "-server" option.
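A measurement like this is sensitive to JIT warm-up, because the same loop first runs interpreted and later runs compiled, and the numbers change at that transition. A hypothetical minimal harness that separates warm-up runs from measured runs might look like this; the class name and run counts are arbitrary, and a dedicated tool such as JMH gives more reliable numbers.

```java
import java.util.function.DoubleSupplier;

// Minimal warm-up-then-measure harness (hypothetical); JMH is the more rigorous option.
final class Timing {
    // Results are written here so the JIT cannot discard the measured work.
    static volatile double blackhole;

    // Runs `task` warmupRuns times (giving the JIT a chance to compile it),
    // then returns the fastest of measuredRuns timed executions, in milliseconds.
    static double bestMillis(DoubleSupplier task, int warmupRuns, int measuredRuns) {
        if (measuredRuns < 1) {
            throw new IllegalArgumentException("measuredRuns must be at least 1");
        }
        for (int i = 0; i < warmupRuns; i++) {
            blackhole = task.getAsDouble();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            blackhole = task.getAsDouble();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000.0;
    }
}
```

Here `task` would wrap one execution of the query over the in-memory rows, either the hand-coded loop or the call through Truffle, so both are measured the same way.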
> 
> With the performance baseline established, my plan was to start with a single node and then break it apart into more nodes without making stuff slower. So I wrapped this same code in a single Truffle RootNode. When I execute the same code through the Truffle call, I get the same performance until the node is compiled. Once the node is compiled, the run time increases to ~260 ms.
> 
> Now, I understand that using a single node is not the point of Truffle, but I would not expect such a massive performance drop-off. At this point, I'm not sure whether this is a worthwhile exercise at all.
> 
> You can find all of the code and instructions on running it here:
> 
> https://github.com/dain/presto-truffle/tree/master
> 
> Any ideas or suggestions?
> 
> Thanks,
> 
> -dain
> 
> 
> 
> On a related note, if you leave the Truffle test running, it eventually crashes with (https://gist.github.com/dain/c3a29eb81642c86f5072):
> 
> Found illegal recursive call to HotSpotMethod<Utility.recursiveAppendNumber(StringBuffer, int, int, int)>, must annotate such calls with @CompilerDirectives.SlowPath!
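The error message asks for such calls to be annotated with `@CompilerDirectives.SlowPath`. The general pattern is to keep rarely executed, complex code (for example exception-message formatting) out of the hot path by moving it into its own method carrying that annotation. The sketch below is plain Java: `SlowPathStandIn` and `CheckedSum` are hypothetical names, and the stand-in annotation exists only so the example compiles without Truffle on the classpath; it has no effect on compilation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for Truffle's @CompilerDirectives.SlowPath. It does nothing
// here; in real Truffle code the actual annotation would be used instead.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SlowPathStandIn {}

final class CheckedSum {
    // Hot path: plain arithmetic only, with no string building or recursion inline.
    static double sumPositive(double[] values) {
        double sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0) {
                throw invalidValue(i, values[i]); // rare case, kept out of line
            }
            sum += values[i];
        }
        return sum;
    }

    // Cold path: message formatting lives in its own method. With the real
    // annotation, the partial evaluator would treat this as a call boundary
    // instead of inlining it into the compiled graph.
    @SlowPathStandIn
    static IllegalArgumentException invalidValue(int index, double value) {
        return new IllegalArgumentException("negative value " + value + " at row " + index);
    }
}
```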
> 
> I've also found "java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Inlined graph is in invalid state" when executing a CallTarget in tight inner loops.  
> 



More information about the graal-dev mailing list