General Steps for Optimizing Truffle-based Languages?
Stefan Marr
java at stefan-marr.de
Tue May 6 16:58:18 UTC 2014
Dear all:
Just to give a little more information, here are some numbers:
http://soft.vub.ac.be/~smarr/downloads/truffle/perf-overview.html
The most important part:
Runtime factor compared to Java 8:

               DeltaBlue  Mandelbrot  Richards
Java8               1.00        1.00      1.00
JRubyTruffle        2.13        1.26      6.04
TruffleSOM          1.19       15.66     28.43
So, DeltaBlue is pretty good on TruffleSOM, and Chris said he hasn’t done anything yet to optimize JRuby for that one, so I guess they’ll end up closer in the end. Then again, I didn’t do anything in particular for DeltaBlue either. Mandelbrot and Richards, on the other hand, are an order of magnitude slower.
Any tips on how to go about fixing that are highly appreciated.
One thing that I noticed is that JRuby warms up much faster.
DeltaBlue takes an insane amount of time to warm up on TruffleSOM, and I don’t think that it’s just slow warmup in general.
I guess there might be something going on there, but I don’t know whether it’s just the inlining propagating through the tree more slowly than in JRuby.
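One simple way to make warmup visible is to time each iteration separately and watch how long the times keep falling. A minimal sketch (the workload here is just a stand-in; plug in the actual benchmark call):

```java
// Sketch: per-iteration timing to make warmup behaviour visible.
// The workload is a placeholder for the real benchmark iteration.
import java.util.ArrayList;
import java.util.List;

public class WarmupProbe {
    // Stand-in workload; replace with the real benchmark call.
    static long workload() {
        long sum = 0;
        for (int i = 1; i <= 100_000; i++) {
            sum += i;
        }
        return sum;
    }

    // Time each of the given number of iterations in nanoseconds.
    static List<Long> timeIterations(int iterations) {
        List<Long> times = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload();
            times.add(System.nanoTime() - start);
        }
        return times;
    }

    public static void main(String[] args) {
        // A long tail of steadily falling times indicates slow warmup.
        for (long t : timeIterations(30)) {
            System.out.println(t);
        }
    }
}
```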
Best regards
Stefan
On 05 May 2014, at 23:40, Stefan Marr <java at stefan-marr.de> wrote:
> Dear all:
>
> After adding support for primitive value storage in the object layout, similar to how it is solved in JRuby, I am slowly running out of general optimizations that seem worthwhile to apply to TruffleSOM. However, performance is still far from impressive.
> While TruffleSOM outperforms its RPython-based counterpart on many simple microbenchmarks, it is not really close on larger benchmarks.
>
> Now I am wondering how to continue.
>
> For RPython, there is the trace viewer, which is rather intuitive after a brief introduction: the information presented relates directly to a trace of the code, and it doesn’t take a lot of guessing or experience to see what is going on and where things come from, because it is closely tied to the code of the implementation. So it is more or less trivial to pick all the low-hanging fruit, for instance by marking things that should be constant with the corresponding annotations.
>
> For Truffle and Graal, I just don’t have anywhere near enough experience to understand what I am seeing in the graph viewer.
> So I am rather lost when it comes to trying to optimize TruffleSOM at the same level.
>
> One thing I still would like to achieve is to get simple microbenchmarks that aren’t doing anything useful to compile to nothing.
>
> For example [1]:
> benchmark = (
>     1 to: 20000 do: [ :i | self method: i ]
> )
> method: argument = ( ^argument )
>
> I would guess it should be possible to compile that to an empty method body for the ‘benchmark’ method, and consequently the inner loop of the benchmark harness as well.
> In RPython, that’s not possible, because they do not do empty-loop detection. But, just guessing, with the data-dependency-based approach in Graal, that should not really be a problem, right?
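To make the intent concrete, here is a plain-Java analogue of the SOM snippet (my own sketch, not TruffleSOM code; names mirror the SOM example): once `method` is inlined, the loop body is side-effect-free, so in principle inlining plus dead-code elimination could reduce `benchmark` to returning a constant.

```java
// Plain-Java analogue of the SOM benchmark above (a sketch, not TruffleSOM
// code): after inlining the identity method, the loop does no observable work.
public class Dispatch {
    // Identity, like `method: argument = ( ^argument )`.
    static int method(int argument) {
        return argument;
    }

    static int benchmark() {
        int last = 0;
        for (int i = 1; i <= 20000; i++) {
            last = method(i);  // after inlining: last = i
        }
        return last;  // only observable result: the constant 20000
    }

    public static void main(String[] args) {
        System.out.println(benchmark());
    }
}
```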
> So, I was trying to understand what I see in the graph viewer for that benchmark but didn’t get very far.
>
> Also, the FAQ isn’t really helpful to me in terms of advice. It says things like ‘simplify nodes further’, but when I look at ZipPy and JRuby, the nodes I have already look very simple. So I guess I am looking for more concrete advice of some sort.
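For reference, this is how I read the ‘simplify nodes’ advice, as a plain-Java sketch (not the actual Truffle API, which generates similar code from Truffle DSL @Specialization annotations; the class and method names here are mine): a polymorphic operation rewrites itself to a type-specialized node whose execute method is a single primitive operation the compiler can inline and fold.

```java
// Conceptual sketch of node specialization (plain Java, not the Truffle API).
abstract class AddNode {
    abstract Object execute(Object left, Object right);
}

// Generic form: handles all cases, but the type checks stay in the hot path.
class GenericAddNode extends AddNode {
    @Override
    Object execute(Object left, Object right) {
        if (left instanceof Long && right instanceof Long) {
            return (Long) left + (Long) right;
        }
        throw new UnsupportedOperationException("unsupported operand types");
    }
}

// Specialized form: once only longs have been seen, the body is a single
// primitive addition that the optimizer can inline and constant-fold.
class LongAddNode extends AddNode {
    @Override
    Object execute(Object left, Object right) {
        return (Long) left + (Long) right;
    }
}
```

In a real interpreter the generic node would replace itself in the AST with the specialized node once it has profiled the operand types.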
>
> What could be a good next step to identify issues that block the optimizer from doing its best, and what would be the best tools that could help me with it?
>
> Thanks and best regards
> Stefan
>
> [1] in a TruffleSOM repo the benchmark is executed with:
> ./graal.sh -cp Smalltalk:Examples/Benchmarks/Richards Examples/Benchmarks/BenchmarkHarness.som Dispatch 1000 0 10000
>
>
>
>
> --
> Stefan Marr
> INRIA Lille - Nord Europe
> http://stefan-marr.de/research/
>
>
>
--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/