General Steps for Optimizing Truffle-based Languages?

Mon May 5 21:40:21 UTC 2014

Dear all:

After adding support for primitive value storage in the object layout, similar to how it is solved in JRuby, I am slowly running out of general optimizations that seem worthwhile to apply to TruffleSOM. However, performance is still far from impressive.
While TruffleSOM outperforms its RPython-based counterpart on many simple microbenchmarks, it is not really close on larger benchmarks.

Now I am wondering how to continue.

For RPython, they have their trace viewer, which is rather intuitive after a brief introduction. The information presented relates directly to a trace of the implementation's code, so it doesn’t take a lot of guessing or experience to see what’s going on and what is coming from where. So, it is more or less trivial to pick all the low-hanging fruit and, for instance, use their annotations to mark things that should be constant.

For Truffle and Graal, I just do not have anywhere near enough experience to understand what I am seeing in the graph viewer.
So, I am rather lost when it comes to trying to optimize TruffleSOM on the same level.

One thing I still would like to achieve is to get simple microbenchmarks that aren’t doing anything useful to compile to nothing.

For example [1]:
    benchmark = (
        1 to: 20000 do: [ :i | self method: i ]
    )
    method: argument = ( ^argument )

I would guess it should be possible to compile that to an empty method body for the ‘benchmark’ method, and consequently to do the same for the inner loop of the benchmark harness.
In RPython, that’s not possible, because they do not do empty loop detection. But, just guessing, with the data-dependency-based approach in Graal, that should not really be a problem, right?
So, I was trying to understand what I see in the graph viewer for that benchmark but didn’t get very far.
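
To make the expectation concrete, here is a rough Java rendering of the benchmark. It is only a sketch of the SOM code above, not what TruffleSOM or Graal actually produce, and the class and method names are made up for illustration. The point is that the call returns its argument, the result is dropped, and nothing else in the loop has a side effect:

```java
// Hypothetical Java counterpart of the SOM benchmark above; the names are
// invented for illustration and do not exist in TruffleSOM.
public final class DispatchSketch {

    // Counterpart of `method: argument = ( ^argument )`: returns its argument.
    static int method(int argument) {
        return argument;
    }

    // Counterpart of `benchmark`: calls `method` 20000 times and drops the result.
    static void benchmark() {
        for (int i = 1; i <= 20000; i++) {
            method(i);
        }
    }

    // What I would hope remains once the call is inlined and the unused,
    // side-effect-free loop is recognized as dead: an empty body.
    static void benchmarkAsHopedFor() {
    }

    public static void main(String[] args) {
        benchmark();
        benchmarkAsHopedFor();
    }
}
```

Whether the compiler really gets from the first form to the second for the Truffle AST is exactly what I cannot tell from the graphs.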

Also, the FAQ isn’t really helpful to me in terms of advice. It says things like "simplify nodes further", but when I look at ZipPy and JRuby, the nodes I have already look very simple. So, I guess I am looking for more concrete advice of some sort.

What could be a good next step to identify issues that block the optimizer from doing its best, and what would be the best tools that could help me with it?

Thanks and best regards
Stefan

[1] In a TruffleSOM repo, the benchmark is executed with:
./graal.sh -cp Smalltalk:Examples/Benchmarks/Richards Examples/Benchmarks/BenchmarkHarness.som Dispatch 1000 0 10000

-- 
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/