General Steps for Optimizing Truffle-based Languages?

Tue May 6 17:19:43 UTC 2014

I'm still intending to sit down and go through the TruffleSOM code in
depth at some point, but haven't managed it yet.

In the meantime, about warmup. Your while loops are a method on a block
that calls itself (self restart), right? Is that tail recursion then? I
don't see how you're not blowing the stack. But my point was: however you
implement loops, does Truffle see the loop count, or does only the
body of the loop get optimised? In my Ruby while loop (in Ruby, while loops
are like in C, not method calls) I report the number of iterations. If
I didn't do that, warmup of whatever contains the loop would take a long
time. Maybe this doesn't apply to SOM, as you have no primitive loop
operation.
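A minimal sketch of the loop-count idea in plain Java, assuming a simplified compilation profile: `CallTargetProfile`, `WhileNode`, and the threshold of 1000 are hypothetical names and values chosen for illustration, not the Truffle API. The loop node counts its iterations and reports them to the profile of the enclosing method, so a method that is called only once but loops many times still crosses the compilation threshold.

```java
import java.util.function.BooleanSupplier;

public class LoopCountDemo {

    // Hypothetical stand-in for the compilation profile of one method.
    // NOT the Truffle API: it only models counting loop iterations together
    // with calls when deciding whether a method is hot.
    static final class CallTargetProfile {
        static final int COMPILATION_THRESHOLD = 1000; // assumed value, illustration only
        private long count;

        void reportCall() { count++; }

        // Called once per loop execution with the number of iterations performed.
        void reportLoopCount(int iterations) { count += iterations; }

        boolean shouldCompile() { return count >= COMPILATION_THRESHOLD; }

        long count() { return count; }
    }

    // A while loop implemented as a primitive node (C-like `while`),
    // rather than as a message send on a block.
    static final class WhileNode {
        private final BooleanSupplier condition;
        private final Runnable body;
        private final CallTargetProfile profile;

        WhileNode(BooleanSupplier condition, Runnable body, CallTargetProfile profile) {
            this.condition = condition;
            this.body = body;
            this.profile = profile;
        }

        void execute() {
            int iterations = 0;
            try {
                while (condition.getAsBoolean()) {
                    body.run();
                    iterations++;
                }
            } finally {
                // Report even if the body throws, so the profile reflects the work done.
                profile.reportLoopCount(iterations);
            }
        }
    }

    public static void main(String[] args) {
        CallTargetProfile profile = new CallTargetProfile();
        int[] i = {0};
        WhileNode loop = new WhileNode(() -> i[0] < 20000, () -> i[0]++, profile);

        profile.reportCall(); // the enclosing method is called only once...
        loop.execute();       // ...but its loop runs 20000 times
        System.out.println(profile.count() + " " + profile.shouldCompile());
    }
}
```

Running `main` prints `20001 true`. Without the `reportLoopCount` call, the profile would see a single call and the method would stay below the threshold, which is the slow warmup described above.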

Chris

On 6 May 2014 17:58, Stefan Marr <java at stefan-marr.de> wrote:

> Dear all:
>
> Just to give a little more information, here are some numbers:
>   http://soft.vub.ac.be/~smarr/downloads/truffle/perf-overview.html
>
> The most important part:
>
>                        Runtime factor compared to Java8 (lower is better)
>                         DeltaBlue Mandelbrot Richards
>  Java8                       1.00       1.00     1.00
>  JRubyTruffle                2.13       1.26     6.04
>  TruffleSOM                  1.19      15.66    28.43
>
> So, DeltaBlue is pretty good on TruffleSOM, and Chris said he hasn’t done
> anything yet to optimize JRuby for that one, so I guess the two will end up
> closer. Then again, I didn’t do anything in particular for DeltaBlue
> either. Mandelbrot and Richards, on the other hand, are an order of
> magnitude slower.
>
> Any tips on how to go about fixing that are highly appreciated.
>
> One thing that I noticed is that JRuby warms up much faster.
> DeltaBlue takes an insane amount of time to warm up on TruffleSOM, and I
> don’t think that it’s just slow warmup in general.
> I guess there might be something going on, but I don’t know whether that’s
> just the inlining propagating through the tree more slowly than in JRuby.
>
> Best regards
> Stefan
>
> On 05 May 2014, at 23:40, Stefan Marr <java at stefan-marr.de> wrote:
>
> > Dear all:
> >
> > After adding support for primitive value storage in the object layout,
> > similar to how it is solved in JRuby, I am slowly running out of general
> > optimizations that seem worthwhile to apply to TruffleSOM. However,
> > performance is still far from impressive.
> > While TruffleSOM outperforms its RPython-based counterpart on many
> > simple microbenchmarks, it’s not really close on larger benchmarks.
> >
> > Now I am wondering how to continue.
> >
> > RPython has its trace viewer, which is rather intuitive after a brief
> > introduction because the information presented relates directly to a
> > trace of code. It doesn’t take a lot of guessing and experience to see
> > what’s going on and where things come from, because it is closely
> > related to the code of the implementation. So, it is more or less
> > trivial to pick all the low-hanging fruit, for instance by marking
> > things that should be constant with the appropriate annotations.
> >
> > For Truffle and Graal, I just don’t have anywhere near enough experience
> > to understand what I am seeing in the graph viewer.
> > So, I am rather lost when it comes to optimizing TruffleSOM to the same
> > level.
> >
> > One thing I would still like to achieve is getting simple microbenchmarks
> > that don’t do anything useful to compile down to nothing.
> >
> > For example [1]:
> >    benchmark = (
> >        1 to: 20000 do: [ :i | self method: i ]
> >    )
> >    method: argument = ( ^argument )
> >
> > I would guess it should be possible to compile that to an empty method
> > body for the ‘benchmark’ method, and consequently to do the same for the
> > inner loop of the benchmark harness.
> > In RPython, that’s not possible because they do not do empty-loop
> > detection. But, just guessing, with the data-dependency-based approach in
> > Graal, that should not really be a problem, right?
> > So, I tried to understand what I see in the graph viewer for that
> > benchmark but didn’t get very far.
> >
> > Also, the FAQ isn’t really helpful to me in terms of advice. It says
> > things like “simplify nodes further”, but when I look at ZipPy and
> > JRuby, the nodes I have already look very simple. So, I guess I am
> > looking for more concrete advice of some sort.
> >
> > What could be a good next step to identify issues that block the
> > optimizer from doing its best, and what would be the best tools to
> > help me with it?
> >
> > Thanks and best regards
> > Stefan
> >
> > [1] In a TruffleSOM repository, the benchmark is executed with:
> > ./graal.sh -cp Smalltalk:Examples/Benchmarks/Richards Examples/Benchmarks/BenchmarkHarness.som Dispatch 1000 0 10000
> >
> > --
> > Stefan Marr
> > INRIA Lille - Nord Europe
> > http://stefan-marr.de/research/
>
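Stefan's empty-loop question can be illustrated with a plain-Java analogue of his SOM benchmark (a hypothetical translation, not TruffleSOM code). Once `method` is inlined, the loop in `benchmark` computes values that are never used and has no side effects, so an optimizer that reasons about data dependencies can, in principle, delete the body and then the empty counted loop. `benchmarkWithResult` is a contrast case whose result escapes the method. The sketch itself does not show whether a particular compiler performs this elimination.

```java
public class EmptyLoopDemo {

    // Analogue of the SOM `method: argument = ( ^argument )`.
    static int method(int argument) {
        return argument;
    }

    // Analogue of `1 to: 20000 do: [ :i | self method: i ]`.
    // After inlining `method`, each iteration produces a value that is never
    // used and has no side effects, so the loop has no observable effect.
    static void benchmark() {
        for (int i = 1; i <= 20000; i++) {
            method(i);
        }
    }

    // Contrast: the loop's result flows out of the method, so the loop cannot
    // simply be deleted (at most, its computation can be simplified).
    static int benchmarkWithResult() {
        int last = 0;
        for (int i = 1; i <= 20000; i++) {
            last = method(i);
        }
        return last;
    }

    public static void main(String[] args) {
        benchmark();
        System.out.println(benchmarkWithResult());
    }
}
```

Running `main` prints `20000`, the value returned by `benchmarkWithResult`.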