Loop Peeling in Guest Language | Was: Optimization Thresholds?
Stefan Marr
java at stefan-marr.de
Mon Jul 14 12:35:02 UTC 2014
Hi:
It turns out my issue from a month ago wasn’t actually related to the reflective aspects at all; the specializations I put in place to optimize them work perfectly.
Instead, the problem turned out to be one that can be solved with loop peeling.
For some reason that is not entirely clear to me, the first iteration of the loop behaves differently, which seems to be related to reading an object field.
Either way, the microbenchmark benefits from an explicitly peeled first loop iteration implemented in the specialization node. That means I use two DirectCallNodes for the loop body: one for the first iteration, and a second one for all remaining iterations [1].
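To illustrate the shape of that peeling, here is a minimal plain-Java sketch (all names here are hypothetical; the actual specialization node in [1] uses two Truffle DirectCallNodes rather than Java lambdas, so that the first iteration and the steady-state iterations can specialize independently):

```java
import java.util.function.LongConsumer;

public class PeeledLoop {
    // A `to:do:`-style loop from `from` to `to` (inclusive), with the
    // first iteration peeled out and dispatched through a separate
    // callback, mirroring the two call nodes in the specialization.
    static void intToDo(long from, long to,
                        LongConsumer firstIteration,
                        LongConsumer remainingIterations) {
        if (from > to) {
            return;
        }
        firstIteration.accept(from);          // peeled first iteration
        for (long i = from + 1; i <= to; i++) {
            remainingIterations.accept(i);    // steady-state iterations
        }
    }

    public static void main(String[] args) {
        long[] sum = {0};
        // Both callbacks run the same body here; in the interpreter they
        // are distinct call sites, so the compiler can treat the first
        // iteration differently from the rest.
        intToDo(1, 5, i -> sum[0] += i, i -> sum[0] += i);
        System.out.println(sum[0]); // prints 15
    }
}
```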
Unfortunately, this form of loop peeling is not generally a win.
Kalibera et al. report that it is useful for loops in R [2].
However, on my benchmark collection, I see that it reduces performance on average.
So, I was wondering what your experience with loop peeling on the level of the guest language is.
Are there perhaps some heuristics that would help to decide whether to use it or not?
Thanks
Stefan
[1] https://github.com/SOM-st/TruffleSOM/blob/master/src/som/interpreter/nodes/specialized/IntToDoMessageNode.java#L58
[2] http://dl.acm.org/citation.cfm?doid=2576195.2576205
On 09 Jun 2014, at 22:06, Stefan Marr <java at stefan-marr.de> wrote:
> Hi:
>
> I am running into a strange issue when optimizing some reflective operations.
> I don’t think it is related to the operations themselves, but rather to the way the optimizations/Graal work.
>
> If benchmarked separately, I see, as desired, the same performance results for direct and reflective operations.
> So, I assume that my specializations are sufficient to please the optimizer.
> Concretely, that is reflective method invocation via Smalltalk’s #perform: as well as intercepting undefined methods with #doesNotUnderstand:.
>
> However, if both reflective mechanisms are used in combination to implement something like dynamic proxies, runtime nearly doubles compared to what I would expect.
>
> I have been experimenting with the various thresholds in TruffleCompilerOptions, but without any luck.
> Since the benchmarks are still microbenchmarks, I don’t think I am really hitting any of those anyway.
> The fully inlined tree has about 220 nodes; that’s the number I see in the error output after setting TruffleGraphMaxNodes to a very small value. And that’s only about 20 more than what gets reported for the non-reflective, i.e., direct benchmark.
>
> What would be a good approach to figure out what’s going wrong here?
>
> Thanks
> Stefan
>
> To reproduce:
>
> git clone --recursive https://github.com/SOM-st/TruffleSOM.git
> ant jar
> ant test
> ./graal.sh -cp Smalltalk:Examples/Benchmarks/DoesNotUnderstand Examples/Benchmarks/BenchmarkHarness.som DirectAdd 20 0 1000
> ./graal.sh -cp Smalltalk:Examples/Benchmarks/DoesNotUnderstand Examples/Benchmarks/BenchmarkHarness.som ProxyAdd 20 0 1000
>
> --
> Stefan Marr
> INRIA Lille - Nord Europe
> http://stefan-marr.de/research/
>
>
>
--
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/