[Truffle] Eliminating Calls to Side-Effect Free Methods?

Wed Mar 12 15:12:45 UTC 2014

The problem with SequenceNode might be in the way that you hang onto the
final result. That mutable local, reassigned on every iteration, might be
upsetting the partial evaluator (PE).

  @Override
  @ExplodeLoop
  public Object executeGeneric(final VirtualFrame frame) {
    Object last = expressions[0].executeGeneric(frame);

    for (int i = 1; i < expressions.length; i++) {
      last = expressions[i].executeGeneric(frame);
    }

    return last;
  }

Also, you're calling executeGeneric for all children, when you could
call executeVoid for all but the last. In executeVoid you only perform
the side effects and avoid producing a value.

This is how the Ruby implementation does it, and there the sequence does
compile away to nothing.

    @ExplodeLoop
    @Override
    public Object execute(VirtualFrame frame) {
        for (int n = 0; n < body.length - 1; n++) {
            body[n].executeVoid(frame);
        }

        return body[body.length - 1].execute(frame);
    }

    @ExplodeLoop
    @Override
    public void executeVoid(VirtualFrame frame) {
        for (int n = 0; n < body.length; n++) {
            body[n].executeVoid(frame);
        }
    }
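
To illustrate the executeVoid point outside of Truffle, here is a minimal
sketch in plain Java. ExprNode, LiteralNode, WriteNode and the List<String>
trace that stands in for the frame are all made up for illustration; none of
this is Truffle API or the actual Ruby code. It only shows that a
side-effect-free child does no work at all when it is not in the last
position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Truffle node. The List<String> "trace" plays
// the role of the frame and records side effects and produced values.
abstract class ExprNode {
    abstract Object execute(List<String> trace);

    // Default: evaluate for side effects only and drop the result.
    void executeVoid(List<String> trace) {
        execute(trace);
    }
}

// A literal has no side effects, so executeVoid has nothing to do.
final class LiteralNode extends ExprNode {
    private final Object value;

    LiteralNode(Object value) {
        this.value = value;
    }

    @Override
    Object execute(List<String> trace) {
        trace.add("literal " + value);
        return value;
    }

    @Override
    void executeVoid(List<String> trace) {
        // Nothing to compute and nothing to return.
    }
}

// A write has a side effect that must happen in both execute variants,
// so it keeps the default executeVoid.
final class WriteNode extends ExprNode {
    private final String name;

    WriteNode(String name) {
        this.name = name;
    }

    @Override
    Object execute(List<String> trace) {
        trace.add("write " + name);
        return name;
    }
}

final class SequenceNode extends ExprNode {
    private final ExprNode[] body;

    SequenceNode(ExprNode... body) {
        if (body.length == 0) {
            throw new IllegalArgumentException("empty sequence");
        }
        this.body = body;
    }

    @Override
    Object execute(List<String> trace) {
        // Only the last child has to produce a value.
        for (int n = 0; n < body.length - 1; n++) {
            body[n].executeVoid(trace);
        }
        return body[body.length - 1].execute(trace);
    }

    @Override
    void executeVoid(List<String> trace) {
        for (ExprNode node : body) {
            node.executeVoid(trace);
        }
    }
}

final class SequenceDemo {
    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        ExprNode seq = new SequenceNode(
                new LiteralNode(1), new WriteNode("x"), new LiteralNode(42));
        System.out.println(seq.execute(trace)); // prints 42
        System.out.println(trace);              // prints [write x, literal 42]
    }
}
```

The first literal never shows up in the trace because its executeVoid is
empty, which is the kind of node you would hope the compiler removes
entirely.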

Chris

On 12 March 2014 13:14, Stefan Marr <java at stefan-marr.de> wrote:

> Hi:
>
> With the latest changes, TruffleSOM seems to get closer to ideal
> performance. However, there seem to be some general issues that do not
> yet work as I would hope/expect.
>
> I got a couple of micro benchmarks, which I would expect to be reduced to
> zero in an ideal situation.
>
> For instance the Dispatch benchmark. Essentially, it is two nested loops
> and a monomorphic message send to a method that returns its argument.
>
> It looks more or less like this:
>
>     benchmark = ( 1 to: 20000 do: [ :i | self method: i ] )
>     method: argument = ( ^argument )
>
>     outerBenchmarkLoop = (
>         1 to: innerIterations do: [ self benchmark ]
>     )
>
> Now, the first tricky part is that a self send inside a loop implies the
> access to the outer lexical scope, and thereby the use of a materialized
> frame. I experimented a little with it, and it turns out that in the outer
> loop, the use of the materialized frame seems to be properly eliminated, in
> the inner loop however, it is not. As shown above, both loops are in
> separate methods, but are inlined as far as I can see in IGV.
>
> That's one of the issues. The other one is the cost of introducing a
> sequence node.
> When I increase the number of statements in the inner loop from 1 to 2, my
> compiler introduces an additional SequenceNode to hold them. I would expect
> that to be compiled away and not to imply any overhead beyond its payload.
> However, going from 1 to 2 increases the runtime by more than a factor of 2.
> Going to 3 or 4 statements then shows a more linear increase.
>
> But again, the main point is that these methods don't do anything but
> produce heat. So, I would really like to see them eliminated.
>
> In order to identify the culprit in IGV, I don't have enough experience
> yet. There are too many things I cannot really correlate with the input
> program.
>
> Are there perhaps known patterns I could look for to point out further
> optimization/specialization potential?
>
> Thanks a lot
> Stefan
>
> PS: to see the issue, you can do the following:
>
> git clone --recursive git@github.com:SOM-st/TruffleSOM.git
> ant jar
> mx --vm server vm -G:Dump=
>  -Xbootclasspath/a:build/classes:libs/truffle.jar som.vm.Universe  -cp
> Smalltalk:Examples/Benchmarks/DeltaBlue/
> Examples/Benchmarks/BenchmarkHarness.som Dispatch 1000 0 2000
>
> And there, in the output, the method is actually called
> #innerBenchmarkLoop as part of the benchmark harness.
> Should be the second to last thing that’s compiled after the benchmark
> completed.
>
> --
> Stefan Marr
> INRIA Lille - Nord Europe
> http://stefan-marr.de/research/
>