ZipPy status update

Christian Humer christian.humer at gmail.com
Sat Dec 21 05:53:42 PST 2013


Hi Stefan,

The problem you are dealing with is that there is too much logic in your
nodes. To be fast, Truffle relies heavily on the code produced from the
Truffle tree being clean and short. Most corner cases should therefore be
specialized away so that they are never visible to the compiler. Use
the compiled method size, which is printed after each compilation, as a rough
metric for improvements in this area.

I analysed the potential performance problems in your code by using
-G:+TraceTruffleExpansion. When enabled, this option prints the actual tree
of your methods that got expanded during Truffle compilation. There is also
another option, -G:+TraceTruffleExpansionSource, which adds line numbers and
source locations. This lets you copy the output [1] into an Eclipse stack
trace console to easily navigate between the methods.

I randomly took a method that got compiled when running [2] on the Windows
command line. I think it was method [3]. Here are a number of minor and
major things I have seen that should be optimized:


1) Major: (OptimizedCallTarget.java:214) Method.execute(Method.java:72)
This method calls #messageSendExecution, which contains a lot of
control flow and allocations [4]: a while loop plus several exception
handlers. Speculate that none of this code is required. I think on the
fast path it could just call the body of the method.

2) Major: Another major problem here is that you are mutating the Arguments
object using that FrameOnStackMarker [4]. Arguments should usually be
immutable. This also applies to Arguments#upValues. I am not sure all of
this is a good idea performance-wise. I think you can stick with it until
the other problems are fixed. However, if you are still not getting good
performance, we should revisit it.

3) Minor: You should also get rid of the array allocation in the Arguments
constructor. Or at least use @ExplodeLoop on it.

4) Minor: (Method.java:72) Method.initializeFrame(Method.java:110)
I am not sure, but this could be a potential problem as well. Graal may not
be able to escape-analyse the array of FrameSlots because Graal does not
see its allocation. I recommend rewriting this to execute an array of
WriteLocalVariableNodes. For an array of @Children nodes, Graal assumes
that the contents of the array are also constant and do not escape.

5) Minor: (SequenceNode.java:41)
UnarySendNode<CachedSendNode>.executeGeneric(UnarySendNode.java:43)
In UnarySendNode, there is no need to also mark your @Children as
@CompilationFinal. In general, @CompilationFinal only helps for non-final
fields that are not marked as @Child or @Children but should be treated as
constant by the compiler. Please remember to call
CompilerDirectives.transferToInterpreterAndInvalidate before writing to a
@CompilationFinal field from a compiled path.

6) Major (UnarySendNode.java)
UnaryInlinedMethod.executeEvaluated(Method.java:228)
The same problems as in 1 and 4 are duplicated for all inlined method calls.
This code is repeated 5 times over the whole compiled method.

7) Minor  (KeywordSendNode.java:54)
ArgumentEvaluationNode.executeArray(ArgumentEvaluationNode.java:31)
Do not use null values. For Nil you should use a singleton instead of null.
Then you can just remove the initial check:
    if (arguments == null || arguments.length == 0) {
      return null;
    }
completely. It does not matter too much if you allocate an empty Object
array. It gets escape analysed anyway.

8) Major (SequenceNode.java:41)
BinarySendNode<WhileTrueMessageSBlockNode>.executeGeneric(BinarySendNode.java:56)
The implementation of WhileTrueMessageSBlockNode (and, I think, of all While
nodes) does not inline the call to the while body block, which of course
carries a serious performance penalty. Instead of calling the block
directly, you should use a message send as a child node that invokes the
while block. This way the call/send can be inlined.

9) Minor (SequenceNode.java:41)
SelfReadNode.executeGeneric(SelfReadNode.java:12)
ContextualNode#determineOuterSelf
and ContextualNode#determineOuterArguments should use @ExplodeLoop.
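
To make point 7 a bit more concrete, here is a minimal plain-Java sketch of
the Nil singleton idea. Nil, ArgNode and ArgumentEvaluation are hypothetical
names chosen for illustration, not the actual TruffleSOM classes, and no
Truffle API is used. The only point is that the evaluation method needs no
null or length guard once null is never used as a value.

```java
import java.util.Arrays;

final class NilSingletonDemo {
    public static void main(String[] args) {
        ArgumentEvaluation none = new ArgumentEvaluation();
        ArgumentEvaluation two = new ArgumentEvaluation(() -> 42, () -> Nil.INSTANCE);
        System.out.println(none.executeArray().length);
        System.out.println(Arrays.toString(two.executeArray()));
    }
}

/** Hypothetical singleton standing in for the language-level nil value. */
final class Nil {
    static final Nil INSTANCE = new Nil();

    private Nil() {}

    @Override
    public String toString() {
        return "nil";
    }
}

/** Hypothetical stand-in for an AST node that produces one argument value. */
interface ArgNode {
    Object execute();
}

/** Illustrative only: evaluates all argument nodes into a fresh array. */
final class ArgumentEvaluation {
    // Never null; an empty array means "no arguments".
    private final ArgNode[] arguments;

    ArgumentEvaluation(ArgNode... arguments) {
        this.arguments = arguments;
    }

    /** No initial null/length check: an empty input yields an empty array. */
    Object[] executeArray() {
        Object[] result = new Object[arguments.length];
        for (int i = 0; i < arguments.length; i++) {
            result[i] = arguments[i].execute();
        }
        return result;
    }
}
```

Callers can then compare against Nil.INSTANCE by identity instead of
testing for null.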

Despite the open issues, I am confident that we can get this thing fast ;-)
Feel free to ask further questions on how to fix these issues.
Let's do another review iteration after you have fixed these issues.

I will write a separate email to the list regarding @ImplicitCasts.

Cheers!


[1] expansion tree
 OptimizedCallTarget.executeHelper(OptimizedCallTarget.java:213)
  (OptimizedCallTarget.java:214) Method.execute(Method.java:72)
    (Method.java:72) Method.initializeFrame(Method.java:110)
    (Method.java:73) SequenceNode.executeGeneric(SequenceNode.java:38)
      (SequenceNode.java:41) UnarySendNode<CachedSendNode>.executeGeneric(UnarySendNode.java:43)
        (UnarySendNode.java:43) SelfReadNode.executeGeneric(SelfReadNode.java:12)
        (UnarySendNode.java) UnaryInlinedMethod.executeEvaluated(Method.java:228)
          (Method.java:228) SequenceNode.executeGeneric(SequenceNode.java:38)
            (SequenceNode.java:41) LocalVariableWriteIntNode.executeGeneric(LocalVariableNodeFactory.java:621)
              (LocalVariableNodeFactory.java:623) IntegerLiteralDefaultNode.executeInteger(IntegerLiteralNodeFactory.java:81)
            (SequenceNode.java:41) KeywordSendNode<CachedSendNode>.executeGeneric(KeywordSendNode.java:53)
              (KeywordSendNode.java:53) UnarySendNode<CachedSendNode>.executeGeneric(UnarySendNode.java:43)
                (UnarySendNode.java:43) LocalVariableReadIntNode.executeGeneric(LocalVariableNodeFactory.java:190)
                (UnarySendNode.java) UnaryInlinedMethod.executeEvaluated(Method.java:228)
                  (Method.java:228) BinarySendNode.executeGeneric(BinarySendNode.java:56)
                    (BinarySendNode.java:56) IntegerLiteralDefaultNode.executeGeneric(IntegerLiteralNodeFactory.java:88)
                    (BinarySendNode.java:57) SelfReadNode.executeGeneric(SelfReadNode.java:12)
                    (BinarySendNode.java:58) SubtractionPrimIntNode.executeEvaluated(SubtractionPrimFactory.java:696)
                    (BinarySendNode.java) UninitializedSendNode.executeEvaluated(BinarySendNode.java:116)
                  (Method.java:230) UnaryInlinedMethod.initializeFrame(Method.java:215)
                (UnarySendNode.java) UninitializedSendNode.executeEvaluated(UnarySendNode.java:98)
              (KeywordSendNode.java:54) ArgumentEvaluationNode.executeArray(ArgumentEvaluationNode.java:31)
                (ArgumentEvaluationNode.java:38) LocalVariableReadIntNode.executeGeneric(LocalVariableNodeFactory.java:190)
                (ArgumentEvaluationNode.java:38) IntegerLiteralDefaultNode.executeGeneric(IntegerLiteralNodeFactory.java:88)
                (ArgumentEvaluationNode.java:38) BlockNode.executeGeneric(BlockNode.java:1)
                  (BlockNode.java) Universe.newBlock(Universe.java:418)
              (KeywordSendNode.java:55) KeywordInlinedMethod.executeEvaluated(Method.java:502)
                (Method.java:503) SequenceNode.executeGeneric(SequenceNode.java:38)
                  (SequenceNode.java:41) NonLocalVariableWriteNode.executeGeneric(NonLocalVariableNode.java:39)
                    (NonLocalVariableNode.java:39) SelfReadNode.executeGeneric(SelfReadNode.java:12)
                  (SequenceNode.java:41) BinarySendNode<WhileTrueMessageSBlockNode>.executeGeneric(BinarySendNode.java:56)
                    (BinarySendNode.java:56) BlockNode.executeGeneric(BlockNode.java:1)
                      (BlockNode.java) Universe.newBlock(Universe.java:418)
                    (BinarySendNode.java:57) BlockNode.executeGeneric(BlockNode.java:1)
                      (BlockNode.java) Universe.newBlock(Universe.java:418)
                    (BinarySendNode.java:58) Universe.newBlock(Universe.java:418)
                    (BinarySendNode.java:58) Universe.newBlock(Universe.java:418)
                    (BinarySendNode.java:58) Universe.newBlock(Universe.java:418)
                  (SequenceNode.java:41) SelfReadNode.executeGeneric(SelfReadNode.java:12)
                (Method.java:506) KeywordInlinedMethod.initializeFrame(Method.java:482)
              (KeywordSendNode.java) UninitializedSendNode.executeEvaluated(KeywordSendNode.java:113)
            (SequenceNode.java:41) SelfReadNode.executeGeneric(SelfReadNode.java:12)
          (Method.java:230) UnaryInlinedMethod.initializeFrame(Method.java:215)
        (UnarySendNode.java) UninitializedSendNode.executeEvaluated(UnarySendNode.java:98)
      (SequenceNode.java:41) NonLocalVariableWriteNode.executeGeneric(NonLocalVariableNode.java:39)
        (NonLocalVariableNode.java:39) BinarySendNode<CachedSendNode>.executeGeneric(BinarySendNode.java:56)
          (BinarySendNode.java:56) NonLocalVariableReadNode.executeGeneric(NonLocalVariableNode.java:23)
          (BinarySendNode.java:57) IntegerLiteralDefaultNode.executeGeneric(IntegerLiteralNodeFactory.java:88)
          (BinarySendNode.java:58) AdditionPrimIntNode.executeEvaluated(AdditionPrimFactory.java:695)
          (BinarySendNode.java) UninitializedSendNode.executeEvaluated(BinarySendNode.java:116)


[2]
mx  --vm server vm -G:+TraceTruffleExpansion
-G:+TraceTruffleExpansionSource
-Xbootclasspath/a:../TruffleSOM/build/classes;../TruffleSOM/libs/com.oracle.truffle.api.jar;../TruffleSOM/libs/com.oracle.truffle.api.dsl.jar
som.vm.Universe -cp ../TruffleSOM/core-lib/Smalltalk
../TruffleSOM/core-lib/Examples/Benchmarks/BenchmarkHarness.som IntegerLoop
1 2 2000

[3]
    innerBenchmarkLoop = (
        | i |
        i := 0.
        [ i < innerIterations ] whileTrue: [
            self benchmark.
            i := i + 1.
        ].
    )

[4]
  protected static Object messageSendExecution(final VirtualFrame frame,
      final ExpressionNode expr) {
    FrameOnStackMarker marker = Arguments.get(frame).getFrameOnStackMarker();
    Object result;
    boolean restart;

    do {
      restart = false;
      try {
        result = expr.executeGeneric(frame);
      } catch (ReturnException e) {
        if (!e.reachedTarget(marker)) {
          marker.frameNoLongerOnStack();
          throw e;
        } else {
          result = e.result();
        }
      } catch (RestartLoopException e) {
        restart = true;
        result  = null;
      }
    } while (restart);

    marker.frameNoLongerOnStack();
    return result;
  }
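
As a rough illustration of the speculation suggested in point 1, here is a
plain-Java sketch of a node that runs only the body until a restart is
actually observed, and only then switches to the looping version. All names
(Body, SpeculatingMethodNode, hasSeenRestart) are hypothetical, and the
local RestartLoopException merely mirrors the name used in [4]. Only the
restart case is modelled; the ReturnException and FrameOnStackMarker
handling is left out. In a real Truffle node the flag would presumably be a
@CompilationFinal field, and the catch block would call
CompilerDirectives.transferToInterpreterAndInvalidate before setting it.
Those calls appear only as comments here so that the sketch runs without
Truffle on the classpath.

```java
final class FastPathDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        // Body that requests one restart on its first run, then completes.
        Body body = () -> {
            if (calls[0]++ == 0) {
                throw new RestartLoopException();
            }
            return "done";
        };
        SpeculatingMethodNode node = new SpeculatingMethodNode(body);
        System.out.println(node.execute());
        System.out.println(node.hasSeenRestart());
    }
}

/** Local stand-in for the exception of the same name in listing [4]. */
final class RestartLoopException extends RuntimeException {}

/** Hypothetical stand-in for the method body expression. */
interface Body {
    Object execute();
}

/** Illustrative only: speculates that the body never asks for a restart. */
final class SpeculatingMethodNode {
    private final Body body;

    // Assumption: in Truffle this would be a @CompilationFinal field.
    private boolean seenRestart = false;

    SpeculatingMethodNode(Body body) {
        this.body = body;
    }

    Object execute() {
        if (!seenRestart) {
            try {
                return body.execute(); // fast path: just run the body
            } catch (RestartLoopException e) {
                // Assumption: in Truffle, call
                // CompilerDirectives.transferToInterpreterAndInvalidate() here.
                seenRestart = true;
            }
        }
        return executeWithRestartLoop();
    }

    /** Slow path: rerun the body for as long as it requests a restart. */
    private Object executeWithRestartLoop() {
        while (true) {
            try {
                return body.execute();
            } catch (RestartLoopException e) {
                // Run the body again.
            }
        }
    }

    boolean hasSeenRestart() {
        return seenRestart;
    }
}
```

Whether this shape really shrinks the compiled code would have to be
checked against the compiled method size printed after each compilation.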


- Christian Humer


On Sat, Dec 21, 2013 at 12:59 PM, Chris Seaton <chris at chrisseaton.com> wrote:

> Ruby has very extensive use of closures (they call them blocks), and the
> materialised frame is needed probably more often than not. This is used in
> benchmarks and it's very much on the fast path.
>
> I avoid materialising the frame where I know no local variables from the
> declaring scope are used, but apart from that I haven't been worrying about
> the performance impact yet. Your idea of making your own explicit up-values
> sounds interesting, but I wonder if that's working around Truffle rather
> than working with it. You will miss out on any performance improvements to
> MaterializedFrame in the future, and you'll have more complex code.
>
> In Ruby I also plan to inline all trivial methods immediately - but at the
> moment I'm only doing it for getters, setters and all core methods. One
> problem - if you inline straight away how do you stop the number of nodes
> blowing up beyond the limit?
>
> Christian Humer will be able to give a better explanation of @TypeCheck,
> @TypeCast and @ImplicitCast, but as I understand it @ImplicitCast tells the
> DSL that it can convert from one type to another via this method at any
> point to satisfy specialisation signatures. So I think you could implement
> the same thing using @TypeCheck and @TypeCast, but the @ImplicitCast is
> often simpler and clearer.
>
> In Ruby I use @ImplicitCast to convert between RubyFixnum and int. There is
> no difference between these two types - the RubyFixnum version is just so I
> can have a real object sometimes to simplify some other code. This tells
> the DSL whenever you see a RubyFixnum, feel free to convert it to an int to
> make things work.
>
>     @ImplicitCast
>     public int unboxFixnum(RubyFixnum value) {
>         return value.getValue();
>     }
>
> I don't use @TypeCheck or @TypeCast anywhere in Ruby since I started using
> @ImplicitCast instead.
>
> Chris
>
>
> On 21 December 2013 07:55, Stefan Marr <java at stefan-marr.de> wrote:
>
> > Hi Wei:
> >
> > On 21 Dec 2013, at 01:45, Wei Zhang <ndrzmansn at gmail.com> wrote:
> >
> > > Only functions that close over its declaration frame (closures) use
> > > PArguments#declarationFrame. This doesn't happen so often in my
> > > benchmarks.
> > > In most cases, zippy accesses local variable using VirtualFrames,
> > > which is optimized by Truffle/Graal.
> >
> > Well, closures are much more common in Smalltalk, I think.
> >
> > >> I do wonder, what is the typical granularity of a Python method?
> > >> At least in SOM, methods seem to be rather small. Perhaps a few AST
> > >> nodes on average.
> > >> Could that make a difference?
> > >
> > > Zippy inlines function calls when they become hot. Inlining helps
> > > performance a lot.
> > > I strongly recommend you to apply inlining in TruffleSOM if you haven’t
> > > yet.
> >
> > TruffleSOM does inlining. I only see two differences when browsing your
> > code.
> > The first should actually be a benefit for TruffleSOM: I also inline
> > trivial methods immediately, i.e., if a method just contains a literal or
> > something similar, it is directly inlined without even a function call
> > overhead, only protected by a polymorphic inline cache check node.
> > The second difference I see is that you have a `prepareBodyNode()`
> > operation that rewrites local variable access nodes. Why is that
> > necessary?
> >
> > >
> > >> Another detail I noticed is that you are using another strategy for
> > >> type handling in the type system. You use a combination of @TypeCheck and
> > >> @TypeCast, while I use @ImplicitCast.
> > >> What is the difference of these two approaches?
> > >
> > > I use @TypeCheck and @TypeCast to customize type checks and
> > > conversions in ZipPy. I just followed the SimpleLanguage examples in
> > > Truffle API.
> > > @ImplicitCast is new to me. I don't know how to use it, since there
> > > isn’t any related document or example.
> >
> > There is a tiny example in SimpleLanguage… That’s where I took it from.
> > But I have the feeling the generated code when using @TypeCheck and
> > @TypeCast looks quite a bit simpler.
> >
> > Thanks for the comments
> > Stefan
> >
> > --
> > Stefan Marr
> > Software Languages Lab
> > Vrije Universiteit Brussel
> > Pleinlaan 2 / B-1050 Brussels / Belgium
> > http://soft.vub.ac.be/~smarr
> > Phone: +32 2 629 2974
> > Fax:   +32 2 629 3525
> >
> >
>

