TruffleSOM Status Update

Stefan Marr java at stefan-marr.de
Thu Jan 16 13:04:09 PST 2014


Hi:

Since the last update, a couple of performance-relevant things changed again, based on input from ZipPy and Christian.

1. Once more, I changed how lexical scoping is implemented.

  Previously, I introduced an Object array of upvalues, which gave pretty good results (still better than what I have now).
  However, based on what I saw in ZipPy as well as in the Ruby implementation, I thought the overhead of materializing frames might not be too bad,
  and that the results of using specializing frame slots might be better.

  Thus, I reintroduced the usage of materialized frames, and don’t use upvalues anymore.
  Furthermore, frames are only materialized when a block that uses its context is created.
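
  The lazy materialization described above could be sketched roughly as follows. This is plain Java without the Truffle API, and ContextFrame, Block, and BlockLiteral are hypothetical names chosen for illustration, not classes from TruffleSOM. It assumes the parser already knows whether a block accesses its enclosing scope.

```java
// Plain-Java sketch; ContextFrame, Block and BlockLiteral are hypothetical names.
final class ContextFrame {
    final Object[] slots;
    boolean materialized; // records whether the frame had to escape to the heap

    ContextFrame(int numSlots) {
        this.slots = new Object[numSlots];
    }

    // Stand-in for frame materialization: marks the frame as escaped and returns it.
    ContextFrame materialize() {
        materialized = true;
        return this;
    }
}

final class Block {
    final ContextFrame outer; // null when the block never touches its enclosing scope

    Block(ContextFrame outer) {
        this.outer = outer;
    }
}

final class BlockLiteral {
    private final boolean usesOuterContext; // assumed to be decided once, at parse time

    BlockLiteral(boolean usesOuterContext) {
        this.usesOuterContext = usesOuterContext;
    }

    // Creating the block is the only place where the frame may get materialized.
    Block execute(ContextFrame frame) {
        return new Block(usesOuterContext ? frame.materialize() : null);
    }
}
```

  A block that only uses its own arguments and locals never forces the enclosing frame to escape, which is the case that should stay cheap.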

2. Method arguments are unwrapped and written into the frame.

  Based on the previous change, I also revised how arguments are handled.
  They are now copied into the frame by an extra ArgumentInitializationNode, which relies on a combination of LocalVariableWriteNodes and ArgumentReadNodes.
  As a result, self and the other method arguments are treated the same way as local variables stored in the frame. Theoretically, they can now type-specialize
  based on the value. However, that is currently disabled, because it causes too many invalidations.
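
  To illustrate the composition of argument reads and local-variable writes, here is a minimal plain-Java sketch. The class names mirror the node names mentioned above, but the bodies are simplified assumptions (an Object[] stands in for both the arguments and the frame slots, and self is assumed to sit at index 0); this is not the actual TruffleSOM code.

```java
// Simplified stand-ins; ArgFrame and the node bodies are assumptions for illustration.
final class ArgFrame {
    final Object[] arguments; // arguments[0] is the receiver (self) in this sketch
    final Object[] slots;     // frame slots: first the arguments, then the locals

    ArgFrame(Object[] arguments, int numSlots) {
        this.arguments = arguments;
        this.slots = new Object[numSlots];
    }
}

final class ArgumentRead {
    private final int index;

    ArgumentRead(int index) {
        this.index = index;
    }

    Object execute(ArgFrame frame) {
        return frame.arguments[index];
    }
}

final class LocalVariableWrite {
    private final int slot;
    private final ArgumentRead value;

    LocalVariableWrite(int slot, ArgumentRead value) {
        this.slot = slot;
        this.value = value;
    }

    void execute(ArgFrame frame) {
        frame.slots[slot] = value.execute(frame);
    }
}

final class ArgumentInitialization {
    private final LocalVariableWrite[] writes;

    // One write per argument: argument i is copied into frame slot i.
    ArgumentInitialization(int numArgs) {
        writes = new LocalVariableWrite[numArgs];
        for (int i = 0; i < numArgs; i++) {
            writes[i] = new LocalVariableWrite(i, new ArgumentRead(i));
        }
    }

    void execute(ArgFrame frame) {
        for (LocalVariableWrite write : writes) {
            write.execute(frame);
        }
    }
}
```

  After the initialization has run, the method body only needs one kind of variable access, namely reads and writes on frame slots.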

2.1 Arguments Object is now final again

  As suggested, the Arguments object passed to methods now consists solely of final fields (of type Object).
  So, I hope the escape analysis can do its thing.

3. Non-local returns are implemented with dedicated nodes.

  As suggested, I factored the handling of non-local returns out of the main node.
  Now, a method body gets wrapped into a CatchNonLocalReturnNode only if the method actually defines a block
  that uses non-local returns, and thus can be the target of one. This change indeed gave a huge performance improvement.
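
  The wrapping can be sketched in plain Java as below. The exception class, the Body interface, and the prepare helper are hypothetical; only the idea of a dedicated catching wrapper comes from the description above. The sketch assumes each method activation is identified by an object, and that a non-local return carries the activation it targets.

```java
// Hypothetical names throughout; this is not the TruffleSOM source.
final class NonLocalReturn extends RuntimeException {
    final Object target; // identity of the method activation to return from
    final Object value;

    NonLocalReturn(Object target, Object value) {
        super(null, null, false, false); // no stack trace: used purely as control flow
        this.target = target;
        this.value = value;
    }
}

interface Body {
    Object execute(Object activation);
}

final class CatchNonLocalReturn implements Body {
    private final Body body;

    CatchNonLocalReturn(Body body) {
        this.body = body;
    }

    @Override
    public Object execute(Object activation) {
        try {
            return body.execute(activation);
        } catch (NonLocalReturn e) {
            if (e.target == activation) {
                return e.value; // this activation is the target
            }
            throw e; // meant for an outer activation, keep unwinding
        }
    }
}

final class NonLocalReturnDemo {
    // Only methods that define a block with a non-local return pay for the try/catch.
    static Body prepare(Body body, boolean hasNonLocalReturnBlock) {
        return hasNonLocalReturnBlock ? new CatchNonLocalReturn(body) : body;
    }

    static Object run() {
        Body raw = activation -> {
            // Stands in for a block such as [ ^ 'from block' ] evaluated deeper in the call chain.
            Runnable block = () -> {
                throw new NonLocalReturn(activation, "from block");
            };
            block.run();
            return "normal return";
        };
        return prepare(raw, true).execute(new Object());
    }
}
```

  Methods without such blocks keep their plain body, so the common case carries no exception handler at all.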

4. Increase independence of inlinable methods

  Methods with lexical scope are now cloned when the outer scope gets prepared for inlining.
  Thereby, I can also give them the proper independent FrameSlot objects.
  I haven’t seen anything like that in Ruby or ZipPy. But if it is necessary to keep the FrameSlots independent
  for inlined copies, then this cloning is necessary as well, I think.
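
  Why independent slots matter can be shown with a small plain-Java sketch. SlotSketch and MethodSketch are hypothetical stand-ins, and the kind field is an assumption that models a slot's type specialization; the real FrameSlot objects are not used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for a frame slot and a method with lexical scope.
final class SlotSketch {
    final String name;
    Class<?> kind = Object.class; // may later be specialized, e.g. to Integer.class

    SlotSketch(String name) {
        this.name = name;
    }
}

final class MethodSketch {
    final Map<String, SlotSketch> slots = new LinkedHashMap<>();

    SlotSketch addSlot(String name) {
        SlotSketch slot = new SlotSketch(name);
        slots.put(name, slot);
        return slot;
    }

    // Clone for an inlined copy: same slot names, but fresh slot objects.
    MethodSketch cloneForInlining() {
        MethodSketch copy = new MethodSketch();
        for (SlotSketch slot : slots.values()) {
            copy.addSlot(slot.name).kind = slot.kind;
        }
        return copy;
    }
}
```

  With fresh slot objects, a specialization (or its invalidation) in one inlined copy does not leak into the original or into other copies.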

5. Enable specialized nodes (#while*, #if*, #to:do:) to inline the blocks/methods they use.

  As pointed out by Christian (I think), these specialized nodes were not doing the right thing, and
  loop bodies, for instance, were never inlined.

6. Propagate profiling information of loops

  Andreas suspected that to be one of my issues, but it wasn’t. The loops were already propagating the information,
  thanks to changes I saw in ZipPy.

7. Introduced ClassCheckNode for inline caches

  The check in the CachedSendNodes was pretty complex.
  I took it apart and made a trufflized node out of the check, so that it can specialize based on the value it
  needs to check for, and ideally has only one instanceof check.
  Since this is a very specific node, it ended up having its own TypeSystem and is therefore kind of separate from
  the other nodes.
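
  The idea of a check that specializes on the expected class can be sketched in plain Java. SClass, SObject, and the concrete check classes are hypothetical, and the choice of Integer and String as primitive cases is an assumption; the real node is built with the Truffle DSL rather than by hand.

```java
// Hypothetical model of SOM classes and objects, for illustration only.
final class SClass {
    static final SClass INTEGER = new SClass("Integer");
    static final SClass STRING = new SClass("String");

    final String name;

    SClass(String name) {
        this.name = name;
    }
}

final class SObject {
    final SClass clazz;

    SObject(SClass clazz) {
        this.clazz = clazz;
    }
}

abstract class ClassCheck {
    abstract boolean execute(Object receiver);

    // Pick the cheapest check for the class cached at this send site.
    static ClassCheck forClass(SClass expected) {
        if (expected == SClass.INTEGER) {
            return new IntegerCheck();
        }
        if (expected == SClass.STRING) {
            return new StringCheck();
        }
        return new ObjectCheck(expected);
    }
}

final class IntegerCheck extends ClassCheck {
    @Override
    boolean execute(Object receiver) {
        return receiver instanceof Integer; // a single instanceof, no class lookup
    }
}

final class StringCheck extends ClassCheck {
    @Override
    boolean execute(Object receiver) {
        return receiver instanceof String;
    }
}

final class ObjectCheck extends ClassCheck {
    private final SClass expected;

    ObjectCheck(SClass expected) {
        this.expected = expected;
    }

    @Override
    boolean execute(Object receiver) {
        return receiver instanceof SObject && ((SObject) receiver).clazz == expected;
    }
}
```

  Because the expected class is fixed per cache entry, the choice between the variants is made once, and each cache hit then costs one instanceof plus, for ordinary objects, one identity comparison.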

So, that should be an overview of the main changes.
However, performance isn’t perfect yet.

Best regards
Stefan

-- 
Stefan Marr
INRIA Lille - Nord Europe
http://stefan-marr.de/research/




