take 2 on 4 promising Graal optimizations

Wed Jan 29 12:40:13 PST 2014

Hi,

A selection of four "promising" optimizations appears below. With enough feedback, I guess I could start with prototyping (hint, hint) (this is a refined version incorporating feedback from the previous discussion at http://mail.openjdk.java.net/pipermail/graal-dev/2014-January/001483.html ).

Not sure whether to place them in a dedicated phase or to modify existing phases. In any case, these optimizations can run in HighTier. One of them requires inlining to have run (because it acts on TypeSwitchNodes).

Here they are:

1) an indirect invocation that resolves to a (usually inherited) final method
   is updated to InvokeSpecial or InvokeStatic such final method
   (the stamp of the receiver need not be exact for such final target to be found).

2) InvokeSpecial or InvokeStatic targeting a forwarder method is rewritten into Invoke of the forwardee

   2.a) At first glance, InliningPhase already does this. However, the thresholds SmallCompiledLowLevelGraphSize and TrivialInliningSize are used in  GreedyInliningPolicy::isWorthInlining() without "discounting" the usages of ParameterNode. When inlining a forwarder, such discount seems fair, so as not to penalize methods with a large number of params. For example, TrivialInliningSize stands at 10 now, yet forwarders exist with at least that many  parameters. Things like scala.Function10 come to mind. And scala.Function20, too.

   2.b) Further motivation: in Scala bytecode both adapting and non-adapting forwarders are common.

        - The adapting variety, besides invoking the forwardee, also does (un)boxing or (down/up) casting; such forwarders result mostly from the "specialize" compiler phase of scalac (in effect adapting actuals-to-formals much the way MethodHandle.invoke() works, similarly for the conversion from forwardee-return-type to forwarder-return-type).

        - The non-adapting variety results from mixin-compilation, details at "The Scala Compiler Corner" http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/ . But you get the idea.

   2.c) Actually all the arguments about forwarder-forwardee also seem valid for invoke(), invokeExact(), on a constant MethodHandle. But that appears to be the realm of AbstractMethodHandleNode.

3) collapsing of TypeSwitchNode where different type-hubs all "resolve" to the same (inherited) final-method.

   Such "collapsing" consists in emitting one or more (instanceof Class-defining-final-method) to do away with a bunch of redundant type-hubs comparisons, leaving a TypeSwitchNode for the residual cases. Caveat: I haven't yet looked at how a SwitchStrategy works, nor how to simplify TypeSwitchNodes. Perhaps rather than collapsing after the fact, the collapsed version could be emitted from the start.

4) InvokeSpecial or InvokeStatic targeting a method for which a more specific "actual return type" can be determined than the (usually "inherited") declared return type. In case such callsite is inlined, the more precise type is used. However, even without inlining, it's still possible to downcast at the callsite (a downcast that always succeeds, see below).

   For example, a method that returns a fresh object yet the declared return type is an interface.

   To discover such "actual return type", at first I thought about inspecting the graph of the target method. Previous feedback (thanks!) brought up the fact such graph may elide paths not yet taken (with such paths potentially containing returns, or contributing inputs to the phi-nodes of return-values). If I understand correctly, true == OptimisticOptimizations::removeNeverExecutedCode() is a necessary precondition for that outcome, right? Any other precondition out there that I might be overlooking?

   In view of the above, the following steps conservatively determine the actual return type of a method being compiled:

      4.a) At GraphBuilderPhase time,

             if removeNeverExecutedCode() is enabled, and one or more branches are skipped because of it, then the precision of the declared return type is deemed as "can't be improved". Otherwise an attempt is made (just like for the subcase below) to come up with a more precise actual return type.

             if removeNeverExecutedCode() is disabled, the least-upper-bound of the stamps of all ReturnNodes is what we're looking for.

      4.b) It's tempting to place such more-precise-actual-return-type, if found, under a key in context.getGraphCache(). The conservative scheme above is safe across recompilations, although upon recompilation it might be further improved. On the other hand, if the more precise type is already exact, no recompilation will improve it.

A (somewhat) related question: recompilation (eg as triggered via InvalidateRecompile) always involves running GraphBuilderPhase anew, right?

Another question: does a CheckCast prevent an object from being scalar-replaced? (if so, that would diminish the appeal of this optimization).

As with all optimizations, "it would be great" to get them right before implementation. Feedback is welcome!

Miguel

--
Miguel Garcia
Swiss Federal Institute of Technology
EPFL - IC - LAMP1 - INR 328 - Station 14
CH-1015 Lausanne - Switzerland
http://lamp.epfl.ch/~magarcia/