take 2 on 4 promising Graal optimizations

Lukas Stadler lukas.stadler at jku.at
Fri Jan 31 05:34:34 PST 2014


Hey Miguel,

some more comments inline…

- Lukas

On 29 Jan 2014, at 21:40, Garcia Gutierrez Miguel Alfredo <miguelalfredo.garcia at epfl.ch> wrote:

> Hi,
> 
> A selection of four "promising" optimizations appears below. With enough feedback, I guess I could start with prototyping (hint, hint) (this is a refined version incorporating feedback from the previous discussion at http://mail.openjdk.java.net/pipermail/graal-dev/2014-January/001483.html ).
> 
> Not sure whether to place them in a dedicated phase or to modify existing phases. In any case, these optimizations can run in HighTier. One of them requires inlining to have run (because it acts on TypeSwitchNodes).
> 
> Here they are:
> 
> 
> 1) an indirect invocation that resolves to a (usually inherited) final method is rewritten into an InvokeSpecial or InvokeStatic of that final method
>  (the stamp of the receiver need not be exact for such a final target to be found).
I think that the canonicalization of MethodCallTargetNode should do this - see InliningTest.invokeFinalMethodSnippet.
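To make the pattern in 1) concrete, here is an illustrative (hypothetical) example: the receiver's static type is an abstract base class, so its stamp is not exact, yet the invoked method is final and inherited, so a unique target can still be bound statically.

```java
public class FinalTargetExample {
    static abstract class Shape {
        // Final and inherited by every subclass: an invokevirtual on a Shape
        // receiver can be devirtualized even though the receiver stamp is not exact.
        public final String describe() {
            return "shape with area " + area();
        }
        abstract double area();
    }

    static class Square extends Shape {
        final double side;
        Square(double side) { this.side = side; }
        @Override double area() { return side * side; }
    }

    public static void main(String[] args) {
        Shape s = new Square(3.0);          // static type Shape: stamp not exact
        System.out.println(s.describe());   // describe() is final: unique target
    }
}
```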

> 2) InvokeSpecial or InvokeStatic targeting a forwarder method is rewritten into Invoke of the forwardee
> 
>  2.a) At first glance, InliningPhase already does this. However, GreedyInliningPolicy::isWorthInlining() applies the thresholds SmallCompiledLowLevelGraphSize and TrivialInliningSize without "discounting" the usages of ParameterNode. When inlining a forwarder, such a discount seems fair, so as not to penalize methods with a large number of parameters. For example, TrivialInliningSize currently stands at 10, yet forwarders exist with at least that many parameters. Things like scala.Function10 come to mind. And scala.Function20, too.
Yes, that’s true for Parameters, and also for other types of nodes.
It’s unfair that a method which synchronizes on an object and returns a newly created object is assigned the same cost as a method that just increments a field.
Both have the same number of nodes, but the former will end up generating ~50x the code.
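The forwarder case from 2.a can be sketched as follows (names are made up for illustration): the forwarder does nothing but pass its ten parameters along, yet its graph already contains ten ParameterNodes, which alone exhaust a TrivialInliningSize of 10.

```java
public class ForwarderExample {
    // The "real" implementation (the forwardee).
    static int sum(int a, int b, int c, int d, int e,
                   int f, int g, int h, int i, int j) {
        return a + b + c + d + e + f + g + h + i + j;
    }

    // A non-adapting forwarder in the style of scala.Function10: its only
    // work is the tail call, but each of the ten parameters contributes a
    // ParameterNode to the graph's node count.
    static int sumForwarder(int a, int b, int c, int d, int e,
                            int f, int g, int h, int i, int j) {
        return sum(a, b, c, d, e, f, g, h, i, j);
    }

    public static void main(String[] args) {
        System.out.println(sumForwarder(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)); // 55
    }
}
```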

>  2.b) Further motivation: in Scala bytecode both adapting and non-adapting forwarders are common.
> 
>       - The adapting variety, besides invoking the forwardee, also does (un)boxing or (down/up) casting; such forwarders result mostly from the "specialize" compiler phase of scalac (in effect adapting actuals-to-formals much the way MethodHandle.invoke() works, similarly for the conversion from forwardee-return-type to forwarder-return-type).
> 
>       - The non-adapting variety results from mixin-compilation, details at "The Scala Compiler Corner" http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/ . But you get the idea.
Reliably detecting such adapting methods would be great, both in terms of performance and in terms of more reliable inlining decisions.
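An adapting forwarder of the kind 2.b describes might look roughly like this in Java terms (a hand-written sketch, not actual scalac output): the forwarder unboxes its argument, calls the specialized forwardee, and boxes the result again.

```java
public class AdaptingForwarder {
    // Specialized forwardee operating on primitives.
    static int timesTwo(int x) {
        return 2 * x;
    }

    // Adapting forwarder, roughly the shape the "specialize" phase of scalac
    // emits for the generic entry point: unbox, forward, box.
    static Object timesTwoGeneric(Object x) {
        return Integer.valueOf(timesTwo(((Integer) x).intValue()));
    }

    public static void main(String[] args) {
        System.out.println(timesTwoGeneric(21)); // 42
    }
}
```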

>  2.c) Actually, all the arguments about forwarder-forwardee also seem valid for invoke() or invokeExact() on a constant MethodHandle. But that appears to be the realm of AbstractMethodHandleNode.
AFAIK, these should have special treatment during inlining anyway. Not sure how much of that is implemented, though.
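For reference, the constant-MethodHandle shape that 2.c refers to: when the handle is a compile-time constant (here a static final field), invokeExact() behaves like a forwarder to the bound method. This small example only illustrates the call shape, not Graal's treatment of it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MhForwarder {
    // A constant MethodHandle bound to String.length().
    static final MethodHandle LENGTH;
    static {
        try {
            LENGTH = MethodHandles.lookup()
                .findVirtual(String.class, "length", MethodType.methodType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Forwarder-like call site: invokeExact on a constant handle.
    static int call(String s) {
        try {
            return (int) LENGTH.invokeExact(s);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call("hello")); // 5
    }
}
```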

> 3) collapsing of TypeSwitchNode where different type-hubs all "resolve" to the same (inherited) final-method.
> 
>  Such "collapsing" consists of emitting one or more (instanceof Class-defining-final-method) checks to do away with a bunch of redundant type-hub comparisons, leaving a TypeSwitchNode for the residual cases. Caveat: I haven't yet looked at how a SwitchStrategy works, nor at how to simplify TypeSwitchNodes. Perhaps rather than collapsing after the fact, the collapsed version could be emitted from the start.
Another point to consider here is that some of the inlined target methods might actually consist of the same code - so they could be shared.
That would lose some type precision, though.
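The collapsing in 3) can be pictured with plain instanceof checks standing in for the type-hub comparisons (an illustrative model, not the actual TypeSwitchNode machinery): several leaf classes all resolve a call to the same inherited final method, so one check against the defining class subsumes the per-type checks.

```java
public class TypeSwitchCollapse {
    static class Base {
        final String name() { return "base"; }  // defined once, inherited by all leaves
    }
    static class A extends Base {}
    static class B extends Base {}
    static class C extends Base {}

    // Before collapsing: conceptually one hub comparison per profiled leaf type.
    static String dispatchPerType(Object o) {
        if (o instanceof A) return ((A) o).name();
        if (o instanceof B) return ((B) o).name();
        if (o instanceof C) return ((C) o).name();
        return "other";
    }

    // After collapsing: A, B and C all resolve name() to Base.name(),
    // so a single instanceof of the defining class suffices.
    static String dispatchCollapsed(Object o) {
        if (o instanceof Base) return ((Base) o).name();
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(dispatchPerType(new B()));   // base
        System.out.println(dispatchCollapsed(new C())); // base
    }
}
```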

> 4) InvokeSpecial or InvokeStatic targeting a method for which a more specific "actual return type" can be determined than the (usually "inherited") declared return type. In case such callsite is inlined, the more precise type is used. However, even without inlining, it's still possible to downcast at the callsite (a downcast that always succeeds, see below).
> 
>  For example, a method that returns a fresh object yet the declared return type is an interface.
> 
>  To discover such an "actual return type", at first I thought about inspecting the graph of the target method. Previous feedback (thanks!) brought up the fact that such a graph may elide paths not yet taken (with such paths potentially containing returns, or contributing inputs to the phi-nodes of return-values). If I understand correctly, true == OptimisticOptimizations::removeNeverExecutedCode() is a necessary precondition for that outcome, right? Any other precondition out there that I might be overlooking?
The GraphBuilder is not built for precise analyses of bytecode methods - consider this example:

Object foo(int a) {
    try {
        return Integer.valueOf(1000 / a);
    } catch (ArithmeticException e) {
        return new Error();
    }
}

The GraphBuilder never generates code to handle the division by zero, so the exception handler will never be compiled, even with all optimistic optimizations disabled.
Therefore, information about the return type would need to be checked before being used.

>  In view of the above, the following steps conservatively determine the actual return type of a method being compiled:
> 
>     4.a) At GraphBuilderPhase time,
> 
>            if removeNeverExecutedCode() is enabled, and one or more branches are skipped because of it, then the precision of the declared return type is deemed "can't be improved". Otherwise an attempt is made (just as in the subcase below) to come up with a more precise actual return type.
> 
>            if removeNeverExecutedCode() is disabled, the least-upper-bound of the stamps of all ReturnNodes is what we're looking for.
> 
>     4.b) It's tempting to place such more-precise-actual-return-type, if found, under a key in context.getGraphCache(). The conservative scheme above is safe across recompilations, although upon recompilation it might be further improved. On the other hand, if the more precise type is already exact, no recompilation will improve it.
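The least-upper-bound step in 4.a can be modeled on plain java.lang.Class values rather than Graal stamps (an illustrative toy, ignoring interfaces, which a real stamp join would have to handle):

```java
import java.util.Arrays;
import java.util.List;

public class ReturnTypeLub {
    // Least upper bound of two classes along the superclass chain.
    static Class<?> lub(Class<?> a, Class<?> b) {
        for (Class<?> c = a; c != null; c = c.getSuperclass()) {
            if (c.isAssignableFrom(b)) {
                return c;
            }
        }
        return Object.class;
    }

    // Fold the LUB over the types observed at all ReturnNodes.
    static Class<?> actualReturnType(List<Class<?>> returnTypes) {
        Class<?> result = returnTypes.get(0);
        for (Class<?> t : returnTypes) {
            result = lub(result, t);
        }
        return result;
    }

    public static void main(String[] args) {
        // Every return produces ArrayList: the declared List could be narrowed.
        System.out.println(actualReturnType(
            Arrays.asList(java.util.ArrayList.class, java.util.ArrayList.class)));
        // Mixed returns: ArrayList and LinkedList join at AbstractList.
        System.out.println(actualReturnType(
            Arrays.asList(java.util.ArrayList.class, java.util.LinkedList.class)));
    }
}
```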
> 
> A (somewhat) related question: recompilation (eg as triggered via InvalidateRecompile) always involves running GraphBuilderPhase anew, right?
Yes; most of the time recompilation happens because some assumption based on profiling information no longer holds.
The graph is re-built every time to make sure it reflects the current profiling information.

> Another question: does a CheckCast prevent an object from being scalar-replaced? (if so, that would diminish the appeal of this optimization).
No, since the exact type is known during scalar replacement, a CheckCast will just be removed if the type fits.
This behavior is encoded in the CheckCastNode.virtualize method.
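Putting 4) together, the always-succeeding call-site downcast looks like this (a hand-written illustration of the shape, not generated code): the declared return type is the interface, but every path returns a freshly allocated ArrayList, so the actual return type is exact.

```java
import java.util.ArrayList;
import java.util.List;

public class CallsiteDowncast {
    // Declared return type is the interface, but the actual return type
    // is exactly ArrayList on every path.
    static List<String> makeList() {
        List<String> l = new ArrayList<>();
        l.add("x");
        return l;
    }

    public static void main(String[] args) {
        // Even without inlining makeList(), optimization 4) could insert this
        // downcast at the call site; with the exact actual return type
        // recorded, the cast is known to always succeed.
        ArrayList<String> l = (ArrayList<String>) makeList();
        System.out.println(l.size()); // 1
    }
}
```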


