a discussion on inlining heuristics, anyone?

Tue Jun 10 15:46:05 UTC 2014

After a long, long trail of refactoring and documenting [1] the inliner, the time is ripe to actually improve the inlining heuristics. I've included below a few candidate ideas, please feel free to share your observations about what's working (or not) and what you'd do differently (and why) around inlining. Don't say later you weren't asked for input.

The rules of the game are: 
  (a) the inliner explores inlining candidates depth-first, propagating the additional information a caller has about arguments (in particular constants) thus specializing the callee(s) to such additional information. 
  (b) provided the thus specialized inlining candidates meet certain criteria, inlining is performed: the current "stack top" during depth-first search has one of its callsites replaced with the body of callee(s), adding in general more callsites to the "stack top"
  (c) after some rounds of the above, the stack of inlining candidates eventually collapses back to the root method on which inlining was launched.

Depth-first is good in that it minimizes the working set, the biggest trade-off being that the interplay between different inlining alternatives (for different callsites contained in the same caller) can't be taken into account. That would require "breadth-first" exploration.

The main knobs that can be turned during depth-first search are just two:

  (1) from the callsites a caller contains, one is picked to become a candidate for inlining, in CallsiteHolderExplorable.popInvoke()

  (2) each feasible target method (as given by the type-profile for the receiver) is explored before inlining (ie its method body is peeked into). Implementation-wise, an InlineableGraph is built for that target. This is the one opportunity there is to evaluate the merits of inlining such target before actually doing so. 

The difference between (1) and (2) is the information available to make the decision. Picking a callsite in (1) must rely only on knowledge about the arguments as opposed to knowing (in the general case) anything about the feasible targets; precisely those feasible targets are determined *for that particular callsite* during (2). Even with an InlineableGraph at hand we can't "see" beyond that immediate target (ie we can't explore yet the callesites it contains, that's what item 1 is about).

Still, the above offers quite some leeway. Candidates ideas:

  (1.a) a priority-queue to pick "promising" callsites first
        For example, a callsite receiving freshly instantiated objects (eg, anonymous closures) seems promising. Why? 
        First, once inlined, escape analysis might notice they don't escape
        Second, accesses to final fields of freshly instantiated objects can avoid one indirection level (provided the constructors is also inlined, here's where "bread-first serch" would help)

  (2.a) an InlineableGraph may also prepare a summary of the usages of parameters. 
        The idea is for such summary to record whether the target method uses one of its parameters as method-receiver or for field-accesses 
        In terms of closures, or the Command pattern in Java, such callees should be inlined to convert the accesses in question from mega to monomorphic.
        With the summary, a more informed inlining decision can be made.

Those are the ideas I've managed to come up with so far, one for each of (1) and (2). Comments?

Miguel

[1] http://hg.openjdk.java.net/graal/graal