Lambda Forms, the sequel

Thu Jul 19 14:34:20 PDT 2012

The new Lambda Form framework is about to be integrated.  This is a key foundation stone for optimizing dynamic languages.  On this foundation, we want to build a robustly performant and portable implementation of JSR 292, to support our current and future set of great JVM languages, and to provide a flexible framework for code management.

Below is a brain dump of potential work items related to this framework.

None of the items below should delay the integration; some will take years.

— John

JVM

The role of the JVM will narrow to high-leverage optimizations, since LFs are primarily a Java-level mechanism.  But the following tasks are relevant:

* Bug fixing, performance improvement, customer benchmarks.  (Forever.)

* Help other groups cope with the changes.  The C++ interpreter needs adjustment.  The upcoming permgen removal work interacts with the LF changes, especially moving f1_as_instance to resolved_references.

* Fix C1 compilation of unlinked indy/invh sites.  (Must patch so as to incorporate appendix value, if any, plus force linkage on first call.)

* Tune inlining.  Lift restrictions on C2 node count and/or refine C2 graph size metric.  Refine "InlineSmallCode" metric and heuristic.  Stop inlining along cold LF paths.

* Propagate base-address casting information to field accessors, so that Unsafe.getObject (etc.) routinely constant-folds to regular field accesses.  This probably involves looking at the MH.type field of constant-folded MHs (when called with invokeBasic) to apply type assertions derived uniquely from MH.type (and not in the polymorphic LF of the MH).

* Make sure BMHs are escape-analyzable, etc.  Do whatever it takes so that MH.bindTo(x).invokeExact(y) will JIT to the same code as MH.invokeExact(x, y).

* Think about alternative ways of rendering LFs to executable code, such as feeding them directly to compiler front-ends.  (Bytecodes are likely to be the right answer, still, but maybe there is an interesting play with Graal.)

* And, the same points for Java closures as for method handles, whenever Java closures build on LFs.
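
For reference, the bindTo equivalence mentioned above can be stated at the API level with a small sketch.  It only shows that the two call shapes compute the same result; it says nothing about the code the JIT produces, which is the actual goal of that item.  The class BindToDemo, the target method greet, and the helper names are hypothetical, made up for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BindToDemo {
    // Hypothetical target method, chosen only for illustration.
    static String greet(String greeting, String name) {
        return greeting + ", " + name;
    }

    // Looks up greet as a method handle of type (String,String)String.
    static MethodHandle greetHandle() {
        try {
            return MethodHandles.lookup().findStatic(
                    BindToDemo.class, "greet",
                    MethodType.methodType(String.class, String.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // MH.invokeExact(x, y): both arguments are supplied at the call site.
    static String callDirect(String x, String y) {
        try {
            return (String) greetHandle().invokeExact(x, y);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // MH.bindTo(x).invokeExact(y): x is captured in a bound method handle.
    static String callBound(String x, String y) {
        try {
            MethodHandle bound = greetHandle().bindTo(x);
            return (String) bound.invokeExact(y);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callDirect("hello", "world"));
        System.out.println(callBound("hello", "world"));
    }
}
```

Both calls return the same string; the work item is about making the second form cost no more than the first once compiled.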

JDK

Much of the interesting performance work will be at the JDK level, in tuning the mixed mode execution of the Lambda Form IR.

* Update meth-lazy.txt; communicate the stuff we've done so far.

* Extend the LambdaForm framework to implement the MethodHandleProxies stuff.  (What's there now is slow.)  Eventually consider reworking proxies and reflection using LFs.

* Rework the generic (inexact, type-repairing) invoke path.  What we have is pretty bad, and the new LF framework gives many options for making it good.

* Reexamine filter, collect, spread, and varargs combinators.  Make sure they use polymorphic building blocks (instead of building single-use LFs) and that they are optimizable.

* Ditto for type-conversions (asType, explicitCastArguments).

* Reexamine remaining code in MethodHandleImpl.  Try to empty out that file.

* Get rid of single-use LFs and/or figure out rules for introducing single-use LFs only when needed (if ever).  Consider LF interning to get emergent (as opposed to intentionally cached) LF reuse. Evaluate DUMP_CLASS_FILES output to find repeated (non-reused) LFs.  Aim at O(1) and small compiled LF count.

* After LF count reduction, support static compilation environments, where the total LF set needs to be defined at program assembly time.  This may require introduction of more strongly polymorphic building blocks, such as a linkToStatic that can push arguments from a varargs array.

* Examine the binding logic for invokedynamic (MHN.linkCallSite) and arrange more efficient bindings when possible.  I.e., in the case of a ConstantCallSite, return an invoker for the CCS.target and push the target itself into the appendix (discarding the useless CCS).  After that, if the CCS.target is directly invocable (a DMH, say), bind it directly (w/o invoker) to the invokedynamic.  Maybe, if the CCS.target is a f(..., x) closure where the final argument x is bound, consider binding f directly to the instruction and let x be the appendix.  The JVM shouldn't care what you do.

* GWT inlining needs to be done asymmetrically, and/or in a way that allows the JVM to compile asymmetrically.

* GWT inlining probably goes along with a tweak to LFs to allow conditional early exit.  Probably add "isReturnValue" bit to Name; add "condition" property to Name.  Result should allow decision trees to be built efficiently out of small trees of LFs (one LF per super-block, = single entry, multiple exit).  Retire selectAlternative.

* Consider tactics for selectively instantiating (splitting) polymorphic LFs at the LF level (before conversion to bytecodes and JIT compilation).  For example, DMH.internalMemberName pulls out a MemberName from a DMH.  This is non-constant, enabling LF reuse.  But a hot DMH might want to customize its LF (can do this on the fly), constant-folding the MemberName in the LF representation.  Bytecode splitting is one of the strengths of JVM optimizers; consider using this tactic at the LF level.

* Make the LF framework use more static typing, to fit better with what the JVM verifier thinks about LFs.
0. Keep them fully polymorphic as at present.  Exploit invokeBasic polymorphism to reuse code.
1. Declare type bounds for LambdaForm formals (names[i] : i < arity).  Mainly, this means names[0] is "known" to be MethodHandle (or RunnableClosure or whatever).
2. The type bounds can be checked on entry to LF.invokeWithArguments, optionally.  Type bounds should be stored as a MethodType field in the LF, not on the Names.
3. If the return type of the LF is narrowed, include a cast at the end of the execution.
4. Inside the LF, require that the argument and return types of the Names "fit together".  This doesn't really need extra checking, although maybe NF.invokeWA can include argument casting.
5. In the bytecode generator, generate code that respects the JVM verifier constraints about the types of LF formals and Name arguments and return values.  Insert casts as needed, but they shouldn't be needed often.
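
Several JDK items above (GWT, asType, and the ConstantCallSite binding) refer to public java.lang.invoke operations.  As a reminder of what those look like from the user side, here is a minimal sketch.  It exercises only the public API and does not show any LF internals; the class CombinatorDemo and its helper methods (isShort, shout, keep, find, gwt, run) are hypothetical names made up for illustration.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class CombinatorDemo {
    // Hypothetical targets, used only to have something to combine.
    static boolean isShort(String s) { return s.length() < 5; }
    static String shout(String s)    { return s.toUpperCase(); }
    static String keep(String s)     { return s; }

    // Finds one of the static methods above as a handle of type (String)rtype.
    static MethodHandle find(String name, Class<?> rtype) {
        try {
            return MethodHandles.lookup().findStatic(
                    CombinatorDemo.class, name,
                    MethodType.methodType(rtype, String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // GWT: if isShort(s) then shout(s) else keep(s).
    static MethodHandle gwt() {
        return MethodHandles.guardWithTest(
                find("isShort", boolean.class),
                find("shout", String.class),
                find("keep", String.class));
    }

    // Generic (inexact) invocation; invoke() adapts the call to the handle's type.
    static Object run(MethodHandle mh, Object arg) {
        try {
            return mh.invoke(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle gwt = gwt();
        System.out.println(run(gwt, "abc"));          // short input takes the target path
        System.out.println(run(gwt, "lambda form"));  // long input takes the fallback path

        // asType: view the (String)String handle as (Object)Object.
        MethodHandle generic =
                gwt.asType(MethodType.methodType(Object.class, Object.class));
        System.out.println(generic.type());

        // A ConstantCallSite's dynamicInvoker is simply its permanent target,
        // which is why the CCS wrapper itself carries no extra information.
        CallSite ccs = new ConstantCallSite(gwt);
        System.out.println(ccs.dynamicInvoker() == ccs.getTarget());
    }
}
```

The last check is the API-level fact behind the linkCallSite item: once the target of a ConstantCallSite is known, nothing else in the call site object is needed to invoke it.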