Lazy Method Handle update
Charles Oliver Nutter
headius at headius.com
Wed May 9 20:30:59 PDT 2012
Thanks for the update, John! Comments below...
On Wed, May 9, 2012 at 2:34 PM, John Rose <john.r.rose at oracle.com> wrote:
> In JDK 7 FCS a method handle is represented as a chain of argument transformation blocks, ending in a pointer to a methodOop. The argument transformations are assembly coded and work in the interpreter stack. The reason this is not outrageously slow is that we vigorously inline method handle calls whenever we can. But there is a performance cliff you can drop off of, when you are working with non-constant MHs. (BTW, invokedynamic almost always inlines its target.) Project Lambda needs us not to drop off of this cliff.
And I need you to not drop off that cliff too! It's very easy to
trigger...just make a method big enough, and AAAAAAAARRGH into the pit
you go.
Luckily, for the ambitious early-access JRuby users running JRuby
master + Java 7u2+ in production, the code they're hitting is all
small enough to avoid the cliff, but with the JRuby 1.7 preview
release coming out in a couple of weeks, more people are going to
start trying things out.
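For anyone following along who hasn't hit this themselves, here's a
minimal sketch of the distinction that matters (class and method names
are mine, purely for illustration): a handle the JIT can see as a
constant gets inlined, while one loaded from a plain mutable field is
the non-constant shape that falls off the cliff on current 7u builds.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class Cliff {
    // Constant handle: the JIT can treat the target as a constant and inline through it.
    static final MethodHandle CONSTANT_TARGET;

    // Non-constant handle: calls through this are the case John describes,
    // where inlining gives up and you land on the generic adapter path.
    static MethodHandle mutableTarget;

    static int add(int a, int b) { return a + b; }

    static {
        try {
            MethodHandle mh = MethodHandles.lookup().findStatic(
                    Cliff.class, "add",
                    MethodType.methodType(int.class, int.class, int.class));
            CONSTANT_TARGET = mh;
            mutableTarget = mh;
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) throws Throwable {
        int fast = (int) CONSTANT_TARGET.invokeExact(1, 2); // constant receiver, inlinable
        int slow = (int) mutableTarget.invokeExact(3, 4);   // non-constant receiver
        System.out.println(fast + " " + slow);
    }
}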
> To fix this, we are now representing the argument transformations using a simple AST-like IR, called a LambdaForm. This form can be easily rendered down to bytecodes. (Eventually it may be rendered directly to native code.) The form is *also* interpretable by a Java-coded AST walker. This allows the system to be lazy, and to work hardest on optimizing those method handles that are actually called frequently. The laziness also helps simplify bootstrapping. The remaining assembly code is much smaller, and can be mirrored in the JIT IR and optimized.
It also creates some *epic* stack traces when it blows up. Will those
fold away in the future?
> Here's an update on where we are. Christian Thalinger, Michael Haupt, and I are currently working on the following tasks:
>
> A. clean out the compiled method calling path, for non-constant method handles
> B. flatten the BMH layout (no boxing, linked lists, or arrays)
> C. make the handling of MethodType checking visible to the compiler (removing more assembly code)
> D. tuning reuse and compilation of LambdaForm instances
> E. profiling MH.LambdaForm values at MH call sites
> F. tuning optimization of call sites involving LFs
I have been tossing numbers and benchmarks back and forth with
Christian, and I'm now testing a local build of the meth-lazy stuff
myself. The numbers haven't been stellar so far, but I think Christian
made great progress today (based on an email showing C1 + indy beating
C1 without indy, and drastically beating C1 + indy on a stock u6 build
that falls off the cliff). It's very exciting!
> For A. the remaining snag is getting the argument register assignments correct for the call to the target method. There is also an issue with representing non-nominal calls in the backend.
I assume this is the problem Christian described to me, where it was
calling back into the interpreter to fix up the arguments?
> For B. we are currently working on bootstrap issues. The idea here is that, while we can do escape analysis, etc., a cleaner data structure will make the compiler succeed more often.
I will be *thrilled* when EA works across indy call sites. We have
started work on our new compiler, which uses a simpler intermediate
representation and which will be indy-only from day 1. Already we're
seeing gains, since we don't have to hand-write all the different call
paths we want to represent; we can wire up any combination of
arguments, handles, and targets using only method handles. That means
we can do things that will be ripe for EA, like the following (there's
a rough sketch after the list):
* Allocating heap storage for closures right next to the closure creation
* Passing closures as a handle rather than as an opaque, polymorphic structure
* Specializing closure-receiving code in *our* compiler until Hotspot
can specialize it for us
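To make the "wire it up with only method handles" point a bit more
concrete, here's a rough sketch of the shape I mean, using nothing but
the public java.lang.invoke API (the names are made up, and this is
not what our compiler actually emits): the captured environment is
allocated right next to the closure creation and bound straight into
the handle.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ClosureWiring {
    // Hypothetical closure body: takes the captured environment plus the block argument.
    public static Object closureBody(Object[] captured, Object arg) {
        return captured[0] + ":" + arg;
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle body = MethodHandles.lookup().findStatic(
                ClosureWiring.class, "closureBody",
                MethodType.methodType(Object.class, Object[].class, Object.class));

        // Allocate the captured environment right where the closure is created,
        // then bind it into the handle. If the handle never escapes as an
        // opaque object, EA has a chance to eliminate the allocation.
        Object[] captured = new Object[] { "self", 42 };
        // The cast makes the array a single bound value rather than varargs.
        MethodHandle closure = MethodHandles.insertArguments(body, 0, (Object) captured);

        // The closure is now just an (Object)Object handle; no hand-written call path.
        Object result = closure.invokeExact((Object) "block arg");
        System.out.println(result);
    }
}

If Hotspot can see through the insertArguments binding, that captured
array is exactly the kind of allocation EA should be able to eliminate.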
I'd be very surprised if we can't approach Java performance for the
*general* cases of Ruby code by end of year, and if we can specialize
closure-receiving code *and* get EA, we might be able to compete with
Java 8 lambda performance for Ruby's closures too.
We also have our own profiling, inlining, and so on...but that's all
done above the bytecode level, to work around as-yet-unoptimized
patterns in Hotspot. :)
> For C. we have a refactoring in process for moving the MT value out of the methodOop.
>
> Chris, Michael, and I are working on A, B, C, respectively. We think a first cut of lazy MHs needs the first three items in order to be reasonably faster than the all-assembly implementation of JDK 7.
>
> In order to address the infamous NoClassDefFound error, we are minimizing nominal information in MH adapter code (LambdaForms and their bytecode). Only names on the BCP will be in adapter code. Part C. is an important part of this, since it allows the system to internally "flatten" calls like MH.invokeExact((MyFunnyType)x) to MH.invokeExact((Object)x). The new internal MH methods (invokeBasic, invokeStatic, etc.) all use "denominalized" types, which is to say that all reference types are represented as java.lang.Object.
I have not been able to stump Chris with any NCDFEs lately, so that's
good. But I do have some hacks in place to prevent them that I can't
remove until the new logic solidifies a bit.
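For anyone who hasn't followed the denominalization discussion, the
user-visible analogue is roughly erasing a call site's reference types
to Object with asType; this is just my sketch of the idea, not the
internal invokeBasic machinery, and the names are made up.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class Denominalize {
    // Hypothetical stand-in for a type that an adapter's defining loader might not see.
    public static class MyFunnyType {
        public String greet() { return "hello"; }
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle greet = MethodHandles.lookup().findVirtual(
                MyFunnyType.class, "greet",
                MethodType.methodType(String.class));

        // Exact call site: the descriptor names MyFunnyType and String directly.
        String s = (String) greet.invokeExact(new MyFunnyType());

        // Erased call site: the descriptor here is (Object)Object, so nothing
        // at this call site names MyFunnyType; the cast back to the real type
        // happens inside the asType adapter.
        MethodHandle erased = greet.asType(
                MethodType.methodType(Object.class, Object.class));
        Object o = erased.invokeExact((Object) new MyFunnyType());

        System.out.println(s + " / " + o);
    }
}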
Now that the logic has started to land, I'm going to do some
benchmarking and assembly-reading of my own to help from my end. And
hopefully there's a chance I'll be able to help more directly over the
summer.
Very exciting stuff...I'm thrilled that dynlangs and indy are being
taken so seriously. I told a couple thousand people at JAX 2012 how
strongly I believe that indy is the most important work happening on
the JVM right now, and I'm looking forward to doing more and more with
it :)
- Charlie