Performance problem with invoke dynamic

Wed Jul 27 17:20:39 PDT 2011

Hi,

I've hit a very annoying performance problem with invoke dynamic/method
handles that makes certain benchmarks about 3 times slower for identical
operations. This code is related to variable lookup, and the basic
idea is that I have a LexicalScope class which contains a parent
pointer. It has a LexicalScope.One subclass that extends LexicalScope, a
LexicalScope.Two that extends LexicalScope.One, etc., and each of them
adds a field that holds the variable at that index.

At compile time, I know what lexical depth and index a variable maps to.
The original code generates straight bytecode for this. My benchmarks
(depending on depth and breadth of the lexical scope) take between 2.1s
and 4.1s. The bytecode just does this:
 get the current scope
 get the parent of the scope (by repeatedly getting the parent field)
 cast to the specific scope size we are interested in
 get the field for the index we are interested in
 do regular return/invocation on this value (this is the same process as
the other call paths, so should be fine).
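
In Java source terms, the steps above amount to roughly the following.
This is only a minimal sketch: the field names (parent, value0, value1),
the use of Object in place of SephObject, and the ScopeLookupSketch
wrapper class are assumptions made for illustration, and only two scope
sizes are shown.

```java
// Sketch only: names are hypothetical and Object stands in for SephObject.
class ScopeLookupSketch {
    static class LexicalScope {
        final LexicalScope parent;

        LexicalScope(LexicalScope parent) {
            this.parent = parent;
        }

        // Scope with one variable slot (index 0).
        static class One extends LexicalScope {
            final Object value0;

            One(LexicalScope parent, Object value0) {
                super(parent);
                this.value0 = value0;
            }
        }

        // Scope with two variable slots (index 0 and 1).
        static class Two extends One {
            final Object value1;

            Two(LexicalScope parent, Object value0, Object value1) {
                super(parent, value0);
                this.value1 = value1;
            }
        }
    }

    // Direct lookup of a variable known at compile time to live at
    // lexical depth 2, index 1.
    static Object lookupDepth2Index1(LexicalScope current) {
        LexicalScope scope = current.parent.parent;          // walk up two parents
        LexicalScope.Two sized = (LexicalScope.Two) scope;   // cast to the scope size
        return sized.value1;                                 // read the indexed field
    }
}
```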

However, when I try to do the same thing with MethodHandles, the best I
can get is 8.1s to 15s, which is pretty terrible (it was even worse
when I used method handles that point directly at the fields; switching
to method handles on getter methods gave me a 10% improvement).

The actual method handle creation looks a bit like this:

        MethodHandle current = identity(LexicalScope.class);

        int currentDepth = lexicalDepth;
        while(currentDepth-- > 0) {
            current = filterArguments(current, 0, PARENT_SCOPE_METHOD);
        }

        MethodHandle valueMH = null;
        switch(lexicalIndex) {
        case 0:
            valueMH = filterArguments(SCOPE_0_GETTER_M, 0, current);
            break;
        case 1:
            valueMH = filterArguments(SCOPE_1_GETTER_M, 0, current);
            break;
        case 2:
            valueMH = filterArguments(SCOPE_2_GETTER_M, 0, current);
            break;
        case 3:
            valueMH = filterArguments(SCOPE_3_GETTER_M, 0, current);
            break;
        case 4:
            valueMH = filterArguments(SCOPE_4_GETTER_M, 0, current);
            break;
        case 5:
            valueMH = filterArguments(SCOPE_5_GETTER_M, 0, current);
            break;
        default:
            valueMH = filterArguments(
                insertArguments(SCOPE_N_GETTER_M, 0, lexicalIndex-6),
                0, current);
            break;
        }

The rest just applies the same method handles for invocation/return as
the rest of the call site is using.
SCOPE_2_GETTER_M is defined as
  findVirtual(LexicalScope.Three.class, "getValueThree",
       methodType(SephObject.class)).asType(SCOPE_GETTER_M_TYPE)
where getValueThree is just a final getter method.
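
Put together, the pieces above can be sketched as one self-contained
example. The cut-down LexicalScope classes, the getParent and
getValueOne methods, the use of Object in place of SephObject, and the
ScopeHandleSketch wrapper are assumptions for illustration; only the
index 0 getter is shown.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch only: names are hypothetical and Object stands in for SephObject.
class ScopeHandleSketch {
    static class LexicalScope {
        final LexicalScope parent;

        LexicalScope(LexicalScope parent) {
            this.parent = parent;
        }

        final LexicalScope getParent() {
            return parent;
        }

        static class One extends LexicalScope {
            final Object value0;

            One(LexicalScope parent, Object value0) {
                super(parent);
                this.value0 = value0;
            }

            final Object getValueOne() {
                return value0;
            }
        }
    }

    // Builds a handle of type (LexicalScope)Object that walks `depth`
    // parents and then calls the index 0 getter on the resulting scope.
    static MethodHandle buildLookup(int depth) throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle parentMH = lookup.findVirtual(
                LexicalScope.class, "getParent",
                MethodType.methodType(LexicalScope.class));

        // The getter is declared on One, so adapt it to accept a plain
        // LexicalScope; asType adds the cast on the receiver.
        MethodType getterType = MethodType.methodType(Object.class, LexicalScope.class);
        MethodHandle getter0 = lookup.findVirtual(
                LexicalScope.One.class, "getValueOne",
                MethodType.methodType(Object.class)).asType(getterType);

        MethodHandle current = MethodHandles.identity(LexicalScope.class);
        for (int i = 0; i < depth; i++) {
            current = MethodHandles.filterArguments(current, 0, parentMH);
        }
        return MethodHandles.filterArguments(getter0, 0, current);
    }

    // Convenience wrapper so callers need not handle Throwable.
    static Object lookupVariable(LexicalScope scope, int depth) {
        try {
            return buildLookup(depth).invoke(scope);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

A real call site would build the handle once and reuse it; rebuilding
it per call, as lookupVariable does here, is only for brevity.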

I tried switching from asType to explicitCastArguments. That ended up
being about 5% slower. I also tried removing the asType altogether by
defining all the getter methods on LexicalScope and overriding them in
the subclasses (in practice the base methods would never be called).
This didn't give any performance change at all.
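
The variant with the getters declared on the base class might look like
the sketch below. As before, the class layout, the getValueOne name,
Object in place of SephObject, and the ScopeBaseGetterSketch wrapper are
assumptions for illustration. Because the handle is looked up on
LexicalScope itself, its type is already (LexicalScope)Object and no
asType adaptation is needed.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch only: names are hypothetical and Object stands in for SephObject.
class ScopeBaseGetterSketch {
    static class LexicalScope {
        final LexicalScope parent;

        LexicalScope(LexicalScope parent) {
            this.parent = parent;
        }

        // Base version exists only so the handle can target LexicalScope;
        // it is not expected to be reached for a correctly compiled lookup.
        Object getValueOne() {
            throw new UnsupportedOperationException("scope has no slot 0");
        }

        static class One extends LexicalScope {
            final Object value0;

            One(LexicalScope parent, Object value0) {
                super(parent);
                this.value0 = value0;
            }

            @Override
            Object getValueOne() {
                return value0;
            }
        }
    }

    // Handle of type (LexicalScope)Object, dispatched virtually.
    static MethodHandle getter() throws ReflectiveOperationException {
        return MethodHandles.lookup().findVirtual(
                LexicalScope.class, "getValueOne",
                MethodType.methodType(Object.class));
    }

    // Convenience wrapper so callers need not handle Throwable.
    static Object read(LexicalScope scope) {
        try {
            return getter().invoke(scope);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```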

So now I'm a bit lost - I have no idea why this is so much slower than
the explicit bytecode. Any thoughts? My next attack will be to go and
compare the assembler.

Cheers
-- 
 Ola Bini (http://olabini.com)
  Ioke - JRuby - ThoughtWorks

 "Yields falsehood when quined" yields falsehood when quined.