More performance explorations

Charles Oliver Nutter headius at headius.com
Sun Jun 5 07:23:53 PDT 2011


On Sun, Jun 5, 2011 at 6:18 AM, Charles Oliver Nutter
<headius at headius.com> wrote:
> That said...I have not recently re-attempted installing
> invokedynamic-based primitive call paths. I'll give it another shot
> this week and see where we stand. Testing a simple loop ought to show
> quickly the overhead of invokedynamic versus my call sites.

Never one to shrug off a challenge, I decided to throw together a
dirt-simple primitive invocation path for the Fixnum operations that
support it in JRuby. This includes the math operators (except "/",
which one library overrides), boolean operators, comparison operators,
and bitwise operators. My dirt-simple version does not install any
guards; if the first invocation is against RubyFixnum, it attempts to
hardwire the call site directly to the RubyFixnum method corresponding
to the operator ("op_plus" for "+", etc). Ignoring invokedynamic
overhead, it should be faster than my call site logic, since the
latter needs to look up the call site (aload 0; getfield; aaload),
invoke through the call site (invokevirtual), check that Fixnum has
not been modified (aload 1; getfield "runtime"; getfield
"fixnumHasBeenModified"; jne), check repeatedly that the incoming
object is a Fixnum (instanceof + checkcast), and finally make the
invocation of the primitive-receiving method.
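
For readers who want the per-call work of the existing call site path
spelled out, it can be sketched in plain Java roughly as follows. This
is an illustration only: RuntimeState, Context, Obj, Fixnum and
PlusSite are hypothetical stand-ins rather than JRuby's real classes,
and the full dynamic dispatch on the slow path is elided.

```java
public class CallSiteSketch {
    // Hypothetical stand-ins; none of these are JRuby's real classes.
    public static final class RuntimeState { public boolean fixnumHasBeenModified; }
    public static final class Context { public final RuntimeState runtime = new RuntimeState(); }
    public interface Obj {}

    public static final class Fixnum implements Obj {
        public final long value;
        public Fixnum(long value) { this.value = value; }
        // Primitive-receiving method, same shape as op_plus(ThreadContext, long).
        public Obj op_plus(Context context, long other) { return new Fixnum(value + other); }
    }

    public static class PlusSite {
        public Obj call(Context context, Obj self, long arg) {
            // Check the "modified" flag, then the receiver type, on every call.
            if (!context.runtime.fixnumHasBeenModified && self instanceof Fixnum) {
                return ((Fixnum) self).op_plus(context, arg);
            }
            return slowCall(context, self, arg);
        }

        protected Obj slowCall(Context context, Obj self, long arg) {
            throw new UnsupportedOperationException("full dynamic dispatch elided");
        }
    }

    // The compiled body keeps its sites in an array field (aload 0; getfield; aaload).
    private final PlusSite[] callSites = { new PlusSite() };

    public long plusOne(Context context, long a) {
        Obj result = callSites[0].call(context, new Fixnum(a), 1L);
        return ((Fixnum) result).value;
    }

    public static void main(String[] args) {
        System.out.println(new CallSiteSketch().plusOne(new Context(), 41));
    }
}
```

The unguarded invokedynamic path skips all of these checks, which is
why it would be expected to win if the handle chain inlined.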

Unfortunately, the invokedynamic version is still slower.

Investigation (with a simple loop) seems to show that it doesn't
inline. The gist linked below has the relevant assembly from before
(using JRuby's specialized call sites) and after (using
invokedynamic). In the "before" output you can see that the
RubyFixnum.op_plus logic has inlined; I've cut it off roughly where it
starts to do overflow checking on the result. The invokedynamic
version does not inline, and does a callq to op_plus.

Note that there are two assembly dumps for my simple loop in the
PrintAssembly output, but neither of them seems to show op_plus (or
op_lt, incidentally) getting inlined. Bug? Shouldn't a virtual DMH
bound to an invokedynamic call site through a handful of adapters
(permute + explicitCast in this case) be inlining?

https://gist.github.com/1008986

Here's the relevant code:

jruby -e "def loop; a = 0; while a < 10_000_000; a += 1; end; end; 10.times { loop }"

And the relevant addition to JRuby's indy support:

            // TODO: guards
            MethodHandle target = findVirtual(
                    RubyFixnum.class, fastOpsMethod,
                    MethodType.methodType(IRubyObject.class,
                            ThreadContext.class, long.class));
            target = MethodHandles.explicitCastArguments(target,
                    MethodType.methodType(IRubyObject.class,
                            IRubyObject.class, ThreadContext.class,
                            long.class));
            target = MethodHandles.permuteArguments(target,
                    MethodType.methodType(IRubyObject.class,
                            ThreadContext.class, IRubyObject.class,
                            IRubyObject.class, String.class,
                            long.class),
                    new int[] {2, 0, 4});

            site.setTarget(target);
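
To make the adapter chain above easier to try outside JRuby, here is a
self-contained sketch of the same three steps against stand-in types.
Obj, Context and Fixnum are hypothetical substitutes for IRubyObject,
ThreadContext and RubyFixnum, and the sketch calls
MethodHandles.lookup() directly instead of JRuby's findVirtual helper.
The assumed call site shape is (context, caller, self, name, arg), so
the reorder {2, 0, 4} feeds self, context and arg to the target.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FastOpsSketch {
    // Hypothetical stand-ins for IRubyObject, ThreadContext and RubyFixnum.
    public interface Obj {}
    public static final class Context {}

    public static final class Fixnum implements Obj {
        public final long value;
        public Fixnum(long value) { this.value = value; }
        public Obj op_plus(Context context, long other) { return new Fixnum(value + other); }
    }

    // Builds a handle of type (Context, Obj, Obj, String, long)Obj.
    public static MethodHandle bind(String fastOpsMethod) throws ReflectiveOperationException {
        // Step 1: virtual handle of type (Fixnum, Context, long)Obj.
        MethodHandle target = MethodHandles.lookup().findVirtual(
                Fixnum.class, fastOpsMethod,
                MethodType.methodType(Obj.class, Context.class, long.class));
        // Step 2: widen the receiver to Obj; it is cast back on each call.
        target = MethodHandles.explicitCastArguments(target,
                MethodType.methodType(Obj.class, Obj.class, Context.class, long.class));
        // Step 3: target arg 0 <- site arg 2, arg 1 <- site arg 0, arg 2 <- site arg 4.
        return MethodHandles.permuteArguments(target,
                MethodType.methodType(Obj.class, Context.class, Obj.class,
                        Obj.class, String.class, long.class),
                2, 0, 4);
    }

    public static long plus(long a, long b) throws Throwable {
        MethodHandle site = bind("op_plus");
        Obj self = new Fixnum(a);
        // invokeExact requires the static argument types to match the site type exactly.
        Obj result = (Obj) site.invokeExact(new Context(), (Obj) null, self, "+", b);
        return ((Fixnum) result).value;
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(plus(40, 2));
    }
}
```

Because there is no guard, a receiver that is not a Fixnum fails the
explicit cast with a ClassCastException instead of falling back to a
slow path.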

I'm pushing this logic (enable with -Xinvokedynamic.fastops=true), but
it will stay disabled by default until indy (on HotSpot) is faster than
JRuby's built-in hacks (and I insert the appropriate guard logic!).
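
One plausible shape for that guard logic, using only the standard
java.lang.invoke API, is a receiver type test via guardWithTest
wrapped in a SwitchPoint that stands in for the "Fixnum has been
modified" flag. This is a sketch under assumptions, not what JRuby
does: Long plays the role of RubyFixnum, and isFixnum, fastPath and
slowPath are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

public class GuardSketch {
    // Long plays the role of RubyFixnum in this sketch.
    public static boolean isFixnum(Object self) { return self instanceof Long; }
    public static String fastPath(Object self, long arg) { return "fast"; }
    public static String slowPath(Object self, long arg) { return "slow"; }

    // unmodified is a hypothetical stand-in for the fixnumHasBeenModified check.
    public static MethodHandle newGuardedInvoker(SwitchPoint unmodified)
            throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType type = MethodType.methodType(String.class, Object.class, long.class);
        MethodHandle fast = lookup.findStatic(GuardSketch.class, "fastPath", type);
        MethodHandle slow = lookup.findStatic(GuardSketch.class, "slowPath", type);
        // The test may take fewer arguments than the target; it sees only the receiver.
        MethodHandle test = lookup.findStatic(GuardSketch.class, "isFixnum",
                MethodType.methodType(boolean.class, Object.class));

        // Inner guard: receiver type. Outer guard: class not modified.
        MethodHandle target = MethodHandles.guardWithTest(test, fast, slow);
        target = unmodified.guardWithTest(target, slow);

        MutableCallSite site = new MutableCallSite(type);
        site.setTarget(target);
        return site.dynamicInvoker();
    }

    public static void main(String[] args) throws Throwable {
        SwitchPoint unmodified = new SwitchPoint();
        MethodHandle invoker = newGuardedInvoker(unmodified);
        System.out.println((String) invoker.invokeExact((Object) 1L, 2L));    // fast
        System.out.println((String) invoker.invokeExact((Object) "str", 2L)); // slow
        SwitchPoint.invalidateAll(new SwitchPoint[] { unmodified });
        System.out.println((String) invoker.invokeExact((Object) 1L, 2L));    // slow
    }
}
```

The SwitchPoint variant moves the "modified" check out of the per-call
path, at the cost of invalidating every dependent site when it flips.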

OH, and FWIW, here's the LogCompilation -i output roughly around where
I'd expect to see op_plus and op_lt inlining:

    @ 27 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
    @ 27 java.lang.invoke.MethodHandle::invokeExact (17 bytes)
      @ 10 org.jruby.RubyFixnum::op_plus (38 bytes)
    @ 45 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
    @ 45 java.lang.invoke.MethodHandle::invokeExact (17 bytes)
      @ 10 org.jruby.RubyFixnum::op_lt (22 bytes)

Is it lying, or what? And if it's actually inlining, where's the rest
of op_plus and op_lt, most of which is trivial tiny methods? And why
doesn't it show up as inlined in the assembly output?

- Charlie


More information about the mlvm-dev mailing list