More performance explorations
Charles Oliver Nutter
headius at headius.com
Sun Jun 5 07:23:53 PDT 2011
On Sun, Jun 5, 2011 at 6:18 AM, Charles Oliver Nutter
<headius at headius.com> wrote:
> That said...I have not recently re-attempted installing
> invokedynamic-based primitive call paths. I'll give it another shot
> this week and see where we stand. Testing a simple loop ought to show
> quickly the overhead of invokedynamic versus my call sites.
Never one to shrug off a challenge, I decided to throw together a
dirt-simple primitive invocation path for the Fixnum operations that
support it in JRuby. This includes the math operators (except "/",
which one library overrides), boolean operators, comparison operators,
and bitwise operators. My dirt-simple version does not install any
guards; if the first invocation is against RubyFixnum it attempts to
hardwire the call site directly to the RubyFixnum method corresponding
to the operator ("op_plus" for "+", etc). Ignoring invokedynamic
overhead, it should be faster than my call site logic, since the
latter needs to look up the call site (aload 0; getfield; aaload), invoke
through the call site (invokevirtual), check that Fixnum has not been
modified (aload 1; getfield "runtime"; getfield
"fixnumHasBeenModified"; jne), check on every call that the incoming
object is a Fixnum (instanceof + checkcast), and finally make the
invocation of the primitive-receiving method.
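
For readers who don't want to decode the bytecode names, the non-indy path described above can be sketched roughly as follows. This is only an illustration: RuntimeState, Context, Fixnum, PlusSite and opPlus are hypothetical stand-ins, not JRuby's real classes, and the overflow check and slow path are left out.

```java
public class CallSiteSketch {
    // Hypothetical stand-ins; none of these are JRuby's real classes.
    static class RuntimeState { boolean fixnumHasBeenModified = false; }
    static class Context { final RuntimeState runtime = new RuntimeState(); }

    static class Fixnum {
        final long value;
        Fixnum(long value) { this.value = value; }
        // Primitive-receiving method, similar in spirit to op_plus (no overflow check here).
        Object opPlus(Context ctx, long other) { return new Fixnum(value + other); }
    }

    static class PlusSite {
        Object call(Context ctx, Object self, long arg) {
            // Check that Fixnum has not been modified (getfield runtime; getfield flag; jne).
            if (ctx.runtime.fixnumHasBeenModified) return slowPath();
            // Check the incoming object on every call (instanceof + checkcast).
            if (self instanceof Fixnum) return ((Fixnum) self).opPlus(ctx, arg);
            return slowPath();
        }
        Object slowPath() {
            throw new UnsupportedOperationException("dynamic dispatch omitted in this sketch");
        }
    }

    // The compiled body holds its call sites in an array field.
    private final PlusSite[] sites = { new PlusSite() };

    // Roughly what "a += 1" turns into: load the site (aload 0; getfield; aaload),
    // then invoke through it (invokevirtual).
    Object addOne(Context ctx, Object a) {
        return sites[0].call(ctx, a, 1L);
    }

    public static long demo(long start) {
        CallSiteSketch body = new CallSiteSketch();
        Object result = body.addOne(new Context(), new Fixnum(start));
        return ((Fixnum) result).value;
    }
}
```

Every step in call() is work the indy version should be able to skip once the site is bound straight to the target method.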
Unfortunately, the invokedynamic version is still slower.
Investigation (with a simple loop) seems to show that it doesn't
inline. Here's the relevant assembly from before (using JRuby's
specialized call sites) and after (using invokedynamic). In the "before"
version you can see the RubyFixnum.op_plus logic has inlined; I've cut it
off roughly where it starts to do overflow checking on the result. The
invokedynamic version does not inline, and does a callq to op_plus.
Note that there are two assembly dumps for my simple loop in the
PrintAssembly output, but neither of them seems to show op_plus (or
op_lt, incidentally) getting inlined. Bug? Shouldn't a virtual DMH
bound to an invokedynamic call site through a handful of adapters
(permute + explicitCast in this case) be inlining?
https://gist.github.com/1008986
Here's the relevant code:
jruby -e "def loop; a = 0; while a < 10_000_000; a += 1; end; end;
10.times { loop }"
And the relevant addition to JRuby's indy support:
    // TODO: guards
    // Look up the primitive-receiving method on RubyFixnum.
    MethodHandle target = findVirtual(RubyFixnum.class, fastOpsMethod,
            MethodType.methodType(IRubyObject.class,
                    ThreadContext.class, long.class));
    // Widen the receiver type from RubyFixnum to IRubyObject.
    target = MethodHandles.explicitCastArguments(target,
            MethodType.methodType(IRubyObject.class, IRubyObject.class,
                    ThreadContext.class, long.class));
    // Adapt to the call site's five arguments: pick arguments 2, 0 and 4
    // (receiver, context, long operand) and drop the other two.
    target = MethodHandles.permuteArguments(target,
            MethodType.methodType(IRubyObject.class, ThreadContext.class,
                    IRubyObject.class, IRubyObject.class, String.class,
                    long.class),
            new int[] {2, 0, 4});
    site.setTarget(target);
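
If you want to play with the same adapter chain outside JRuby, here is a minimal self-contained sketch. Context, Fixnum and opPlus are made-up stand-ins for ThreadContext, RubyFixnum and op_plus, and the five-argument call shape (context, caller, self, name, long) is only my reading of the permute above, so treat it as an assumption rather than JRuby's actual signature.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FastOpsSketch {
    // Hypothetical stand-ins for ThreadContext and RubyFixnum.
    public static class Context {}
    public static class Fixnum {
        public final long value;
        public Fixnum(long value) { this.value = value; }
        public Object opPlus(Context ctx, long other) { return new Fixnum(value + other); }
    }

    static MethodHandle bind() throws ReflectiveOperationException {
        // (Fixnum, Context, long)Object
        MethodHandle target = MethodHandles.lookup().findVirtual(Fixnum.class, "opPlus",
                MethodType.methodType(Object.class, Context.class, long.class));
        // (Object, Context, long)Object: the receiver is cast to Fixnum on each call
        target = MethodHandles.explicitCastArguments(target,
                MethodType.methodType(Object.class, Object.class, Context.class, long.class));
        // (Context, Object, Object, String, long)Object: use arguments 2, 0, 4; drop 1 and 3
        return MethodHandles.permuteArguments(target,
                MethodType.methodType(Object.class, Context.class, Object.class,
                        Object.class, String.class, long.class),
                2, 0, 4);
    }

    public static long plus(long a, long b) {
        try {
            MethodHandle mh = bind();
            Context ctx = new Context();
            Object caller = new Object();
            Object self = new Fixnum(a);
            String name = "+";
            // Static types here must match the handle's type exactly for invokeExact.
            Object result = (Object) mh.invokeExact(ctx, caller, self, name, b);
            return ((Fixnum) result).value;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Calling FastOpsSketch.plus(40, 2) goes through the same findVirtual + explicitCast + permute chain and ends up in opPlus. Note there is still no guard here, so passing a non-Fixnum receiver fails with a ClassCastException.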
I'm pushing this logic (enable with -Xinvokedynamic.fastops=true), but
it will be disabled by default until indy (on HotSpot) is faster than
JRuby's built-in hacks (and I insert the appropriate guard logic!).
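
As for what the guard logic might look like, one plausible shape is MethodHandles.guardWithTest with an instanceof test on the receiver. The sketch below is not what JRuby does; Fixnum, isFixnum and slowPlus are hypothetical names, and it only covers the receiver type check, not the "Fixnum has been modified" check (which might be a job for something like SwitchPoint).

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class GuardSketch {
    // Hypothetical stand-in for RubyFixnum.
    public static class Fixnum {
        public final long value;
        public Fixnum(long value) { this.value = value; }
        public Object opPlus(long other) { return new Fixnum(value + other); }
    }

    // Guard test: is the receiver a Fixnum?
    static boolean isFixnum(Object self) { return self instanceof Fixnum; }

    // Fallback: stands in for a normal dynamic dispatch.
    static Object slowPlus(Object self, long other) { return "slow"; }

    static MethodHandle guardedPlus() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType siteType = MethodType.methodType(Object.class, Object.class, long.class);

        // Fast path: (Fixnum, long)Object widened to (Object, long)Object.
        MethodHandle fast = lookup.findVirtual(Fixnum.class, "opPlus",
                MethodType.methodType(Object.class, long.class));
        fast = MethodHandles.explicitCastArguments(fast, siteType);

        MethodHandle test = lookup.findStatic(GuardSketch.class, "isFixnum",
                MethodType.methodType(boolean.class, Object.class));
        MethodHandle slow = lookup.findStatic(GuardSketch.class, "slowPlus", siteType);

        // If test(self) is true call fast, otherwise call slow.
        return MethodHandles.guardWithTest(test, fast, slow);
    }

    public static String describe(Object self) {
        try {
            Object result = guardedPlus().invoke(self, 1L);
            return (result instanceof Fixnum) ? "fast:" + ((Fixnum) result).value : "slow";
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

With this in place a Fixnum receiver takes the fast path and anything else falls through to slowPlus instead of blowing up on the cast.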
OH, and FWIW, here's the LogCompilation -i output roughly around where
I'd expect to see op_plus and op_lt inlining:
@ 27 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
@ 27 java.lang.invoke.MethodHandle::invokeExact (17 bytes)
@ 10 org.jruby.RubyFixnum::op_plus (38 bytes)
@ 45 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
@ 45 java.lang.invoke.MethodHandle::invokeExact (17 bytes)
@ 10 org.jruby.RubyFixnum::op_lt (22 bytes)
Is it lying, or what? And if it's actually inlining, where's the rest
of op_plus and op_lt, most of which is trivial tiny methods? And why
doesn't it show up as inlined in the assembly output?
- Charlie
More information about the mlvm-dev mailing list