MutableCallSite + constant handle slower than field accesses?
Charles Oliver Nutter
headius at headius.com
Sat Oct 15 05:56:15 PDT 2011
I'm seeing something peculiar and wanted to run it by you folks.
There are a few values that JRuby's compiler had previously been
loading from instance fields every time they're needed. Specifically,
fields like ThreadContext.runtime (the current JRuby runtime),
Ruby.falseObject, Ruby.trueObject, Ruby.nilObject (false, true, and
nil values). I figured I'd make a quick change today and have those
instead be constant method handles bound into a mutable call site.
Unfortunately, performance seems to be worse.
The logic works like this:
* ThreadContext is loaded to stack
* invokedynamic, bootstrap just wires up an initialization method into
a MutableCallSite
* initialization method rebinds call site forever to a constant method
handle pointing at the value (runtime/true/false/nil objects)
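
As a rough sketch of the three steps above (not the actual JRuby code): RuntimeStub and ContextStub are hypothetical stand-ins for org.jruby.Ruby and ThreadContext, and the bootstrap and init method names are made up for illustration. Only the java.lang.invoke calls are real API.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class ConstantSiteSketch {
    // Hypothetical stand-ins for JRuby's Ruby and ThreadContext classes.
    static final class RuntimeStub { }
    static final class ContextStub {
        final RuntimeStub runtime;
        ContextStub(RuntimeStub runtime) { this.runtime = runtime; }
    }

    // Bootstrap: only wires the one-shot initializer in as the first target.
    public static CallSite bootstrap(MethodHandles.Lookup lookup, String name,
                                     MethodType type) throws ReflectiveOperationException {
        MutableCallSite site = new MutableCallSite(type);
        MethodHandle init = lookup.findStatic(ConstantSiteSketch.class, "init",
                MethodType.methodType(RuntimeStub.class,
                        MutableCallSite.class, ContextStub.class));
        site.setTarget(init.bindTo(site)); // now (ContextStub)RuntimeStub
        return site;
    }

    // First call: read the value once, then rebind the site to a constant
    // handle that ignores the incoming context argument.
    static RuntimeStub init(MutableCallSite site, ContextStub context) {
        RuntimeStub value = context.runtime;
        MethodHandle constant = MethodHandles.constant(RuntimeStub.class, value);
        site.setTarget(MethodHandles.dropArguments(constant, 0, ContextStub.class));
        return value;
    }

    // Simulates the invokedynamic instruction through the site's dynamic invoker.
    static boolean demo() {
        try {
            MethodType type = MethodType.methodType(RuntimeStub.class, ContextStub.class);
            CallSite site = bootstrap(MethodHandles.lookup(), "getRuntime", type);
            MethodHandle invoker = site.dynamicInvoker();
            ContextStub context = new ContextStub(new RuntimeStub());
            ContextStub other = new ContextStub(new RuntimeStub());
            RuntimeStub first = (RuntimeStub) invoker.invokeExact(context);
            // The site is already rebound, so a different context still
            // yields the value captured on the first call.
            RuntimeStub second = (RuntimeStub) invoker.invokeExact(other);
            return first == context.runtime && second == first;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that the sketch never rebinds the site after the first call, which matches the "forever" in the last step; whether Hotspot actually folds the constant through the call site is exactly the open question here.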
My expectation was that this would be at least no slower (and
potentially a tiny bit faster) but also less bytecode (in the case of
true/false/nil, it was previously doing
ThreadContext.runtime.getNil()/getTrue()/getFalse()). It seems like
it's actually slower than walking those references, though, and I'm
not sure why.
Here are a couple of the scenarios in diff form, showing the bytecode
before and after:
Loading "runtime"
ALOAD 1
- GETFIELD org/jruby/runtime/ThreadContext.runtime : Lorg/jruby/Ruby;
+ INVOKEDYNAMIC getRuntime (Lorg/jruby/runtime/ThreadContext;)Lorg/jruby/Ruby;
    [org/jruby/runtime/invokedynamic/InvokeDynamicSupport.getObjectBootstrap(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite; (6)]
Loading "false"
ALOAD 1
- GETFIELD org/jruby/runtime/ThreadContext.runtime : Lorg/jruby/Ruby;
- INVOKEVIRTUAL org/jruby/Ruby.getFalse ()Lorg/jruby/RubyBoolean;
+ INVOKEDYNAMIC getFalse (Lorg/jruby/runtime/ThreadContext;)Lorg/jruby/RubyBoolean;
    [org/jruby/runtime/invokedynamic/InvokeDynamicSupport.getObjectBootstrap(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite; (6)]
I think because these are now seen as invocations, I'm hitting some
inlining budget limit I didn't hit before (and which isn't being
properly discounted). The benchmark I'm seeing degrade is
bench/language/bench_flip.rb, and it's a pretty significant
degradation. Only the "heap" version shows the degradation, and it
definitely does have more bytecode...but the bytecode with my patch
differs only in the way these values are being accessed, as shown in
the diffs above.
Before:
                                    user     system      total        real
1m x10 while (a)..(!a) (heap)   0.951000   0.000000   0.951000 (  0.910000)
1m x10 while (a)..(!a) (heap)   0.705000   0.000000   0.705000 (  0.705000)
1m x10 while (a)..(!a) (heap)   0.688000   0.000000   0.688000 (  0.688000)
After:
                                    user     system      total        real
1m x10 while (a)..(!a) (heap)   2.350000   0.000000   2.350000 (  2.284000)
1m x10 while (a)..(!a) (heap)   2.128000   0.000000   2.128000 (  2.128000)
1m x10 while (a)..(!a) (heap)   2.115000   0.000000   2.115000 (  2.116000)
You can see the degradation is pretty bad: roughly 3x slower once
warmed up (about 0.69s before versus about 2.12s after).
I'm concerned because I had hoped that invokedynamic + mutable call
site + constant handle would always be at least as fast as a field
access...since it avoids repeated field accesses and makes it
possible for Hotspot to fold those constants away. What's going on
here?
Patch for the change (apply to JRuby master) is here:
https://gist.github.com/955976b52b0c4e3f611e
- Charlie