Good news, bad news

Mon May 23 15:06:07 PDT 2011

FWIW, here's perf with indy (invokedynamic) versus monomorphic inline
caching (MIC) on that bench_method_dispatch_only benchmark:

~/projects/jruby ➔ jruby --server -X+C bench/language/bench_method_dispatch_only.rb
Test ruby method: 1000k loops calling self's foo 10 times
  1.129000   0.000000   1.129000 (  0.662000)
  0.409000   0.000000   0.409000 (  0.409000)
  0.455000   0.000000   0.455000 (  0.455000)
  0.428000   0.000000   0.428000 (  0.428000)
  0.474000   0.000000   0.474000 (  0.474000)
  0.470000   0.000000   0.470000 (  0.470000)
  0.458000   0.000000   0.458000 (  0.458000)
  0.495000   0.000000   0.495000 (  0.495000)
  0.460000   0.000000   0.460000 (  0.460000)
  0.508000   0.000000   0.508000 (  0.508000)

~/projects/jruby ➔ jruby --server -Xcompile.invokedynamic=false -X+C bench/language/bench_method_dispatch_only.rb
Test ruby method: 1000k loops calling self's foo 10 times
  0.377000   0.000000   0.377000 (  0.315000)
  0.211000   0.000000   0.211000 (  0.207000)
  0.132000   0.000000   0.132000 (  0.132000)
  0.128000   0.000000   0.128000 (  0.128000)
  0.135000   0.000000   0.135000 (  0.135000)
  0.140000   0.000000   0.140000 (  0.140000)
  0.122000   0.000000   0.122000 (  0.122000)
  0.122000   0.000000   0.122000 (  0.122000)
  0.122000   0.000000   0.122000 (  0.122000)
  0.122000   0.000000   0.122000 (  0.122000)

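Previously, the invokedynamic version clocked in *much* faster than the
MIC version, something like an order of magnitude faster. Now, once
warmed up, it is roughly 3-4x slower than MIC (about 0.41-0.51s per run
versus 0.122s).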
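For comparison, the MIC baseline keeps one (class token, method) pair per call site and re-checks the token on every call. The following is a minimal, single-threaded sketch of that idea. All names here (MonomorphicCacheDemo, RClass, RObject, DynamicMethod, CallSite) are hypothetical; the design is only loosely modeled on the getMetaClass/getCacheToken guard that appears in the LogCompilation output later in this thread, and it is not JRuby's actual call site code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a monomorphic inline cache (MIC).
// None of these types are JRuby's real classes; they only illustrate the idea.
public class MonomorphicCacheDemo {

    /** A method body that can be invoked on a receiver. */
    interface DynamicMethod {
        Object call(RObject self);
    }

    /** A "class" holding a method table and a token that changes on redefinition. */
    static final class RClass {
        final Map<String, DynamicMethod> methods = new HashMap<>();
        Object token = new Object();

        void define(String name, DynamicMethod m) {
            methods.put(name, m);
            token = new Object(); // invalidates any call site holding the old token
        }
    }

    /** A receiver object that knows its class. */
    static final class RObject {
        final RClass metaClass;
        RObject(RClass metaClass) { this.metaClass = metaClass; }
    }

    /** One call site caching exactly one (token, method) pair; not thread-safe. */
    static final class CallSite {
        private final String name;
        private Object cachedToken;
        private DynamicMethod cachedMethod;
        int misses; // counts slow-path lookups, for illustration only

        CallSite(String name) { this.name = name; }

        Object call(RObject self) {
            RClass cls = self.metaClass;
            if (cls.token == cachedToken) {
                return cachedMethod.call(self); // fast path: cache hit
            }
            // Slow path: look the method up and replace the single cache entry.
            misses++;
            DynamicMethod m = cls.methods.get(name);
            if (m == null) {
                throw new IllegalStateException("undefined method " + name);
            }
            cachedToken = cls.token;
            cachedMethod = m;
            return m.call(self);
        }
    }

    public static void main(String[] args) {
        RClass c = new RClass();
        c.define("foo", self -> null); // empty method, like the benchmark's foo
        RObject obj = new RObject(c);
        CallSite site = new CallSite("foo");
        for (int i = 0; i < 1_000_000; i++) {
            site.call(obj);
        }
        System.out.println("misses: " + site.misses); // prints "misses: 1"
    }
}
```

Only the first call misses; every later call is a single reference comparison plus a direct call, which is the kind of path a JIT can inline easily.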

- Charlie

On Mon, May 23, 2011 at 4:56 PM, Charles Oliver Nutter
<headius at headius.com> wrote:
> Another example, running bench/language/bench_method_dispatch_only,
> which runs a 1M-iteration loop that invokes an empty "foo" method five
> times:
>
> https://gist.github.com/9008f94fc677f3fe98e7
>
> Note again that it seems like only the test logic, and maybe some of
> the logic wrapping the foo call, gets inlined; the foo calls themselves
> do not appear in the LogCompilation inlining graph at all.
>
> - Charlie
>
> On Mon, May 23, 2011 at 4:50 PM, Charles Oliver Nutter
> <headius at headius.com> wrote:
>> Also, FWIW: after these two chunks in the LogCompilation output, I see
>> nothing else inlined into fib_ruby, including a monomorphic call path
>> through PlusCallSite ending at RubyFixnum#op_plus (the integer +
>> operation). That would also affect performance.
>>
>> I also do not see any indication *why* nothing inlines past this
>> point. Usually it would say "too big" or something.
>>
>> I do see MinusCallSite inline earlier.
>>
>> - Charlie
>>
>> On Mon, May 23, 2011 at 4:47 PM, Charles Oliver Nutter
>> <headius at headius.com> wrote:
>>> The following chunk should be the invokedynamic call to fib, via a
>>> GWT, an arg permuter, and perhaps one convert:
>>>
>>>    @ 77 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
>>>    @ 77 java.lang.invoke.MethodHandle::invokeExact (44 bytes)
>>>      @ 8 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
>>>      @ 8 java.lang.invoke.MethodHandle::invokeExact (7 bytes)
>>>        @ 3 org.jruby.runtime.invokedynamic.InvokeDynamicSupport::test (20 bytes)
>>>          @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
>>>          @ 8 org.jruby.RubyModule::getCacheToken (5 bytes)
>>>      @ 23 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
>>>      @ 23 java.lang.invoke.MethodHandle::invokeExact (67 bytes)
>>>        @ 1 java.lang.Boolean::valueOf (14 bytes)
>>>        @ 10 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
>>>        @ 10 java.lang.invoke.MethodHandle::invokeExact (24 bytes)
>>>          @ 11 java.lang.Boolean::booleanValue (5 bytes)
>>>          @ 20 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes)
>>>        @ 63 java.lang.invoke.MethodHandle::invokeExact (0 bytes)
>>>      @ 37 sun.invoke.util.ValueConversions::identity (2 bytes)
>>>
>>> This seems to only be the test logic; the actual fib invocation
>>> doesn't appear to show up in the inlining graph at all. Am I right?
>>>
>>> I see two of these in the LogCompilation output and nothing else
>>> around them. I'd expect to see them do the invocation of fib_ruby
>>> somewhere in there. It's like the "success" branch of GWT is not even
>>> being considered for inlining.
>>>
>>> - Charlie
>>>
>>> On Mon, May 23, 2011 at 4:41 PM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>>>> If there were to be a recursive inline in there, where would it occur? I can't tell from the names where in that inline tree the recursive call occurs.
>>>>
>>>> tom
>>>>
>>>> On May 23, 2011, at 2:26 PM, Charles Oliver Nutter wrote:
>>>>
>>>>> fib_ruby LogCompilation inlining graph, showing that fib_ruby is not
>>>>> inlined: https://gist.github.com/f2b665ad3c97ba622ebf
>>>>>
>>>>> Can anyone suggest other flags I can try to adjust to get things to
>>>>> inline better?
>>>>>
>>>>> FWIW, the handle chain in question that's not inlining is pretty simple:
>>>>>
>>>>> * DMH (direct method handle) pointing back at fib_ruby
>>>>> * permute args
>>>>> * GWT (guardWithTest)
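A chain of that shape (a direct method handle to the target, an argument permutation, and a guardWithTest choosing between the target and a fallback) can be sketched with the standard java.lang.invoke API. This is a hypothetical stand-in, not JRuby's code: fib, classMatches, fallback, buildSite, and the use of plain strings as "class tokens" are all invented for illustration, whereas the real guard compares metaclass cache tokens. As in fib_ruby, the recursive call goes back through the guarded chain rather than calling fib directly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch: DMH -> permuteArguments -> guardWithTest, with the
// target calling itself recursively back through the same chain.
public class GwtChainDemo {

    // The "call site" handle. A mutable static keeps the sketch short; HotSpot
    // can treat static final fields and invokedynamic call site targets as
    // constants, which is what lets it inline through a handle chain.
    static MethodHandle site;

    // Stand-in for the compiled fib_ruby body; params are (n, receiverClass).
    static long fib(long n, String receiverClass) throws Throwable {
        if (n < 2) return n;
        // Recursive calls go back through the guarded handle chain.
        return (long) site.invokeExact(receiverClass, n - 1)
             + (long) site.invokeExact(receiverClass, n - 2);
    }

    // Guard: does the receiver's class match the one this site was linked for?
    static boolean classMatches(String expected, String actual) {
        return expected.equals(actual);
    }

    // Slow path; a real runtime would look the method up and relink here.
    static long fallback(String receiverClass, long n) {
        return -1L;
    }

    static MethodHandle buildSite(String cachedClass) throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        // 1. DMH: direct handle to the target, type (long, String)long.
        MethodHandle target = lookup.findStatic(GwtChainDemo.class, "fib",
                MethodType.methodType(long.class, long.class, String.class));
        // 2. Permute: the call site passes (receiverClass, n); fib wants (n, receiverClass).
        MethodType siteType = MethodType.methodType(long.class, String.class, long.class);
        MethodHandle permuted = MethodHandles.permuteArguments(target, siteType, 1, 0);
        // 3. GWT: bind the expected class into the test, leaving a (String)boolean guard.
        MethodHandle test = MethodHandles.insertArguments(
                lookup.findStatic(GwtChainDemo.class, "classMatches",
                        MethodType.methodType(boolean.class, String.class, String.class)),
                0, cachedClass);
        MethodHandle slow = lookup.findStatic(GwtChainDemo.class, "fallback", siteType);
        return MethodHandles.guardWithTest(test, permuted, slow);
    }

    public static void main(String[] args) throws Throwable {
        site = buildSite("Fixnum");
        System.out.println((long) site.invokeExact("Fixnum", 20L)); // 6765
        System.out.println((long) site.invokeExact("Float", 20L));  // -1 (guard fails)
    }
}
```

In a LogCompilation graph for a chain like this, one would hope to see the guard's test and the "success" target (here fib) both inlined under the invokeExact frames, which is the part reported missing above.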
>>>>>
>>>>> - Charlie
>>>>>
>>>>> On Mon, May 23, 2011 at 4:19 PM, Charles Oliver Nutter
>>>>> <headius at headius.com> wrote:
>>>>>> I'm working up a set of files that show JRuby compilation output, but
>>>>>> I noticed a couple things that might be interesting right now.
>>>>>>
>>>>>> First off, fairly early in the assembly output for fib, I see this:
>>>>>>
>>>>>>  0x02876d1f: call      0x0282d0e0      ; OopMap{[96]=Oop [100]=Oop [28]=Oop [40]=Oop [48]=Oop off=644}
>>>>>>                                        ;*invokespecial invokeExact
>>>>>>                                        ; - java.lang.invoke.MethodHandle::invokeExact@63
>>>>>>                                        ; - java.lang.invoke.MethodHandle::invokeExact@23
>>>>>>                                        ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@51 (line 7)
>>>>>>                                        ;   {optimized virtual_call}
>>>>>>
>>>>>> For fib, the only invokedynamic is the recursive call to fib, so that
>>>>>> would indicate that fib_ruby is not inlining into itself at all here.
>>>>>> And I can't see it inlining into itself anywhere in the assembly
>>>>>> output.
>>>>>>
>>>>>> Later in the same output:
>>>>>>
>>>>>>  0x0287703f: call      0x0282dba0      ; OopMap{ebp=Oop off=1444}
>>>>>>                                        ;*checkcast
>>>>>>                                        ; - java.lang.invoke.MethodHandle::invokeExact@40
>>>>>>                                        ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
>>>>>>                                        ;   {runtime_call}
>>>>>>  0x02877044: call      0x0105a9d0      ;*checkcast
>>>>>>                                        ; - java.lang.invoke.MethodHandle::invokeExact@40
>>>>>>                                        ; - bench.bench_fib_recursive::method__0$RUBY$fib_ruby@82 (line 7)
>>>>>>                                        ;   {runtime_call}
>>>>>>
>>>>>> These appear repeatedly near the invokedynamic invocation above. If
>>>>>> I'm reading this right, neither the recursive call nor the logic
>>>>>> involved in that particular handle is inlining. Am I right?
>>>>>>
>>>>>> Here's the complete assembly dump (i386) for the fib_ruby method:
>>>>>> https://gist.github.com/987640
>>>>>>
>>>>>> In other news, MaxInlineSize=150 with InlineSmallCode=3000 does not
>>>>>> appear to improve performance. I also tried bumping up
>>>>>> MaxRecursiveInlineLevel and MaxInlineLevel with no effect.
>>>>>>
>>>>>> - Charlie
>>>>>>
>>>>> _______________________________________________
>>>>> mlvm-dev mailing list
>>>>> mlvm-dev at openjdk.java.net
>>>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>>>
>>>
>>
>