Studying LF performance

Sun Dec 23 20:26:50 PST 2012

Excellent! I'll give it a look and base my experiments on that!

- Charlie

On Sun, Dec 23, 2012 at 4:04 PM, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:
> Hi Charlie,
>
> If you want to experiment :) you can try the code Roland and Christian
> pushed.
>
> Roland just pushed Incremental inlining changes for C2 which should help
> LF inlining:
>
> http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/d092d1b31229
>
> You also need Christian's inlining-related changes in the JDK:
>
> http://hg.openjdk.java.net/hsx/hotspot-main/jdk/rev/12fa4d7ecaf5
>
> Regards,
> Vladimir
>
> On 12/23/12 11:21 AM, Charles Oliver Nutter wrote:
>> A thread emerges!
>>
>> I'm going to be taking some time this holiday to explore the
>> performance of the new LF indy impl in various situations. This will
>> be the thread where I gather observations.
>>
>> A couple preliminaries...
>>
>> My perf exploration so far seems to show LF performing nearly
>> equivalent to the old impl for the smallest benchmarks, with
>> performance rapidly degrading as the size of the code involved grows.
>> Recursive fib and tak have nearly identical perf on LF and the old
>> impl. Red/black performs about the same on LF as with indy disabled,
>> well behind the old indy performance. At some point, LF falls
>> completely off the cliff and can't even compete with non-indy logic,
>> as in a benchmark I ran today of Ruby constant access (heavily
>> SwitchPoint-dependent).
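
For readers unfamiliar with the pattern, here is a minimal Java sketch of what a SwitchPoint-guarded constant lookup might look like. It is not JRuby's actual constant cache; the class ConstantSiteSketch and its get/redefine/relink methods are hypothetical names made up for illustration. Only the java.lang.invoke calls (SwitchPoint, MutableCallSite, MethodHandles.constant) are real JDK API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

// Hypothetical illustration only; not taken from JRuby.
final class ConstantSiteSketch {
    // Stand-in for a language-level constant table with a single entry.
    private static Object currentValue = "v1";
    // Invalidated whenever the constant is redefined.
    private static SwitchPoint validity = new SwitchPoint();

    private static final MutableCallSite SITE =
            new MutableCallSite(MethodType.methodType(Object.class));
    private static final MethodHandle INVOKER = SITE.dynamicInvoker();
    private static final MethodHandle RELINK;

    static {
        MethodHandle relink;
        try {
            relink = MethodHandles.lookup().findStatic(
                    ConstantSiteSketch.class, "relink",
                    MethodType.methodType(Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        RELINK = relink;
        SITE.setTarget(RELINK); // the site starts out unlinked
    }

    // Slow path: read the current value and rebind the site to a constant
    // handle that stays valid until the SwitchPoint is invalidated.
    private static Object relink() {
        Object value = currentValue;
        MethodHandle fast = MethodHandles.constant(Object.class, value);
        SITE.setTarget(validity.guardWithTest(fast, RELINK));
        return value;
    }

    // Constant access as seen from the caller.
    static Object get() {
        try {
            return INVOKER.invoke();
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Redefinition swaps in a fresh SwitchPoint and invalidates the old one,
    // which sends every guarded site back through relink().
    static void redefine(Object newValue) {
        currentValue = newValue;
        SwitchPoint old = validity;
        validity = new SwitchPoint();
        SwitchPoint.invalidateAll(new SwitchPoint[] { old });
    }
}
```

When such a site inlines, the fast path should reduce to the constant itself; when it does not, every access pays for the call through the guard, which is the case under discussion in this thread.
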
>>
>> Discussions with Christian seem to indicate that the fall-off is
>> because non-inlined LF indy call sites perform very poorly compared to
>> the old impl. I'll be trying to explore this and correlate the perf
>> cliff with failure to inline. Christian has told me that (upcoming?)
>> work on incremental inlining will help reduce the performance impact
>> of the fall-off, but I'm not sure of the status of this work.
>>
>> Some early ASM output from a trivial benchmark: loop 500M times
>> calling #foo, which immediately calls #bar, which just returns the
>> self object (ALOAD 2; ARETURN in essence). I've been comparing the new
>> ASM to the old, both presented in a gist here:
>> https://gist.github.com/4365103
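
A rough Java analogue of the benchmark shape described above, for anyone who wants to poke at it without JRuby. This is a sketch under assumptions: the real benchmark is Ruby code compiled by JRuby, and the class FooBarSketch, the class-check guard isClass and the fallback barSlow are hypothetical stand-ins for a monomorphic dynamic call site. The default iteration count is far smaller than the 500M used in the original run.

```java
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical illustration only; not the code JRuby generates.
final class FooBarSketch {
    // foo's call to bar goes through a guarded handle, much as a dynamic
    // call site would: a receiver-class test, a fast target and a fallback.
    private static final MethodHandle BAR_SITE = linkBarSite();

    static boolean isClass(Class<?> expected, Object self) {
        return self != null && self.getClass() == expected;
    }

    // bar just returns the self object.
    static Object bar(Object self) {
        return self;
    }

    // Fallback for receivers that fail the guard; same result here.
    static Object barSlow(Object self) {
        return self;
    }

    private static MethodHandle linkBarSite() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType objToObj = MethodType.methodType(Object.class, Object.class);
            MethodHandle fast = lookup.findStatic(FooBarSketch.class, "bar", objToObj);
            MethodHandle slow = lookup.findStatic(FooBarSketch.class, "barSlow", objToObj);
            MethodHandle test = lookup.findStatic(FooBarSketch.class, "isClass",
                    MethodType.methodType(boolean.class, Class.class, Object.class))
                    .bindTo(String.class);
            MethodHandle guarded = MethodHandles.guardWithTest(test, fast, slow);
            return new ConstantCallSite(guarded).dynamicInvoker();
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // foo immediately calls bar through the guarded site.
    static Object foo(Object self) {
        try {
            return (Object) BAR_SITE.invokeExact(self);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static Object run(Object self, long iterations) {
        Object last = null;
        for (long i = 0; i < iterations; i++) {
            last = foo(self);
        }
        return last;
    }

    public static void main(String[] args) {
        long n = args.length > 0 ? Long.parseLong(args[0]) : 1_000_000L;
        long start = System.nanoTime();
        Object result = run("self", n);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println(n + " calls, " + elapsedMs + " ms, result=" + result);
    }
}
```

Running it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining (and -XX:+PrintAssembly, if the hsdis disassembler is installed) should show whether the guard and target get inlined into the loop.
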
>>
>> As you can see, the code resulting from both impls boils down to
>> almost nothing, but there's one difference...
>>
>> New code not present in old:
>>
>> 0x0000000111ab27ef: je     0x0000000111ab2835  ;*ifnull
>>                                                  ; - java.lang.Class::cast@1 (line 3007)
>>                                                  ; - java.lang.invoke.LambdaForm$MH/763053631::guard@12
>>                                                  ; - java.lang.invoke.LambdaForm$MH/518216626::linkToCallSite@14
>>                                                  ; - ruby.__dash_e__::method__0$RUBY$foo@3 (line 1)
>>
>> A side effect of inlining through LFs, I presume? Checking to ensure
>> non-null call site? If so, shouldn't this have folded away, since the
>> call site is constant?
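
A small sketch that may help with the question above. In the JDK sources I know of, Class.cast tests its argument with "obj != null && !isInstance(obj)", so the ifnull at bytecode index 1 would be a null check on the object being cast rather than on the call site. Whether the cast in this trace comes from an asType-style conversion inside the guard LambdaForm is an assumption on my part. The class CastCheckSketch and its methods are hypothetical names; the asType call is real API and applies a reference cast when narrowing Object to String.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical illustration only.
final class CastCheckSketch {
    // Mirrors the shape of Class.cast: null always passes, and only a
    // non-null value of the wrong type throws.
    static <T> T castLike(Class<T> type, Object obj) {
        if (obj != null && !type.isInstance(obj)) {
            throw new ClassCastException("cannot cast to " + type.getName());
        }
        @SuppressWarnings("unchecked")
        T result = (T) obj;
        return result;
    }

    // identity(String) viewed as (Object)String: asType inserts a
    // reference cast on the argument.
    static final MethodHandle OBJECT_TO_STRING =
            MethodHandles.identity(String.class)
                    .asType(MethodType.methodType(String.class, Object.class));

    static String viaAsType(Object value) {
        try {
            return (String) OBJECT_TO_STRING.invokeExact(value);
        } catch (RuntimeException | Error e) {
            throw e; // includes ClassCastException from the inserted cast
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

If that reading is right, the branch can only fold away when the JIT can prove the cast value non-null, which is a separate matter from the call site being constant.
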
>>
>> In any case, it's hardly damning to have an extra branch. This output
>> is, at least, proof that LF *can* inline and optimize as well as the
>> old impl...so we can put that aside for now. The questions to explore
>> then are:
>>
>> * Do cases expected to inline actually do so under LF impl?
>> * When inlining, does code optimize as it should (across the various
>> shapes of call sites in JRuby, at least)?
>> * When code does not inline, how does it impact performance?
>>
>> My expectation is that cases which should inline do so under LF, but
>> that the non-inlined performance is significantly worse than under the
>> old impl. The critical bit will be ensuring that even when LF call
>> sites do not inline, they at least still compile to avoid
>> interpretation and LF-to-LF overhead. At a minimum, it seems like we
>> should be able to expect all LF between a call site and its DMH target
>> will get compiled into a single unit, if not inlined into the caller.
>> I still contend that call site + LFs should be heavily prioritized for
>> inlining either into the caller or along with the called method, since
>> they really *are* the shape of the call site. If there has to be a
>> callq somewhere in that chain, there should ideally be only one.
>>
>> So...here we go.
>>
>> - Charlie
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev