Tail Call Optimization

Thu Jan 15 21:29:55 UTC 2015

Wow. Thanks so much for the help. I'm seeing similar drops in execution
time. Based on your commits, it looks like the main problem was I was
reusing frame descriptors between call targets. Good to know for the
future. I appreciate taking the time to look at my code.

Thanks again,
Cristian

On Thu, Jan 15, 2015 at 5:13 AM, Christian Humer <christian.humer at gmail.com>
wrote:

> Hi Cristian,
>
> I did have a closer look at your code and I spotted several problems that
> prevented your code to be optimized properly. I prepared a pull request
> that should fix those problems.
>
> https://github.com/cesquivias/mumbler/pull/6
>
> Be aware of invalidations if you enable -G:+TraceTruffleCompilation. If
> there are some you should review them for problems.
>
> The inline failed message is valid as you are not using a tail call for
> the inner lamda in countdown.
>  ((lambda (x) (countdown 100) (- x 100))
>
> Here are my benchmark results for the countdown benchmark:
>
> Before my changes:
> ('computation-time 204)
>
> After my changes:
> ('computation-time 0)
>
> Have fun!
>
> - Christian Humer
>
>
> On Thu Jan 15 2015 at 02:07:04 Cristian Esquivias <
> cristian.esquivias at gmail.com> wrote:
>
>> It looks like the inlining failed because of recursion.
>>
>> [truffle] inline failed        [(define n ReadArgumentNode at 52525845),
>> IfNode at 3b94d659] |ASTSize      41/   41 |frequency     99.4917
>> |score          2.4266 |index=  0, force=N, callSites=221   |reason
>> recursion
>>
>> I last pulled/built yesterday so the vm should be up to date.
>>
>> On Wed, Jan 14, 2015 at 3:55 PM, Christian Humer <
>> christian.humer at gmail.com> wrote:
>>
>>> Your code looks good at first sight. Can you verify that all your call
>>> nodes get inlined properly with -G:+TraceTruffleInlining?
>>> If so there should be no performance hit but a big improvement, as the
>>> whole function should be transformed into a while loop instead of recursive
>>> function calls.
>>> That said, if Truffle does not see the exception thrown and catched in
>>> the same compilation scope there is some price to pay for tail calls atm,
>>> yes.
>>>
>>>
>>> On Wed Jan 14 2015 at 23:50:08 Cristian Esquivias <
>>> cristian.esquivias at gmail.com> wrote:
>>>
>>>> Thanks for the help Stefan & Christian. I switched over to passing
>>>> CallTarget objects and following the SimpleLanguage example for an inline
>>>> cache of direct dispatch node. No more opt failures.
>>>>
>>>>
>>>> https://github.com/cesquivias/mumbler/commit/ba4986d33d983b17eaa19892e4a37fb3d42a212c
>>>>
>>>> Unfortunately, even though the direct dispatch node is called,
>>>> performance has taken a big hit. The median time on my microbenchmark
>>>> increased over 50% (360ms to 550ms). Is throwing a flow exception out of
>>>> function frames not very fast?
>>>>
>>>> Microbenchmark:
>>>> https://github.com/cesquivias/mumbler/commit/616fcdea94d50ec8a121ac4da2eb3eef9d5092be
>>>>
>>>> On Tue, Jan 13, 2015 at 4:31 PM, Christian Humer <
>>>> christian.humer at gmail.com> wrote:
>>>>
>>>>> Hi Christian,
>>>>>
>>>>> You should not pass nodes around like values (like you do with the
>>>>> DirectCallNode). Instead let the TailCallException carry a CallTarget.
>>>>>
>>>>> As Stefan pointed out, in the catch block of the TailCallException you
>>>>> are going to need another inline cache similar to the one in
>>>>> SLDirectDispatchNode. To try it out quickly you can also use an
>>>>> IndirectCallNode instead, but don't expect perfect performance from this.
>>>>>
>>>>> - Christian Humer
>>>>>
>>>>>
>>>>>
>>>>> On Wed Jan 14 2015 at 00:18:13 Cristian Esquivias <
>>>>> cristian.esquivias at gmail.com> wrote:
>>>>>
>>>>>> Hi Stefan,
>>>>>>
>>>>>> Thanks for taking a look. So I would create a linked list of dispatch
>>>>>> nodes
>>>>>> like in Simple Language?
>>>>>>
>>>>>> http://hg.openjdk.java.net/graal/graal/file/3a59e1411192/gra
>>>>>> al/com.oracle.truffle.sl/src/com/oracle/truffle/sl/nodes/call/
>>>>>> SLDirectDispatchNode.java
>>>>>>
>>>>>> Since the function object and the call target don't change, how
>>>>>> should I
>>>>>> test for equality between one dispatch node and another? That is, how
>>>>>> do I
>>>>>> know to use one dispatch node versus another?
>>>>>>
>>>>>> Thanks again,
>>>>>> Cristian
>>>>>>
>>>>>> On Tue, Jan 13, 2015 at 3:06 PM, Stefan Marr <java at stefan-marr.de>
>>>>>> wrote:
>>>>>>
>>>>>> > Hi Cristian:
>>>>>> >
>>>>>> > > On 13 Jan 2015, at 23:25, Cristian Esquivias <
>>>>>> > cristian.esquivias at gmail.com> wrote:
>>>>>> > >
>>>>>> > > I'm trying to add tail call optimization to my test language and
>>>>>> have a
>>>>>> > > functional version (no stackoverflows).
>>>>>> > >
>>>>>> > >
>>>>>> > https://github.com/cesquivias/mumbler/commit/3b61f7ccdad7429
>>>>>> d2feac6068b0f57edce6a4151
>>>>>> >
>>>>>> > Looking at the diff/code, I think the issue is the call node that
>>>>>> you pass
>>>>>> > back in the exception.
>>>>>> > At the point where you handle the exception, you’ll probably need
>>>>>> to make
>>>>>> > that node a constant as part of the AST, by using a dispatch node
>>>>>> chain/an
>>>>>> > inline cache.
>>>>>> >
>>>>>> > Best regards
>>>>>> > Stefan
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Stefan Marr
>>>>>> > INRIA Lille - Nord Europe
>>>>>> > http://stefan-marr.de/research/
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>
>>