Tail Call Optimization

Thu Jan 15 21:55:49 UTC 2015

0 / 204 looks indeed like an infinite speedup factor ;). - thomas

> On 15 Jan 2015, at 22:29, Cristian Esquivias <cristian.esquivias at gmail.com> wrote:
> 
> Wow. Thanks so much for the help. I'm seeing similar drops in execution
> time. Based on your commits, it looks like the main problem was I was
> reusing frame descriptors between call targets. Good to know for the
> future. I appreciate taking the time to look at my code.
> 
> Thanks again,
> Cristian
> 
> On Thu, Jan 15, 2015 at 5:13 AM, Christian Humer <christian.humer at gmail.com>
> wrote:
> 
>> Hi Cristian,
>> 
>> I did have a closer look at your code and I spotted several problems that
>> prevented your code to be optimized properly. I prepared a pull request
>> that should fix those problems.
>> 
>> https://github.com/cesquivias/mumbler/pull/6
>> 
>> Be aware of invalidations if you enable -G:+TraceTruffleCompilation. If
>> there are some you should review them for problems.
>> 
>> The inline failed message is valid as you are not using a tail call for
>> the inner lamda in countdown.
>> ((lambda (x) (countdown 100) (- x 100))
>> 
>> Here are my benchmark results for the countdown benchmark:
>> 
>> Before my changes:
>> ('computation-time 204)
>> 
>> After my changes:
>> ('computation-time 0)
>> 
>> Have fun!
>> 
>> - Christian Humer
>> 
>> 
>> On Thu Jan 15 2015 at 02:07:04 Cristian Esquivias <
>> cristian.esquivias at gmail.com> wrote:
>> 
>>> It looks like the inlining failed because of recursion.
>>> 
>>> [truffle] inline failed        [(define n ReadArgumentNode at 52525845),
>>> IfNode at 3b94d659] |ASTSize      41/   41 |frequency     99.4917
>>> |score          2.4266 |index=  0, force=N, callSites=221   |reason
>>> recursion
>>> 
>>> I last pulled/built yesterday so the vm should be up to date.
>>> 
>>> On Wed, Jan 14, 2015 at 3:55 PM, Christian Humer <
>>> christian.humer at gmail.com> wrote:
>>> 
>>>> Your code looks good at first sight. Can you verify that all your call
>>>> nodes get inlined properly with -G:+TraceTruffleInlining?
>>>> If so there should be no performance hit but a big improvement, as the
>>>> whole function should be transformed into a while loop instead of recursive
>>>> function calls.
>>>> That said, if Truffle does not see the exception thrown and catched in
>>>> the same compilation scope there is some price to pay for tail calls atm,
>>>> yes.
>>>> 
>>>> 
>>>> On Wed Jan 14 2015 at 23:50:08 Cristian Esquivias <
>>>> cristian.esquivias at gmail.com> wrote:
>>>> 
>>>>> Thanks for the help Stefan & Christian. I switched over to passing
>>>>> CallTarget objects and following the SimpleLanguage example for an inline
>>>>> cache of direct dispatch node. No more opt failures.
>>>>> 
>>>>> 
>>>>> https://github.com/cesquivias/mumbler/commit/ba4986d33d983b17eaa19892e4a37fb3d42a212c
>>>>> 
>>>>> Unfortunately, even though the direct dispatch node is called,
>>>>> performance has taken a big hit. The median time on my microbenchmark
>>>>> increased over 50% (360ms to 550ms). Is throwing a flow exception out of
>>>>> function frames not very fast?
>>>>> 
>>>>> Microbenchmark:
>>>>> https://github.com/cesquivias/mumbler/commit/616fcdea94d50ec8a121ac4da2eb3eef9d5092be
>>>>> 
>>>>> On Tue, Jan 13, 2015 at 4:31 PM, Christian Humer <
>>>>> christian.humer at gmail.com> wrote:
>>>>> 
>>>>>> Hi Christian,
>>>>>> 
>>>>>> You should not pass nodes around like values (like you do with the
>>>>>> DirectCallNode). Instead let the TailCallException carry a CallTarget.
>>>>>> 
>>>>>> As Stefan pointed out, in the catch block of the TailCallException you
>>>>>> are going to need another inline cache similar to the one in
>>>>>> SLDirectDispatchNode. To try it out quickly you can also use an
>>>>>> IndirectCallNode instead, but don't expect perfect performance from this.
>>>>>> 
>>>>>> - Christian Humer
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed Jan 14 2015 at 00:18:13 Cristian Esquivias <
>>>>>> cristian.esquivias at gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Stefan,
>>>>>>> 
>>>>>>> Thanks for taking a look. So I would create a linked list of dispatch
>>>>>>> nodes
>>>>>>> like in Simple Language?
>>>>>>> 
>>>>>>> http://hg.openjdk.java.net/graal/graal/file/3a59e1411192/gra
>>>>>>> al/com.oracle.truffle.sl/src/com/oracle/truffle/sl/nodes/call/
>>>>>>> SLDirectDispatchNode.java
>>>>>>> 
>>>>>>> Since the function object and the call target don't change, how
>>>>>>> should I
>>>>>>> test for equality between one dispatch node and another? That is, how
>>>>>>> do I
>>>>>>> know to use one dispatch node versus another?
>>>>>>> 
>>>>>>> Thanks again,
>>>>>>> Cristian
>>>>>>> 
>>>>>>> On Tue, Jan 13, 2015 at 3:06 PM, Stefan Marr <java at stefan-marr.de>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Cristian:
>>>>>>>> 
>>>>>>>>> On 13 Jan 2015, at 23:25, Cristian Esquivias <
>>>>>>>> cristian.esquivias at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> I'm trying to add tail call optimization to my test language and
>>>>>>> have a
>>>>>>>>> functional version (no stackoverflows).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> https://github.com/cesquivias/mumbler/commit/3b61f7ccdad7429
>>>>>>> d2feac6068b0f57edce6a4151
>>>>>>>> 
>>>>>>>> Looking at the diff/code, I think the issue is the call node that
>>>>>>> you pass
>>>>>>>> back in the exception.
>>>>>>>> At the point where you handle the exception, you’ll probably need
>>>>>>> to make
>>>>>>>> that node a constant as part of the AST, by using a dispatch node
>>>>>>> chain/an
>>>>>>>> inline cache.
>>>>>>>> 
>>>>>>>> Best regards
>>>>>>>> Stefan
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Stefan Marr
>>>>>>>> INRIA Lille - Nord Europe
>>>>>>>> http://stefan-marr.de/research/
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>