Method Fusion proposal

Fri Jun 15 20:14:53 UTC 2018

Flavio Brasil wrote on 6/15/18 11:48 AM:
> Thank you for the quick feedback, Gilles.
> 
> Yes, building the frame state is currently where I'm stuck. Would you be
> able to give some details on how the optimization that fuses sequential
> `StringBuilder.append` calls on Graal EE works?

The strategy there is to ensure that the operations being fused produce
no visible side effects, and to deoptimize back to the point where the
whole operation started if anything goes wrong.  So all deoptimization
states come from the last side effect before the creation of the
StringBuilder.  That way we don't need to synthesize any frame states.
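
A minimal sketch of the code shape this applies to, not the Graal EE
implementation itself. The class AppendChain, the method record() and
the counter sideEffects are hypothetical names used only for
illustration. The point is that everything between the last visible
side effect and toString() can be redone from scratch without anyone
noticing.

```java
final class AppendChain {
    // Stand-in for any state visible outside the method (hypothetical).
    static int sideEffects = 0;

    static void record(String name) {
        sideEffects++; // visible side effect
    }

    static String greet(String name, int n) {
        record(name); // last visible side effect before the builder is created

        // From here to toString() the builder is freshly allocated and does not
        // escape, so nothing the appends do is observable from outside.
        // A compiler fusing these calls could reuse the state captured after
        // record() for any deoptimization and let the interpreter redo the chain.
        StringBuilder sb = new StringBuilder();
        sb.append("Hello, ").append(name).append(" #").append(n);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(greet("Gilles", 3)); // prints "Hello, Gilles #3"
    }
}
```

If an append in the chain had a visible effect, for example writing to
a shared field, re-executing from the state after record() would repeat
that effect, which is why the no-side-effect condition matters.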

That strategy doesn't work so well when the operations to be fused are 
large or where side effects might occur because there's no way to create 
a state that would allow you to resume execution.

Having the compiler perform what's essentially bytecode rewriting on the 
fly doesn't work that easily either since you don't actually end up with 
executable bytecodes that you can deoptimize to.

I think what you probably want is a hybrid approach where there are real 
bytecodes in the normal path to handle the fusion but there's a little 
compiler magic to select the fused pathway.  Normal execution would 
always do the unfused pathway but Graal would decide how much of the 
fused pathway to use and possibly chain together sequences which get 
connected by inlining.  The amount of magic in Graal should be kept to a 
minimum so that it doesn't have to do fixup that might be tricky or 
impossible.  This is similar to the idea of doing lambda operations by 
building an expression tree and the final step is to apply that 
expression tree which might involve compiling or rewriting it into a 
more efficient form before execution.
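
A rough sketch of that shape, under assumptions of my choosing: the
class FusablePipeline and its methods are hypothetical and are not part
of Graal or of the design doc. Both pathways are ordinary Java, so each
has real bytecodes to deoptimize to, and the only "magic" left for a
compiler would be choosing applyFused over applyUnfused.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class FusablePipeline<A, B> {
    // Recorded stages in application order; the "expression tree" is a flat list here.
    private final List<Function<Object, Object>> stages;

    private FusablePipeline(List<Function<Object, Object>> stages) {
        this.stages = stages;
    }

    static <A> FusablePipeline<A, A> start() {
        return new FusablePipeline<>(new ArrayList<>());
    }

    // Records a stage without running it.
    @SuppressWarnings("unchecked")
    <C> FusablePipeline<A, C> map(Function<? super B, ? extends C> f) {
        List<Function<Object, Object>> next = new ArrayList<>(stages);
        next.add((Function<Object, Object>) (Function<?, ?>) f);
        return new FusablePipeline<>(next);
    }

    // Unfused pathway: one pass and one intermediate list per stage.
    @SuppressWarnings("unchecked")
    List<B> applyUnfused(List<A> input) {
        List<Object> current = new ArrayList<>(input);
        for (Function<Object, Object> stage : stages) {
            List<Object> out = new ArrayList<>(current.size());
            for (Object x : current) {
                out.add(stage.apply(x));
            }
            current = out;
        }
        return (List<B>) (List<?>) current;
    }

    // Fused pathway: compose all stages, then make a single pass.
    @SuppressWarnings("unchecked")
    List<B> applyFused(List<A> input) {
        Function<Object, Object> fused = Function.identity();
        for (Function<Object, Object> stage : stages) {
            fused = fused.andThen(stage);
        }
        List<Object> out = new ArrayList<>(input.size());
        for (A x : input) {
            out.add(fused.apply(x));
        }
        return (List<B>) (List<?>) out;
    }

    public static void main(String[] args) {
        FusablePipeline<Integer, String> p =
            FusablePipeline.<Integer>start().map(x -> x + 1).map(x -> "v" + x);
        List<Integer> in = new ArrayList<>();
        in.add(1);
        in.add(2);
        System.out.println(p.applyUnfused(in)); // prints [v2, v3]
        System.out.println(p.applyFused(in));   // prints [v2, v3]
    }
}
```

The two pathways agree only when the stage functions are free of
side effects, since the fused version interleaves the stages per
element instead of running each stage over the whole list.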

tom

> Are there plans to open source it?
> 
> Best,
> 
> Flavio
> 
> On Fri, Jun 15, 2018 at 8:54 AM Gilles Duboscq <gilles.m.duboscq at oracle.com>
> wrote:
> 
>> Hi Flavio,
>>
>> Usually, the difficult part of this kind of transform is the stacks
>> ("framestates") that you are going to give to the VM along with the
>> compiled code.
>>
>> In principle they should represent stacks and states that are possible to
>> reach through the bytecodes that are being compiled so that things like
>> deoptimization can work.
>>
>> In previous experimentation we concluded that HotSpot really doesn't like
>> fake framestates.
>> Maybe some of these HotSpot limitations could be lifted with some C++
>> changes??
>>
>> Overall getting FrameState nodes that will work will be your biggest
>> challenge:
>> - what will be the bci of the FrameState for the introduced calls?
>> - what values will be on the stack if deoptimization occurs while the
>> new calls or the rewritten calls are on the stack? Can execution proceed
>> correctly?
>>
>>   Gilles
>>
>> On 15/06/18 16:27, Flavio Brasil wrote:
>>> Hi Graal developers,
>>>
>>> We’ve been exploring a new JIT optimization at Twitter that could
>>> significantly improve the efficiency of some of the high-level
>>> abstractions normally used in Scala. The project is at its early stage so
>>> we thought it’d be best to have your feedback on it sooner rather than
>>> later since you might be interested in collaborating on it.
>>>
>>> The initial design doc is available at
>>> https://docs.google.com/document/d/139XJhODDMfcWBNi80-jSWNGrBEmggch72oaNA5LXF2I/edit#heading=h.uqhewhqfc5rs
>>> and open for comments.
>>>
>>> Thank you,
>>> Flavio Brasil
>>> VM Team @ Twitter
>>>
>>