Hacking Truffle to avoid argument array allocation at calls

Christian Humer christian.humer at gmail.com
Tue Oct 30 12:46:46 UTC 2018


Hi Arthur,

> I want to improve function call performance in Truffle on HotSpot. Would
> it be possible to hack Truffle to support a few specific function
> arities without allocating the argument array?

Yes, absolutely. We already removed that overhead on SVM. It's a bit
trickier on HotSpot.
The magic for SVM happens in SubstrateOptimizedCallTarget#doInvoke.
As a first step maybe you could share your micro benchmarks for calls that
you'd like to improve?


>  1. Is there something special about how these methods are compiled that
>     would prevent this? (searching on github makes me think the only
>     magic is that @TruffleCallBoundary prevents inlining)

There is more magic than that. Look at
HotSpotTruffleCompiler#installTruffleCallBoundaryMethods
That installs an assembly stub that lets invocations of the boundary
method jump to the optimized machine code.
If we want to support specialized argument signatures we would need to
extend this to have additional entry points with the optimized signature.
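
To make the idea of an additional entry point concrete, here is a minimal
plain-Java sketch. It does not use the Truffle API: SketchCallTarget, call2
and the BinaryOperator body are hypothetical stand-ins, and the matching
assembly stub work on the HotSpot side is not shown.

```java
import java.util.function.BinaryOperator;

/** Hypothetical sketch, not Truffle code: a generic boundary plus an arity-2 entry point. */
final class SketchCallTarget {
    // Stands in for the compiled body of a two-argument guest function.
    private final BinaryOperator<Object> body;

    SketchCallTarget(BinaryOperator<Object> body) {
        this.body = body;
    }

    /** Generic entry point: the caller has to allocate an Object[] for the arguments. */
    Object call(Object[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("this sketch only handles arity 2");
        }
        return body.apply(args[0], args[1]);
    }

    /** Arity-specific entry point: arguments are passed directly, with no array. */
    Object call2(Object a, Object b) {
        return body.apply(a, b);
    }

    public static void main(String[] args) {
        SketchCallTarget add = new SketchCallTarget((a, b) -> (Integer) a + (Integer) b);
        System.out.println(add.call(new Object[] {1, 2})); // generic path
        System.out.println(add.call2(1, 2));               // arity-specific path
    }
}
```

Both entry points run the same body; only the calling convention differs,
which is the part the stub would have to know about.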

>  2. Would the partial evaluator/compiler be able to eliminate the arrays
>     passed to doInvoke and stored in the VirtualFrame since both of
>     those are local to a single PE/compilation unit?

Yes. But you need to tell Partial Escape Analysis that the argument array
does not actually escape.
Since the array originates from the guest language, it might also
escape on the caller side. So you would need to handle the case where the
caller lets the Object[] escape.
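
The caller-side escape can be illustrated in plain Java. This is only a
sketch of the aliasing problem, not of Partial Escape Analysis itself, and
every name in it (EscapeSketch, callee, retained) is hypothetical.

```java
/** Hypothetical sketch of an argument array that does or does not escape at the caller. */
final class EscapeSketch {
    // A place where a caller can keep the argument array alive after the call.
    static Object[] retained;

    /** Callee that writes to its argument array, as a guest-language function might. */
    static int callee(Object[] args) {
        args[0] = "changed";
        return args.length;
    }

    /** The array is only visible inside this call, so it could in principle be removed. */
    static int localCall(Object a) {
        return callee(new Object[] {a});
    }

    /** The array is stored in a field first, so the callee's write is observable later. */
    static int escapingCall(Object a) {
        Object[] args = {a};
        retained = args;
        return callee(args);
    }

    public static void main(String[] unused) {
        localCall("x");
        escapingCall("x");
        System.out.println(retained[0]); // the caller still sees the callee's write
    }
}
```

In escapingCall the array has to exist as a real object, so an array-free
calling convention needs a fallback for callers like that one.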

> profile the argument count and specialize the call
> to the arity specific version of callBoundary.

We already do profile the arguments. Look at
OptimizedCompilationProfile#profiledArgumentTypes.
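
As a rough illustration of what such a profile records, here is a simplified
plain-Java sketch. It is not the Truffle implementation and may differ from it
in important ways; ArgumentProfileSketch, profile and stableArity are
hypothetical names.

```java
/** Hypothetical, simplified argument profile; the real Truffle profile may behave differently. */
final class ArgumentProfileSketch {
    private boolean initialized;
    // Classes seen on the first call; null before the first call or once the profile is generic.
    private Class<?>[] profiledTypes;

    private static Class<?> classOf(Object o) {
        return o == null ? null : o.getClass();
    }

    /** Records the first call's argument classes and gives up on any later mismatch. */
    void profile(Object[] args) {
        if (!initialized) {
            initialized = true;
            profiledTypes = new Class<?>[args.length];
            for (int i = 0; i < args.length; i++) {
                profiledTypes[i] = classOf(args[i]);
            }
            return;
        }
        if (profiledTypes == null) {
            return; // already generic
        }
        if (profiledTypes.length != args.length) {
            profiledTypes = null; // arity changed
            return;
        }
        for (int i = 0; i < args.length; i++) {
            if (profiledTypes[i] != classOf(args[i])) {
                profiledTypes = null; // type changed
                return;
            }
        }
    }

    /** Returns the arity seen on every call so far, or -1 if there is no stable profile. */
    int stableArity() {
        return profiledTypes == null ? -1 : profiledTypes.length;
    }
}
```

A stable arity from a profile like this is the kind of information an
arity-specific call path could be selected on.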

Hope this helps,

- Christian Humer

On Mon, Oct 29, 2018 at 11:35 PM Arthur Peters <amp at cs.utexas.edu> wrote:

> Esteemed Truffle Hackers,
>
> I want to improve function call performance in Truffle on HotSpot. Would
> it be possible to hack Truffle to support a few specific function
> arities without allocating the argument array?
>
> It appears that OptimizedCallTarget.callBoundary
> <
> https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.truffle.runtime/src/org/graalvm/compiler/truffle/runtime/OptimizedCallTarget.java#L250-L266
> >
> could be duplicated for specific arities (with the duplicates taking the
> specific number of args) and then doInvoke
> <
> https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.compiler.truffle.runtime/src/org/graalvm/compiler/truffle/runtime/OptimizedCallTarget.java#L246-L248
> >
> could be modified to profile the argument count and specialize the call
> to the arity specific version of callBoundary.
>
>  1. Is there something special about how these methods are compiled that
>     would prevent this? (searching on github makes me think the only
>     magic is that @TruffleCallBoundary prevents inlining)
>  2. Would the partial evaluator/compiler be able to eliminate the arrays
>     passed to doInvoke and stored in the VirtualFrame since both of
>     those are local to a single PE/compilation unit?
>
> Thanks.
>
> -Arthur
>
>
> *Background for the interested:*
>
> I'm working on performance improvements for my Truffle language
> (Orc/PorcE
> <https://github.com/orc-lang/orc/tree/improved-porce-heuristics/PorcE>).
> I've realized that argument array allocations (which occur for every
> Truffle call on HotSpot) are a MAJOR performance problem for me
> (something like 5GB/s of allocations). This is because my code is
> partially continuation passing style and does a lot of function calls.
> I'm looking for a temporary solution that allows me to continue my
> research without too much engineering. I'm evaluating various options,
> including reducing the number of calls, running my system on SVM, and
> hacking Truffle to avoid the allocations for specific arities.
>
> The first option is challenging because the system is pretty deeply
> CPS. The second option is hard because the system uses MethodHandles
> (and the usage is complex, so even converting to reflection would not
> totally solve the SVM issues). So I'm focusing on hacking Truffle,
> because this is a research project and using a modified version is fine.
>
>
>
>

