Opt-in for trivial native method calls
Jorn Vernee
jorn.vernee at oracle.com
Thu Jun 2 14:47:43 UTC 2022
Hi Felix,
The case you have in mind seems to be a native call that boils down to a
single CPU instruction, or just a few. A native call is a poor fit for
such cases. The overhead of the Java-to-native state transition comes
into play (as you've found), but even without that, a fair amount of
register shuffling is needed to conform to the native ABI, since the
register allocator in the JIT cannot see through the native call.
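To make that concrete, a minimal downcall through the Linker API might
look like the sketch below (written against the finalized FFM API in
JDK 22+; the preview API at the time of this thread differs slightly).
Even for a tiny libc function like labs, every invocation crosses the
thread-state transition and has to marshal arguments into the C calling
convention:

    import java.lang.foreign.*;
    import java.lang.invoke.MethodHandle;

    public class TrivialDowncall {
        public static void main(String[] args) throws Throwable {
            // Bind the C library function `long labs(long)`.
            Linker linker = Linker.nativeLinker();
            MethodHandle labs = linker.downcallHandle(
                    linker.defaultLookup().find("labs").orElseThrow(),
                    FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.JAVA_LONG));

            // Each call pays the full native-call overhead, even though
            // the work on the other side is essentially one instruction.
            long result = (long) labs.invokeExact(-42L);
            System.out.println(result); // 42
        }
    }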
I think the right way to support such use cases is to implement them as
a Java API + JIT intrinsic. That lets the JIT fully understand the code:
it can inline, do proper register allocation instead of forcing all data
through the C ABI, and apply other optimizations in combination with the
surrounding code (something you will typically not see in the results of
a microbenchmark).
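The JDK already works this way for a number of single-instruction
operations; Integer.bitCount is one example. It is an ordinary Java
method, but HotSpot intrinsifies it, so where the hardware supports it
the call compiles down to a single POPCNT instruction, inlines into the
caller, and participates in normal register allocation with no C ABI
boundary involved:

    // Integer.bitCount looks like a regular Java call, but the JIT
    // replaces it with a single instruction (POPCNT on x86) where
    // available, so it inlines into the loop and its operands stay in
    // registers chosen by the JIT rather than going through the C ABI.
    static int totalBits(int[] data) {
        int total = 0;
        for (int v : data) {
            total += Integer.bitCount(v); // intrinsified; no call overhead
        }
        return total;
    }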
A handful of CPU instructions, or a single system call, seem to be the
only real use cases for something like trivial calls/removed thread
state transitions. And in those cases, native calls are just a bad
solution, for the reasons outlined above.
Sure, it knocks a few nanoseconds off a microbenchmark here and there,
but in return it removes a bunch of safety rails. My fear is that most
people will see this as a "go fast" button without really understanding
the consequences, and without being able to notice the difference anyway
because their native code runs for much longer. For the people who would
benefit from this, there are much better solutions.
So, overall, trivial calls in the form of an opt-in seem like a net loss
to me.
We are, though, looking into being able to pin heap objects, such as
arrays, and pass them to native code.
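For context, with the current API a heap array is typically copied into
an off-heap segment before being handed to a downcall; pinning would let
native code read the array in place and skip that copy. A minimal sketch
of today's copy-based approach (finalized FFM API, JDK 22+), assuming a
hypothetical downcall handle `nativeSum` taking a pointer and a length:

    import java.lang.foreign.*;
    import java.lang.invoke.MethodHandle;

    // Copy-based approach used today: the heap array is duplicated into
    // a confined off-heap segment for the duration of the call. Object
    // pinning would let the native side access the array directly.
    static long sumViaNative(MethodHandle nativeSum, int[] data) throws Throwable {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_INT, data.length);
            MemorySegment.copy(data, 0, seg, ValueLayout.JAVA_INT, 0, data.length);
            // assumes a handle with descriptor (ADDRESS, JAVA_LONG) -> JAVA_LONG
            return (long) nativeSum.invokeExact(seg, (long) data.length);
        }
    }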
Jorn
On 02/06/2022 12:14, Felix Cravic wrote:
> Hello, coming here after a recommendation from reddit [0], I want to pitch my wish for the Linker API to support "trivial" native functions.
> My benchmark showed an average latency of around 6ns when calling a native method, which is more than fine most of the time, but falls short for inexpensive calls.
>
> To give a relevant example that could take advantage of such a feature, my current toy project [1] involves making a JVM version of Unity's Burst compiler [2], which consists of JIT compiling JVM bytecode using LLVM in the hope of getting better performance for highly specialized code (custom intrinsics, stack allocation, value classes without Valhalla, other fanciness...). And while the calling latency does not make it impossible, it does make it inefficient for taking advantage of specialized instructions (that may not be available in the JDK API) and relatively small methods that could still outperform their HotSpot equivalents.
>
> I definitely agree that my situation is not the most common (I am responsible for all the native methods, and generate classes dynamically to create the static method handles), but giving access to low-level tweaks seems like a great solution to me. I am also not sure what the tradeoffs of these trivial methods are (is it only a matter of safepoints?), so I would be happy to get more technical details.
>
> Thanks!
>
> [0] - https://www.reddit.com/r/java/comments/v33d44/panama_foreign_function_overhead_how_can_it_be/
> [1] - https://github.com/TheMode/Spe
> [2] - https://docs.unity3d.com/Packages/com.unity.burst@0.2-preview.20/manual/index.html
>