Opt-in for trivial native method calls

Thu Jun 2 14:47:43 UTC 2022

Hi Felix,

The case you have in mind seems to be a native call to a function that
executes only one or a few CPU instructions. Doing a native call for such
cases is a bad choice. The overhead of the state transition from Java to
native comes into play (as you've found). But even without that, a bunch
of register shuffling is needed to conform to the native ABI, since the
register allocator in the JIT cannot see through the native call.
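For context, the kind of call under discussion looks roughly like the sketch below. It assumes the finalized `java.lang.foreign` API from JDK 22 or later (which post-dates this thread; in 2022 the API was still incubating) and a 64-bit Linux or macOS system where `strlen` is reachable through the linker's default lookup and `size_t` fits in a Java `long`. The class name `DowncallSketch` is made up for illustration. However little work `strlen` itself does, every `invokeExact` on the handle goes through the Java-to-native transition and the C calling convention described above.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public final class DowncallSketch {
    // Downcall handle for the C library's strlen(const char*) -> size_t.
    // Assumes a 64-bit platform where size_t fits in a Java long.
    private static final MethodHandle STRLEN;

    static {
        Linker linker = Linker.nativeLinker();
        MemorySegment address = linker.defaultLookup().find("strlen").orElseThrow();
        STRLEN = linker.downcallHandle(
                address,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
    }

    static long nativeStrlen(String s) throws Throwable {
        try (Arena arena = Arena.ofConfined()) {
            // Copy the Java string into off-heap memory as a NUL-terminated UTF-8 string.
            MemorySegment cString = arena.allocateFrom(s);
            // Each invocation pays for the Java-to-native state transition and for
            // moving arguments into the registers required by the C ABI.
            return (long) STRLEN.invokeExact(cString);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(nativeStrlen("Hello")); // 5
    }
}
```

Depending on the JDK version, running this without `--enable-native-access` may print a warning about restricted methods to stderr.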

I think the right way to support such use cases is to implement them as
a Java API + JIT intrinsic. That way the JIT can fully understand the
code, inline it, and do proper register allocation instead of forcing all
data to go through the C ABI, as well as apply other optimizations in
combination with surrounding code (something that you will not
typically see in the results of a microbenchmark).

Single or a couple of CPU instructions, or single system calls, seem to
be the only real use cases for something like trivial calls (i.e. calls
with the thread state transitions removed). And in those cases, native
calls are just a bad solution for the reasons outlined above.

Sure, it knocks a few nanoseconds off a microbenchmark here and there,
but in return it removes a bunch of safety rails. My fear is that most
people will just see this as a "go fast" button without really
understanding the consequences, and won't even be able to notice the
difference anyway, because their native code runs much longer. For the
people who would benefit from this, there are much better solutions.

So, overall, offering trivial calls as an opt-in seems like a net loss
to me.

We are looking at being able to pin heap objects, such as arrays, and 
pass them to native code though.

Jorn

On 02/06/2022 12:14, Felix Cravic wrote:
> Hello, I'm coming here after a recommendation on reddit [0]. I want to pitch my wish for the Linker API to support "trivial" native functions.
> My benchmark showed an average latency of around 6ns when calling a native method, which is more than fine most of the time, but falls short for inexpensive calls.
>
> To give a relevant example that could take advantage of such a feature, my current toy project [1] involves making a JVM version of Unity's Burst compiler [2], which consists of JIT-compiling JVM bytecode using LLVM in the hope of getting better performance for highly specialized code (custom intrinsics, stack allocation, value classes without Valhalla, other fanciness...). While the calling latency does not make it impossible, it does make it inefficient to take advantage of specialized instructions (which may not be available in the JDK API) and of relatively small methods that could still outperform their HotSpot equivalents.
>
> I definitely agree that my situation is not the most common (I am responsible for all the native methods, and I generate classes dynamically to create the static method handles), but giving access to low-level tweaks seems like a great solution to me. I am also not sure what the tradeoffs of these trivial methods are (is it only a matter of safepoints?), so I would be happy to get more technical details.
>
> Thanks!
>
> [0] - https://www.reddit.com/r/java/comments/v33d44/panama_foreign_function_overhead_how_can_it_be/
> [1] - https://github.com/TheMode/Spe
> [2] - https://docs.unity3d.com/Packages/com.unity.burst@0.2-preview.20/manual/index.html
>