Opt-in for trivial native method calls

Thu Jun 2 15:40:09 UTC 2022

I agree with everything you said. If I could write a Java/HotSpot intrinsic without forking OpenJDK, that is what I would do, but unfortunately there doesn't seem to be a way around it.

My plan is not strictly to make native calls to a few CPU instructions, but the 5-9ns of overhead is roughly equivalent to one HashMap/ConcurrentHashMap write, or about six reads, on my system. In the grand scheme of things this is not expensive, but it does set the bar a bit high for what you can do efficiently in native code.

Alternatively, my project could rewrite the class bytecode (e.g., replacing object instantiation with multiple local variables to simulate a primitive class, retrieving my intrinsics...), as other JVM languages do, but that doesn't seem reliable.
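
To make the "object instantiation replaced by multiple local variables" idea concrete, here is a minimal sketch of what such a rewrite would have to produce, written by hand as source rather than as bytecode. The `Vec2` class and both method names are hypothetical and exist only for illustration. HotSpot's escape analysis may already perform a similar scalar replacement in some cases, so whether the manual version is actually faster would need to be measured.

```java
// Sketch only: all names here are hypothetical, not part of any existing project.
final class ScalarizationSketch {

    // A small immutable, value-like class.
    static final class Vec2 {
        final double x;
        final double y;

        Vec2(double x, double y) {
            this.x = x;
            this.y = y;
        }

        Vec2 add(Vec2 other) {
            return new Vec2(x + other.x, y + other.y);
        }
    }

    // Object-based version: allocates a new Vec2 per iteration (at the source level).
    static double sumWithObjects(double[] xs, double[] ys) {
        Vec2 acc = new Vec2(0, 0);
        for (int i = 0; i < xs.length; i++) {
            acc = acc.add(new Vec2(xs[i], ys[i]));
        }
        return acc.x + acc.y;
    }

    // Manually scalarized version: each Vec2 field becomes a local variable,
    // so no object is ever instantiated.
    static double sumScalarized(double[] xs, double[] ys) {
        double accX = 0;
        double accY = 0;
        for (int i = 0; i < xs.length; i++) {
            accX += xs[i];
            accY += ys[i];
        }
        return accX + accY;
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3};
        double[] ys = {10, 20, 30};
        System.out.println(sumWithObjects(xs, ys)); // 66.0
        System.out.println(sumScalarized(xs, ys));  // 66.0
    }
}
```

Both methods compute the same result; the difference is only in whether the intermediate values live in objects or in locals, which is exactly the property a bytecode rewriter would have to preserve for every use of the class.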

Could you explain what you mean by "there are much better solutions"? I cannot think of any solution other than forking OpenJDK or using JVMCI.
________________________________
From: Jorn Vernee <jorn.vernee at oracle.com>
Sent: Thursday, 2 June 2022 16:47
To: Felix Cravic <themode at outlook.fr>; panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Re: Opt-in for trivial native method calls

Hi Felix,

The case you have in mind seems to be a native call to a single or a few
CPU instructions. Doing a native call for such cases is a bad choice.
The native call overhead of the state transition from Java to native
comes into play (as you've found). But even without that, there is a
bunch of register shuffling needed to conform to the native ABI, since
the register allocator in the JIT cannot see through the native call.

I think the right way to support such use cases is to implement them as
a Java API + JIT intrinsic. That will make it so the JIT can fully
understand the code, do inlining, proper register allocation instead of
forcing all data to go through the C ABI, as well as other optimizations
in combination with surrounding code (something that you will not
typically see in the results of a microbenchmark).
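
As an illustration of the "Java API + JIT intrinsic" pattern, the sketch below uses `Long.bitCount`, an existing JDK method that HotSpot can compile to a single POPCNT instruction on CPUs that support it. The class and method names are hypothetical, and the example only shows the shape of the approach: the caller writes plain Java, and the JIT is free to inline the call into the surrounding loop.

```java
// Sketch only: class and method names are hypothetical, not from this thread.
final class IntrinsicSketch {

    // Hamming distance between two 64-bit words.
    // Long.bitCount is an ordinary Java API that HotSpot may intrinsify,
    // so there is no native call, no thread state transition, and no
    // register shuffling to satisfy the C ABI.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    // Sums the distance over two arrays. Because the JIT can see through
    // hammingDistance, it can inline it and optimize the whole loop.
    static long totalDistance(long[] xs, long[] ys) {
        long total = 0;
        for (int i = 0; i < xs.length; i++) {
            total += hammingDistance(xs[i], ys[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] xs = {0b1010L, 0xFFL, 0L};
        long[] ys = {0b0110L, 0x0FL, -1L};
        System.out.println(totalDistance(xs, ys)); // 2 + 4 + 64 = 70
    }
}
```

A native function doing the same XOR and popcount would pay the call overhead on every loop iteration, which is the cost discussed above.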

Single/couple CPU instructions, or single system calls, seem the only
real use cases for something like trivial calls/removed thread state
transitions. And in those cases, native calls are just a bad solution
for the reasons outlined above.

Sure, it knocks a few nanoseconds off of a microbenchmark here and
there, but in return it removes a bunch of safety rails. My fear is that
most people will just see this as a "go fast" button, without really
understanding the consequences, and not really being able to notice the
difference anyway because their native code runs much longer. For the
people who would benefit from this, there are much better solutions.

So, overall, trivial calls in the form of an opt-in seems like a net
loss to me.

We are looking at being able to pin heap objects, such as arrays, and
pass them to native code though.

Jorn

On 02/06/2022 12:14, Felix Cravic wrote:
> Hello, coming here after a recommendation from Reddit [0], I want to pitch my wish for the Linker API to support "trivial" native functions.
> My benchmark showed an average latency of around 6ns when calling a native method, which is more than fine most of the time but falls short for inexpensive calls.
>
> To give a relevant example that could take advantage of such a feature, my current toy project [1] involves making a JVM version of Unity's Burst compiler [2], which consists of JIT-compiling JVM bytecode using LLVM in the hope of getting better performance for highly specialized code (custom intrinsics, stack allocation, value classes without Valhalla, other fanciness...). While the call latency does not make this impossible, it does make it inefficient to take advantage of specialized instructions (that may not be available in the JDK API) and of relatively small methods that could still outperform their HotSpot equivalents.
>
> I definitely agree that my situation is not the most common (I am responsible for all the native methods, and I generate classes dynamically to create the static method handles), but giving access to low-level tweaks seems like a great solution to me. I am also not sure what the tradeoffs of these trivial methods are (is it only a matter of safepoints?), so I would be happy to get more technical details.
>
> Thanks!
>
> [0] - https://www.reddit.com/r/java/comments/v33d44/panama_foreign_function_overhead_how_can_it_be/
> [1] - https://github.com/TheMode/Spe
> [2] - https://docs.unity3d.com/Packages/com.unity.burst@0.2-preview.20/manual/index.html
>