Opt-in for trivial native method calls

Andrew Haley aph-open at littlepinkcloud.com
Fri Jun 3 10:47:46 UTC 2022


On 6/2/22 11:14, Felix Cravic wrote:
> My benchmark showed an average latency of around 6ns when calling a native method, which is more than fine most of the time, but fall short for inexpensive calls.

So this is a rather long email, for which I apologize, but I made a JMH
benchmark to measure where all the time goes, and I got this result.
This was just for my curiosity, but I thought it useful enough that it
might prove illuminating to other readers.

It doesn't seem to be doing much that's inessential, although I do wonder
about all those vzeroupper instructions. There's a vzeroupper in set_last_Java_frame(),
one in restore_cpu_control_state_after_jni(), and another in reset_last_Java_frame().

The operations are:

Save the {sp, bp, pc} in the current JavaThread.
Set the JavaThread to _thread_in_native.
Shuffle the args into the native registers.
Call the function.
Full fence.
Safepoint poll.
Reguard stack.
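
For readers who find the order easier to follow as code, here is a rough
Java model of the sequence above. It is only a sketch, not HotSpot code:
the class TransitionSketch, its fields, and the trace list are made up
for illustration. The constant 4 is the value stored for
_thread_in_native in the listing below; the names attached to 5 and 8
are my reading of HotSpot's JavaThreadState enum and should be taken as
an assumption.

```java
import java.lang.invoke.VarHandle;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical model of the Java-to-native transition; not HotSpot code.
public class TransitionSketch {
    // Values as stored to the thread-state field in the listing.
    static final int THREAD_IN_NATIVE = 4;
    static final int THREAD_IN_NATIVE_TRANS = 5; // name is an assumption
    static final int THREAD_IN_JAVA = 8;         // name is an assumption

    int threadState = THREAD_IN_JAVA;
    boolean lastJavaFrameSet = false;
    final List<String> trace = new ArrayList<>();

    int downcall(IntUnaryOperator nativeFn, int arg) {
        // Record the last Java frame so the stack can be walked
        // while the thread is in native code.
        lastJavaFrameSet = true;
        trace.add("save {sp, bp, pc}");

        threadState = THREAD_IN_NATIVE;
        trace.add("set _thread_in_native");

        // The real stub moves arguments between registers here.
        trace.add("shuffle args");

        int result = nativeFn.applyAsInt(arg);
        trace.add("call");

        // Publish the transitional state before polling.
        threadState = THREAD_IN_NATIVE_TRANS;
        VarHandle.fullFence();
        trace.add("full fence");

        // The real stub branches to a slow path if a safepoint is pending.
        trace.add("safepoint poll");
        threadState = THREAD_IN_JAVA;

        trace.add("reguard stack");

        // The stub also clears the saved frame before returning.
        lastJavaFrameSet = false;
        return result;
    }

    public static void main(String[] args) {
        TransitionSketch t = new TransitionSketch();
        System.out.println(t.downcall(Math::abs, -5));
        t.trace.forEach(System.out::println);
    }
}
```

Running main prints the result of the call followed by the seven steps
in order. The model has no slow path, so it says nothing about cost; it
only shows the ordering, in particular that the fence sits between the
state change and the safepoint poll.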

I don't think there's any fat here except one vzeroupper. The rest is simply
what you have to do in order to transition from Java to native.

             Decoding RuntimeStub - nep_invoker_blob 0x00007fdb43b97490
             --------------------------------------------------------------------------------
               0x00007fdb43b97520:   push   %rbp
    2.63%      0x00007fdb43b97521:   mov    %rsp,%rbp
               0x00007fdb43b97524:   sub    $0x10,%rsp
              ;; { thread java2native
               0x00007fdb43b97528:   vzeroupper
    5.03%      0x00007fdb43b9752b:   mov    %rbp,0x320(%r15)
               0x00007fdb43b97532:   movabs $0x7fdb43b97528,%r10
    0.03%      0x00007fdb43b9753c:   mov    %r10,0x318(%r15)
               0x00007fdb43b97543:   mov    %rsp,0x310(%r15)
    2.92%      0x00007fdb43b9754a:   movl   $0x4,0x3c4(%r15)
              ;; } thread java2native
              ;; { argument shuffle
              ;; bt=long
               0x00007fdb43b97555:   mov    %rcx,%rax
              ;; bt=int
               0x00007fdb43b97558:   mov    %rdx,%rdi
              ;; bt=long
               0x00007fdb43b9755b:   mov    %rsi,%r10
              ;; } argument shuffle
               0x00007fdb43b9755e:   call   *%r10
              ;; { thread native2java
    0.13%      0x00007fdb43b97561:   vzeroupper
    5.71%      0x00007fdb43b97564:   movl   $0x5,0x3c4(%r15)
    8.85%      0x00007fdb43b9756f:   lock addl $0x0,-0x40(%rsp)
   15.89%      0x00007fdb43b97575:   cmp    0x3c8(%r15),%rbp
    3.37%  ╭   0x00007fdb43b9757c:   ja     0x00007fdb43b975d5
           │   0x00007fdb43b97582:   cmpl   $0x0,0x3c0(%r15)
           │╭  0x00007fdb43b9758d:   jne    0x00007fdb43b975d5
           ││  0x00007fdb43b97593:   movl   $0x8,0x3c4(%r15)
           ││ ;; reguard stack check
           ││  0x00007fdb43b9759e:   cmpl   $0x2,0x450(%r15)
           ││  0x00007fdb43b975a9:   je     0x00007fdb43b975fb
           ││  0x00007fdb43b975af:   movq   $0x0,0x310(%r15)
    0.03%  ││  0x00007fdb43b975ba:   movq   $0x0,0x320(%r15)
           ││  0x00007fdb43b975c5:   movq   $0x0,0x318(%r15)
    2.27%  ││  0x00007fdb43b975d0:   vzeroupper
           ││ ;; } thread native2java
   39.69%  ││  0x00007fdb43b975d3:   leave
           ││  0x00007fdb43b975d4:   ret
           ││ ;; { L_safepoint_poll_slow_path
           ↘↘  0x00007fdb43b975d5:   vzeroupper
               0x00007fdb43b975d8:   mov    %rax,(%rsp)
               0x00007fdb43b975dc:   mov    %r15,%rdi
               0x00007fdb43b975df:   mov    %rsp,%r12
               0x00007fdb43b975e2:   sub    $0x0,%rsp
               0x00007fdb43b975e6:   and    $0xfffffffffffffff0,%rsp
               0x00007fdb43b975ea:   call   0x00007fdb526b8950 = JavaThread::check_special_condition_for_native_trans(JavaThread*)
....................................................................................................
   86.54%  <total for region 1>

Here's the JMH benchmark:

package org.sample;

import java.lang.invoke.*;
import java.lang.foreign.*;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class PanamaExample3 {

     static final Linker linker;
     static final MethodHandle c_abs;

     static {
         linker = Linker.nativeLinker();
         c_abs = linker.downcallHandle
             (linker.defaultLookup().lookup("abs").get(),
              FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT));
     }

     @Benchmark
     public int doit() throws Throwable {
         int result = (int)c_abs.invoke(-5); // abs(-5) == 5
         return result;
     }

     public static void main(String[] args)  throws Throwable {
         System.out.println(new PanamaExample3().doit());
     }
}

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


More information about the panama-dev mailing list