RFR: 8313406: nep_invoker_blob can be simplified more

Yasumasa Suenaga ysuenaga at openjdk.org
Mon Jul 31 15:09:09 UTC 2023


In FFM, a native function is called via `nep_invoker_blob`. For a function with two arguments, the stub looks like this:


Decoding RuntimeStub - nep_invoker_blob 0x00007fcae394cd10
--------------------------------------------------------------------------------
  0x00007fcae394cd80: pushq %rbp
  0x00007fcae394cd81: movq %rsp, %rbp
  0x00007fcae394cd84: subq $0, %rsp
 ;; { argument shuffle
  0x00007fcae394cd88: movq %r8, %rax
  0x00007fcae394cd8b: movq %rsi, %r10
  0x00007fcae394cd8e: movq %rcx, %rsi
  0x00007fcae394cd91: movq %rdx, %rdi
 ;; } argument shuffle
  0x00007fcae394cd94: callq *%r10
  0x00007fcae394cd97: leave
  0x00007fcae394cd98: retq


`subq $0, %rsp` reserves shadow space on the stack, and `movq %r8, %rax` sets the argument count that a variadic function expects in `%rax`, so neither is necessary in every case. They should be removed when they are not needed, giving the following:


Decoding RuntimeStub - nep_invoker_blob 0x00007fd8778e2810
--------------------------------------------------------------------------------
  0x00007fd8778e2880: pushq %rbp
  0x00007fd8778e2881: movq %rsp, %rbp
 ;; { argument shuffle
  0x00007fd8778e2884: movq %rsi, %r10
  0x00007fd8778e2887: movq %rcx, %rsi
  0x00007fd8778e288a: movq %rdx, %rdi
 ;; } argument shuffle
  0x00007fd8778e288d: callq *%r10
  0x00007fd8778e2890: leave
  0x00007fd8778e2891: retq
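
The intent can be sketched as a small Java model. This is only an illustration, not the HotSpot C++ change itself: the class `NepInvokerStubSketch`, the method `emit`, and its two parameters are hypothetical names, and the register shuffle is hard-coded to match the two-argument listing above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the stub layout; this is NOT HotSpot code.
final class NepInvokerStubSketch {

    // shadowSpaceBytes: stack bytes to reserve before the call (0 in the listing above).
    // needsVarargCount: whether %rax has to be loaded for a variadic callee.
    static List<String> emit(int shadowSpaceBytes, boolean needsVarargCount) {
        List<String> code = new ArrayList<>();
        code.add("pushq %rbp");
        code.add("movq %rsp, %rbp");
        // Skip the stack adjustment entirely when there is nothing to reserve.
        if (shadowSpaceBytes > 0) {
            code.add("subq $" + shadowSpaceBytes + ", %rsp");
        }
        // Load %rax only when the callee is variadic.
        if (needsVarargCount) {
            code.add("movq %r8, %rax");
        }
        // Argument shuffle copied from the two-argument example.
        code.add("movq %rsi, %r10");
        code.add("movq %rcx, %rsi");
        code.add("movq %rdx, %rdi");
        code.add("callq *%r10");
        code.add("leave");
        code.add("retq");
        return code;
    }

    public static void main(String[] args) {
        // Non-variadic callee, no shadow space: the simplified stub.
        emit(0, false).forEach(System.out::println);
    }
}
```

With `emit(0, false)` the model produces the eight instructions of the simplified stub; passing `true` for the second parameter or a positive size for the first brings the corresponding instruction back.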


All java/foreign jtreg tests pass.

These stubs can be seen by running the [ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/examples/cpumodel) with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and the hsdis library. This testcase links the code with `Linker.Option.isTrivial()`.

After this change, FFM performance on [another ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/benchmarks/funccall) improved slightly:

before:

Benchmark                           Mode  Cnt          Score          Error  Units
FuncCallComparison.invokeFFMRDTSC  thrpt    3  106664071.816 ± 14396524.718  ops/s
FuncCallComparison.rdtsc           thrpt    3  108024079.738 ± 13223921.011  ops/s


after:

Benchmark                           Mode  Cnt          Score          Error  Units
FuncCallComparison.invokeFFMRDTSC  thrpt    3  107622971.525 ± 12249767.134  ops/s
FuncCallComparison.rdtsc           thrpt    3  107695741.608 ± 23983281.346  ops/s
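
To put the `invokeFFMRDTSC` numbers in perspective, the relative change and the overlap of the reported error intervals can be computed with a few lines of Java. This is a quick sketch for reading the tables above; the class `BenchmarkDelta` and its methods are hypothetical helpers, not part of the ffmasm benchmark.

```java
// Hypothetical helper for comparing two JMH scores; not part of ffmasm.
final class BenchmarkDelta {

    // Relative change from `before` to `after`, in percent.
    static double relativeChangePercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    // True when [scoreA - errA, scoreA + errA] and [scoreB - errB, scoreB + errB] intersect.
    static boolean intervalsOverlap(double scoreA, double errA, double scoreB, double errB) {
        return scoreA - errA <= scoreB + errB && scoreB - errB <= scoreA + errA;
    }

    public static void main(String[] args) {
        // FuncCallComparison.invokeFFMRDTSC, taken from the tables above.
        double before = 106664071.816, beforeErr = 14396524.718;
        double after = 107622971.525, afterErr = 12249767.134;

        System.out.printf("relative change: %.2f%%%n",
                relativeChangePercent(before, after));
        System.out.println("error intervals overlap: "
                + intervalsOverlap(before, beforeErr, after, afterErr));
    }
}
```

The change is under one percent and the two error intervals overlap, so with only three measurement iterations the result is best read as "no regression, possibly a small gain".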


Environment:
* CPU: AMD Ryzen 3 3300X
* OS: Fedora 38 x86_64 (Kernel 6.3.8-200.fc38.x86_64)
* Hyper-V 4vCPU, 8GB mem

-------------

Commit messages:
 - 8313406: nep_invoker_blob can be simplified more

Changes: https://git.openjdk.org/jdk/pull/15089/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15089&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8313406
  Stats: 41 lines in 3 files changed: 4 ins; 11 del; 26 mod
  Patch: https://git.openjdk.org/jdk/pull/15089.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15089/head:pull/15089

PR: https://git.openjdk.org/jdk/pull/15089


More information about the hotspot-compiler-dev mailing list