Integrated: 8313406: nep_invoker_blob can be simplified more
Yasumasa Suenaga
ysuenaga at openjdk.org
Mon Aug 14 23:17:17 UTC 2023
On Mon, 31 Jul 2023 12:22:00 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:
> In FFM, a native function is called via `nep_invoker_blob`. For a function with two arguments, the stub looks like the following:
>
>
> Decoding RuntimeStub - nep_invoker_blob 0x00007fcae394cd10
> --------------------------------------------------------------------------------
> 0x00007fcae394cd80: pushq %rbp
> 0x00007fcae394cd81: movq %rsp, %rbp
> 0x00007fcae394cd84: subq $0, %rsp
> ;; { argument shuffle
> 0x00007fcae394cd88: movq %r8, %rax
> 0x00007fcae394cd8b: movq %rsi, %r10
> 0x00007fcae394cd8e: movq %rcx, %rsi
> 0x00007fcae394cd91: movq %rdx, %rdi
> ;; } argument shuffle
> 0x00007fcae394cd94: callq *%r10
> 0x00007fcae394cd97: leave
> 0x00007fcae394cd98: retq
>
>
> `subq $0, %rsp` reserves shadow space on the stack, and `movq %r8, %rax` sets the number of args for a variadic function, so neither is necessary in some cases. They should be removed when they are not needed, as follows:
>
>
> Decoding RuntimeStub - nep_invoker_blob 0x00007fd8778e2810
> --------------------------------------------------------------------------------
> 0x00007fd8778e2880: pushq %rbp
> 0x00007fd8778e2881: movq %rsp, %rbp
> ;; { argument shuffle
> 0x00007fd8778e2884: movq %rsi, %r10
> 0x00007fd8778e2887: movq %rcx, %rsi
> 0x00007fd8778e288a: movq %rdx, %rdi
> ;; } argument shuffle
> 0x00007fd8778e288d: callq *%r10
> 0x00007fd8778e2890: leave
> 0x00007fd8778e2891: retq
>
>
> All java/foreign jtreg tests pass.
>
> This stub code can be seen in the [ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/examples/cpumodel) by running it with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and the hsdis library. This testcase links the code with `Linker.Option.isTrivial()`.
>
> After this change, FFM throughput on [another ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/benchmarks/funccall) improved slightly (`invokeFFMRDTSC` went from about 106.7M to about 107.6M ops/s):
>
> before:
>
> Benchmark                           Mode  Cnt          Score          Error  Units
> FuncCallComparison.invokeFFMRDTSC  thrpt    3  106664071.816 ± 14396524.718  ops/s
> FuncCallComparison.rdtsc           thrpt    3  108024079.738 ± 13223921.011  ops/s
>
>
> after:
>
> Benchmark                           Mode  Cnt          Score          Error  Units
> FuncCallComparison.invokeFFMRDTSC  thrpt    3  107622971.525 ± 12249767.134  ops/s
> FuncCallComparison.rdtsc           thrpt    3  107695741.608 ± 23983281.346  ops/s
>
>
> Environment:
> * CPU: AMD Ryzen 3 3300X
> * OS: Fedora 38 x86_64 (Kernel 6.3.8-200.fc38.x86_64)
> * Hyper-V 4vCPU, 8GB mem
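
To put the quoted `invokeFFMRDTSC` scores in proportion, here is a minimal sketch (not part of the change itself) that computes the relative difference between the two runs and checks whether their error intervals overlap. The class and method names are hypothetical, and treating the Error column as a symmetric half-width around the score is an assumption about how to read the benchmark output.

```java
// Hypothetical helper, for illustration only; not part of the JDK change.
class ThroughputDelta {

    // Relative change from `before` to `after`, in percent.
    public static double relativeChangePercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    // True when [scoreA - errA, scoreA + errA] and [scoreB - errB, scoreB + errB]
    // overlap. Assumes the reported error is a symmetric half-width.
    public static boolean intervalsOverlap(double scoreA, double errA,
                                           double scoreB, double errB) {
        return scoreA - errA <= scoreB + errB && scoreB - errB <= scoreA + errA;
    }

    public static void main(String[] args) {
        // invokeFFMRDTSC scores and errors copied from the quoted results.
        double before = 106664071.816, beforeErr = 14396524.718;
        double after = 107622971.525, afterErr = 12249767.134;

        System.out.println("relative change (%): "
                + relativeChangePercent(before, after));
        System.out.println("error intervals overlap: "
                + intervalsOverlap(before, beforeErr, after, afterErr));
    }
}
```

With the figures above, the difference comes out just under 1% and the two intervals overlap, so a longer run (more than 3 iterations) would likely be needed to measure the effect precisely.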
This pull request has now been integrated.
Changeset: 583cb754
Author: Yasumasa Suenaga <ysuenaga at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/583cb754f38f5d32144e302ce5e82a3b36a2cb78
Stats: 41 lines in 3 files changed: 4 ins; 11 del; 26 mod
8313406: nep_invoker_blob can be simplified more
Reviewed-by: jvernee, vlivanov
-------------
PR: https://git.openjdk.org/jdk/pull/15089