RFR: 8306706: Support out-of-line code generation for MachNodes

Sun Apr 23 18:49:42 UTC 2023

On Sun, 23 Apr 2023 18:22:35 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

> Hi,
> 
> This patch adds support for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This separates the uncommon path from the common one, which speeds up the common path slightly and increases compiled code density. Please take a look and leave reviews.
> 
> Thanks a lot.

With this patch, the compiled code for a float-to-int conversion changes as follows:

    Before:

        vcvttss2si %xmm1,%eax
        cmp    $0x80000000,%eax
        jne    DONE
        sub    $0x8,%rsp
        vmovss %xmm1,(%rsp)
        call   Stub::f2i_fixup              ;   {runtime_call StubRoutines (initial stubs)}
        pop    %rax
    DONE:

    After:

        vcvttss2si %xmm1,%eax
        cmp    $0x80000000,%eax
        je     STUB
    CONTINUE:

    STUB:
        sub    $0x8,%rsp
        vmovss %xmm1,(%rsp)
        call   Stub::f2i_fixup              ;   {runtime_call StubRoutines (initial stubs)}
        pop    %rax
        jmp    CONTINUE
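
For readers unfamiliar with the check against 0x80000000: cvttss2si returns that value (Integer.MIN_VALUE) when the input is NaN or does not fit in an int, while Java requires NaN to convert to 0 and out-of-range values to saturate. The fixup is therefore only needed for that one result. Below is a minimal Java sketch of the fast-path/slow-path split. The names `rawCvttss2si` and `f2iFixup` are hypothetical models of the instruction and the stub, not HotSpot code.

```java
final class F2iSketch {
    // Hypothetical model of vcvttss2si: truncates toward zero and yields
    // 0x80000000 (Integer.MIN_VALUE) for NaN or out-of-range inputs.
    static int rawCvttss2si(float f) {
        if (Float.isNaN(f) || f >= 2147483648.0f || f < -2147483648.0f) {
            return Integer.MIN_VALUE;
        }
        return (int) f;
    }

    // Hypothetical model of the out-of-line stub: applies Java semantics
    // to the inputs for which the instruction returned 0x80000000.
    static int f2iFixup(float f) {
        if (Float.isNaN(f)) {
            return 0;
        }
        return f > 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }

    // Common path: one conversion and one compare. The fixup is only
    // reached in the uncommon case, mirroring the "je STUB" branch.
    static int f2i(float f) {
        int r = rawCvttss2si(f);
        if (r != Integer.MIN_VALUE) {
            return r;
        }
        return f2iFixup(f);
    }
}
```

Under these assumptions, `F2iSketch.f2i(x)` agrees with the Java cast `(int) x` for ordinary values, NaN, infinities, and out-of-range inputs.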

Microbenchmarks show slight improvements. Although the results vary from run to run, the patched version generally appears to be faster:

                                          Before             After
    Benchmark             Mode  Cnt    Score    Error    Score    Error  Units   Change
    ConvertF2I.d2iArray   avgt    5  266.890 ±  3.277  260.720 ±  1.382  ns/op   -2.31%
    ConvertF2I.d2iSingle  avgt    5    0.378 ±  0.005    0.317 ±  0.013  ns/op  -16.14%
    ConvertF2I.d2lArray   avgt    5  273.999 ± 12.571  267.862 ±  4.806  ns/op   -2.24%
    ConvertF2I.d2lSingle  avgt    5    0.379 ±  0.005    0.348 ±  0.044  ns/op   -8.18%
    ConvertF2I.f2iArray   avgt    5  261.549 ±  1.391  255.522 ± 15.133  ns/op   -2.30%
    ConvertF2I.f2iSingle  avgt    5    0.378 ±  0.005    0.311 ±  0.007  ns/op  -17.72%
    ConvertF2I.f2lArray   avgt    5  272.745 ±  1.661  267.770 ±  7.033  ns/op   -1.82%
    ConvertF2I.f2lSingle  avgt    5    0.379 ±  0.007    0.350 ±  0.022  ns/op   -7.65%
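
The Change column appears to be the relative difference of the two scores, (After - Before) / Before, so a negative value means less time per operation. A small Java sketch of that calculation follows. The class and method names are hypothetical and the formula is inferred from the numbers in the table.

```java
final class ChangeSketch {
    // Relative change in percent; negative means the "after" score
    // (ns/op) is lower, i.e. the patched version is faster.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }
}
```

For example, `ChangeSketch.percentChange(0.378, 0.311)` is about -17.72, which matches the f2iSingle row.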

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13602#issuecomment-1519130423