RFR: 8306706: Support out-of-line code generation for MachNodes
Quan Anh Mai
qamai at openjdk.org
Sun Apr 23 18:49:42 UTC 2023
On Sun, 23 Apr 2023 18:22:35 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
> Hi,
>
> This patch adds support for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This separates the uncommon path from the common one, which speeds up the common path slightly and increases compiled code density. Please take a look and leave reviews.
>
> Thanks a lot.
With this patch, the compiled code for a float-to-int conversion is changed:
Before:

    vcvttss2si %xmm1,%eax
    cmp    $0x80000000,%eax
    jne    DONE
    sub    $0x8,%rsp
    vmovss %xmm1,(%rsp)
    call   Stub::f2i_fixup          ;   {runtime_call StubRoutines (initial stubs)}
    pop    %rax
DONE:

After:

    vcvttss2si %xmm1,%eax
    cmp    $0x80000000,%eax
    je     STUB
CONTINUE:
STUB:
    sub    $0x8,%rsp
    vmovss %xmm1,(%rsp)
    call   Stub::f2i_fixup          ;   {runtime_call StubRoutines (initial stubs)}
    pop    %rax
    jmp    CONTINUE

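For readers less familiar with why the fixup path exists: vcvttss2si returns 0x80000000 for NaN and out-of-range inputs, while Java requires NaN to convert to 0 and out-of-range values to saturate. The sketch below models that split in plain Java. It is only an illustration of the control flow, not the code in the patch; the class F2iSketch and the methods rawConvert, fixup and f2i are hypothetical names, and rawConvert merely emulates the instruction's sentinel behaviour.

```java
// Illustrative model only: names are hypothetical and not taken from HotSpot.
final class F2iSketch {
    // Value vcvttss2si produces for NaN and out-of-range inputs.
    static final int INDEFINITE = 0x80000000;

    // Emulates the raw hardware conversion, including the sentinel result.
    static int rawConvert(float f) {
        if (Float.isNaN(f) || f >= 2147483648.0f || f < -2147483648.0f) {
            return INDEFINITE;
        }
        return (int) f; // in range, so this is a plain truncation
    }

    // Uncommon path: plays the role of the fixup call in the listings above.
    static int fixup(float f) {
        if (Float.isNaN(f)) {
            return 0;
        }
        return f > 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }

    // Common path: one conversion and one compare, falling through when no fixup is needed.
    static int f2i(float f) {
        int r = rawConvert(f);
        if (r != INDEFINITE) {
            return r;
        }
        return fixup(f); // rarely taken, so it can live out of line
    }

    public static void main(String[] args) {
        System.out.println(f2i(1.9f));
        System.out.println(f2i(Float.NaN));
        System.out.println(f2i(3e9f));
        System.out.println(f2i(-3e9f));
    }
}
```

Note that an input of exactly -2^31 also produces 0x80000000 legitimately, so it takes the fixup path as well and still yields Integer.MIN_VALUE.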
Microbenchmarks show slight improvements. Although the results vary from run to run, the patched version generally seems to perform better:

                                     Before             After
Benchmark             Mode  Cnt    Score    Error    Score    Error  Units   Change
ConvertF2I.d2iArray   avgt    5  266.890 ±  3.277  260.720 ±  1.382  ns/op   -2.31%
ConvertF2I.d2iSingle  avgt    5    0.378 ±  0.005    0.317 ±  0.013  ns/op  -16.14%
ConvertF2I.d2lArray   avgt    5  273.999 ± 12.571  267.862 ±  4.806  ns/op   -2.24%
ConvertF2I.d2lSingle  avgt    5    0.379 ±  0.005    0.348 ±  0.044  ns/op   -8.18%
ConvertF2I.f2iArray   avgt    5  261.549 ±  1.391  255.522 ± 15.133  ns/op   -2.30%
ConvertF2I.f2iSingle  avgt    5    0.378 ±  0.005    0.311 ±  0.007  ns/op  -17.72%
ConvertF2I.f2lArray   avgt    5  272.745 ±  1.661  267.770 ±  7.033  ns/op   -1.82%
ConvertF2I.f2lSingle  avgt    5    0.379 ±  0.007    0.350 ±  0.022  ns/op   -7.65%

-------------
PR Comment: https://git.openjdk.org/jdk/pull/13602#issuecomment-1519130423