RFR: 8287373: remove unnecessary paddings in generated code
Evgeny Astigeevich
duke at openjdk.java.net
Tue Jun 7 15:49:24 UTC 2022
On Thu, 28 Apr 2022 14:46:57 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:
> The goal is to remove unnecessary padding in generated code. The [Stub Code] section is currently aligned using the same value as the [Entry Point] section: the CodeEntryAlignment parameter, whose default is 64B on AARCH and 32B on AMD.
>
> Large alignment values are questionable even for the entry section itself. For example, the Arm Neoverse N1 Software Optimization Guide recommends aligning subroutines to 32B, while static compilers use an even smaller value of 16B. However, in this change I only propose applying different (and smaller) alignment values to the [Constants] and [Stub Code] sections. This makes the overall code 2% smaller on AARCH.
>
> Correctness was checked with jtreg. Performance was tested with the Renaissance and SPECjbb benchmarks on AARCH and AMD.
>
> Example: disassembly of a dummy method on AARCH, before vs. after:
>
> [Verified Entry Point] | [Verified Entry Point]
> 78c63b80: nop | 7437e480: nop
> 78c63b84: sub x9, sp, #0x20, lsl #12 | 7437e484: sub x9, sp, #0x20, lsl #12
> 78c63b88: str xzr, [x9] | 7437e488: str xzr, [x9]
> 78c63b8c: sub sp, sp, #0x20 | 7437e48c: sub sp, sp, #0x20
> 78c63b90: stp x29, x30, [sp, #16] | 7437e490: stp x29, x30, [sp, #16]
> 78c63b94: orr w1, wzr, #0x10 | 7437e494: orr w1, wzr, #0x10
> 78c63b98: bl 78343e00 | 7437e498: bl 73a61980
> 78c63b9c: .inst 0x00000000 ; undefined | 7437e49c: .inst 0x00000000 ; undefined
> 78c63ba0: .inst 0x00000000 ; undefined |
> 78c63ba4: .inst 0x00000000 ; undefined |
> 78c63ba8: .inst 0x00000000 ; undefined |
> 78c63bac: .inst 0x00000000 ; undefined |
> 78c63bb0: .inst 0x00000000 ; undefined |
> 78c63bb4: .inst 0x00000000 ; undefined |
> 78c63bb8: .inst 0x00000000 ; undefined |
> 78c63bbc: .inst 0x00000000 ; undefined |
> [Stub Code] | [Stub Code]
> 78c63bc0: ldr x8, 78c63bc8 | 7437e4a0: ldr x8, 7437e4a8
> 78c63bc4: br x8 | 7437e4a4: br x8
> 78c63bc8: .inst 0x78343e00 ; undefined | 7437e4a8: .inst 0x73a61980 ; undefined
> 78c63bcc: .inst ; undefined | 7437e4ac: .inst ; undefined
> [Exception Handler] | [Exception Handler]
> 78c63bd0: b 783ee080 | 7437e4b0: b 73b0c100
> [Deopt Handler Code] | [Deopt Handler Code]
> 78c63bd4: adr x30, 78c63bd4 | 7437e4b4: adr x30, 7437e4b4
> 78c63bd8: b 78343ac0 | 7437e4b8: b 73a61620
> 78c63bdc: .inst 0x00000000 ; undefined | 7437e4bc: .inst 0x00000000 ; undefined
src/hotspot/cpu/aarch64/globals_aarch64.hpp line 40:
> 38:
> 39: define_pd_global(uintx, CodeCacheSegmentSize, 64 COMPILER1_AND_COMPILER2_PRESENT(+64)); // Tiered compilation has large code-entry alignment.
> 40: define_pd_global(intx, CodeEntryAlignment, 64);
This change looks reasonable to me. I found the following in the Neoverse N1 Software Optimization Guide:
Consider aligning subroutine entry points and branch targets to 32B boundaries, within the bounds of the code-density requirements of the program. This will ensure that the subsequent fetch can maximize bandwidth following the taken branch by bringing in all useful instructions
src/hotspot/share/asm/codeBuffer.hpp line 259:
> 257: // TODO: move InteriorEntryAlignment to common c1/c2 header
> 258: int code_entry_aliginment = (int) COMPILER2_PRESENT(InteriorEntryAlignment) NOT_COMPILER2(CodeEntryAlignment);
> 259: return MAX2((int)sizeof(jdouble), code_entry_aliginment);
I don't think the stub code section needs alignment greater than 4 bytes (the AArch64 instruction size); it is not performance-critical code.
-------------
PR: https://git.openjdk.java.net/jdk/pull/8453
More information about the hotspot-dev mailing list