RFR: 8240772: x86_64: Pre-generate Assembler::popa, pusha and vzeroupper
Ioi Lam
ioi.lam at oracle.com
Tue Mar 10 18:40:04 UTC 2020
Hi Claes,
This is a really good optimization! Big bang for small bucks!
I have a suggestion on coding style:
Rename Assembler::popa to Assembler::popa_slow(), and then:

    void Assembler::popa() { // 64bit
      if (!precomputed) {
        precompute_instructions();
      }
      copy_precomputed_instructions(popa_code, popa_len);
    }

    static void precompute_instructions() {
      ...
      MacroAssembler masm(&buffer);
      address begin_popa = masm.code_section()->end();
      masm.popa_slow();
      address end_popa = masm.code_section()->end();
      ...
    }
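
To show the shape of this outside HotSpot, here is a minimal standalone sketch of the same precompute-once, copy-afterwards pattern. Everything in it is hypothetical: ToyAssembler, emit_byte() and copy_precomputed() are illustrative stand-ins, and the emitted bytes are placeholders rather than real x86_64 encodings. Like the proposal, it has no synchronization and assumes the first call happens while still single threaded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

namespace {
// Cache filled in once by precompute_instructions() (hypothetical layout).
bool precomputed = false;
std::uint8_t popa_code[64];
std::size_t popa_len = 0;
}  // namespace

// Hypothetical stand-in for Assembler; this is not HotSpot code.
class ToyAssembler {
 public:
  // Slow path: builds the sequence one byte at a time.
  void popa_slow() {
    for (std::uint8_t i = 0; i < 16; i++) {
      emit_byte(static_cast<std::uint8_t>(0x50 + i));  // placeholder byte
    }
  }

  // Fast path: generate once, then copy the cached bytes.
  void popa() {
    if (!precomputed) {
      precompute_instructions();
    }
    copy_precomputed(popa_code, popa_len);
  }

  const std::vector<std::uint8_t>& code() const { return buf_; }

 private:
  void emit_byte(std::uint8_t b) { buf_.push_back(b); }

  void copy_precomputed(const std::uint8_t* src, std::size_t len) {
    buf_.insert(buf_.end(), src, src + len);
  }

  // Runs the slow path into a scratch assembler and saves the bytes.
  static void precompute_instructions() {
    ToyAssembler masm;
    std::size_t begin_popa = masm.buf_.size();
    masm.popa_slow();
    std::size_t end_popa = masm.buf_.size();

    popa_len = end_popa - begin_popa;
    assert(popa_len <= sizeof(popa_code) && "cache buffer too small");
    std::memcpy(popa_code, masm.buf_.data() + begin_popa, popa_len);
    precomputed = true;
  }

  std::vector<std::uint8_t> buf_;
};
```

The point of the split is that popa() and popa_slow() must emit byte-for-byte identical code, so comparing their output is a cheap sanity check. Copying raw bytes like this is only safe when the sequence does not depend on where it ends up in the buffer.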
----
Also, maybe you can add this assert after generating the code for all 3
macros:

    assert(masm.code()->total_relocation_size() == 0 &&
           masm.code()->total_oop_size() == 0 &&
           masm.code()->total_metadata_size() == 0,
           "precomputed code cannot have any of these");
Thanks!
- Ioi
On 3/10/20 6:46 AM, Claes Redestad wrote:
> Hi,
>
> calculate some invariant Assembler routines at bootstrap, copy on
> subsequent invocations.
>
> For popa and pusha this means an overhead reduction of around 98% (from
> ~2500 instructions to emit a pusha to ~50). For vzeroupper an overhead
> reduction of ~65% (117 -> 42). Together these add up to about a 1%
> reduction of instructions executed on a Hello World - with some
> (smaller) scaling impact on larger applications.
>
> The initialization is very simple/naive, i.e., lacks any kind of
> synchronization protocol. But as this setup is guaranteed to happen very
> early during bootstrap this should be fine. Thanks Ioi for some helpful
> suggestions here!
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8240772
> Webrev: http://cr.openjdk.java.net/~redestad/8240772/open.00/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes