RFR: 8240772: x86_64: Pre-generate Assembler::popa, pusha and vzeroupper

Ioi Lam ioi.lam at oracle.com
Tue Mar 10 18:40:04 UTC 2020


Hi Claes,

This is a really good optimization! Big bang for small bucks!

I have a suggestion for the coding style:

Rename Assembler::popa to Assembler::popa_slow(), and then:

static void precompute_instructions() {
   ...
   MacroAssembler masm(&buffer);

   // Emit popa once via the slow path and record where it starts and ends
   address begin_popa  = masm.code_section()->end();
   masm.popa_slow();
   address end_popa    = masm.code_section()->end();
   ...
}

void Assembler::popa() { // 64bit
   if (!precomputed) {
     precompute_instructions();
   }
   // Fast path: copy the bytes captured by precompute_instructions()
   copy_precomputed_instructions(popa_code, popa_len);
}
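To make the precompute-then-copy pattern concrete, here is a minimal, self-contained sketch in standard C++. It assumes a toy assembler that appends raw bytes to a std::vector; ToyAssembler and its byte sequence are hypothetical stand-ins (a plain run of x86_64 POP r64 encodings), not what HotSpot's popa actually emits. Only the names popa, popa_slow, precomputed, popa_code and popa_len mirror the suggestion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical toy assembler: appends machine code bytes to a buffer.
// Not HotSpot code; it only mirrors the shape of the suggestion.
class ToyAssembler {
 public:
  std::vector<uint8_t> code;

  // Slow path: encode every instruction individually.
  // Pops r15..r8 then rdi..rax, skipping rsp (register 4).
  // x86_64: POP r64 is 0x58 + (reg & 7), with a REX.B prefix (0x41) for r8..r15.
  void popa_slow() {
    for (int reg = 15; reg >= 0; reg--) {
      if (reg == 4) continue;  // never pop into rsp
      if (reg >= 8) code.push_back(0x41);
      code.push_back(static_cast<uint8_t>(0x58 + (reg & 7)));
    }
  }

  // Fast path: copy bytes captured once by precompute_instructions().
  void popa();
};

// Precomputed bytes, filled lazily on first use.
static bool precomputed = false;
static uint8_t popa_code[64];
static size_t popa_len = 0;

static void precompute_instructions() {
  ToyAssembler masm;
  size_t begin_popa = masm.code.size();
  masm.popa_slow();
  size_t end_popa = masm.code.size();
  popa_len = end_popa - begin_popa;
  assert(popa_len <= sizeof(popa_code));
  std::memcpy(popa_code, masm.code.data() + begin_popa, popa_len);
  precomputed = true;  // plain flag: assumes single-threaded bootstrap
}

void ToyAssembler::popa() {
  if (!precomputed) {
    precompute_instructions();
  }
  code.insert(code.end(), popa_code, popa_code + popa_len);
}
```

The point of keeping popa_slow() around is that the fast path can always be checked against it: both must produce byte-for-byte identical output, while the fast path reduces every call after the first to a single bulk copy.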

----

Also, maybe you can add this assert after generating the code for all three macros:

   assert(masm.code()->total_relocation_size() == 0 &&
          masm.code()->total_oop_size() == 0 &&
          masm.code()->total_metadata_size() == 0,
          "precomputed code cannot have any of these");


Thanks!
- Ioi



On 3/10/20 6:46 AM, Claes Redestad wrote:
> Hi,
>
> This patch precomputes some invariant Assembler routines at bootstrap and
> copies the precomputed code on subsequent invocations.
>
> For popa and pusha, this means an overhead reduction of around 98% (from
> ~2500 instructions to emit a pusha to ~50). For vzeroupper, the overhead
> drops by ~65% (from 117 to 42 instructions). Together, these add up to
> about a 1% reduction in instructions executed for a Hello World program,
> with a smaller impact on larger applications.
>
> The initialization is very simple and naive, i.e., it lacks any kind of
> synchronization protocol. But since this setup is guaranteed to happen
> very early during bootstrap, this should be fine. Thanks, Ioi, for some
> helpful suggestions here!
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8240772
> Webrev: http://cr.openjdk.java.net/~redestad/8240772/open.00/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes
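On the point that the initialization lacks any synchronization protocol: the sketch below contrasts a plain-flag lazy initialization, which is only safe if the first call is guaranteed to happen before any concurrent use, with one standard C++ alternative that is thread-safe by construction (a function-local static, which C++11 guarantees is initialized exactly once). Both functions and the counter are hypothetical illustrations; the patch itself uses neither of these exact forms.

```cpp
#include <cassert>

// Hypothetical counter standing in for the expensive slow-path encoding.
static int precompute_calls = 0;

static void precompute_instructions() { precompute_calls++; }

// Naive lazy init: a plain flag, no synchronization.
// Correct only if the first call happens while a single thread is running.
static bool precomputed = false;
void ensure_precomputed_naive() {
  if (!precomputed) {
    precompute_instructions();
    precomputed = true;
  }
}

// Thread-safe alternative: C++11 guarantees a function-local static is
// initialized exactly once, even if several threads call this concurrently.
void ensure_precomputed_synchronized() {
  static const bool done = (precompute_instructions(), true);
  (void)done;
}
```

The naive version has no per-call synchronization cost beyond a flag check, which is why relying on the bootstrap ordering is attractive here; the synchronized form only becomes necessary if the first call could ever race with another thread.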



More information about the hotspot-compiler-dev mailing list