Question regarding the (x86) assembler's use of NOP
Vladimir Kozlov
vladimir.kozlov at oracle.com
Sun Sep 12 23:10:29 PDT 2010
David,
LEA NOPs have a register dependency. '0x0F 0x1F' is the fast multi-byte NOP; see:
Intel® 64 and IA-32 Architectures Software Developer's Manual
Volume 2B: Instruction Set Reference, N-Z
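
For illustration only, here is a minimal C++ sketch contrasting the two
encodings. It is not taken from HotSpot: it assumes EAX is chosen as REG, and
modrm() is a hypothetical helper, not HotSpot's emit_rm. LEA writes its
destination register, whereas the 0x0F 0x1F form only names a memory operand
and has no destination register.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not HotSpot's emit_rm): pack a ModRM byte from its
// mod (2 bits), reg (3 bits), and rm (3 bits) fields.
inline std::uint8_t modrm(unsigned mod, unsigned reg, unsigned rm) {
  assert(mod < 4 && reg < 8 && rm < 8);
  return static_cast<std::uint8_t>((mod << 6) | (reg << 3) | rm);
}

// LEA EAX, [EAX + 0] with an 8-bit displacement: 3 bytes.
// LEA writes its destination register, so the instruction depends on EAX.
inline std::vector<std::uint8_t> lea_nop_3() {
  return {0x8D, modrm(1, 0, 0), 0x00};
}

// NOP DWORD PTR [EAX + 0] with an 8-bit displacement: 4 bytes.
// The 0x0F 0x1F opcode has no destination register.
inline std::vector<std::uint8_t> addr_nop_4() {
  return {0x0F, 0x1F, modrm(1, 0, 0), 0x00};
}
```

Both forms share the same ModRM byte (0x40); only the opcode differs.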
Vladimir
On 9/12/10 9:50 PM, David Dabbs wrote:
> Hi.
>
> I've been trying to trace PrintAssembly output back to the HS assembler.
> I noticed that the NOP generation code differs from the Intel recommendations
> and was wondering if someone could comment on the discrepancies.
>
>
> Thanks,
>
> David
>
>
>
> The Intel Arch Optimization Guide recommends the following regarding NOPs:
>
> 3.5.1.8 Using NOPs
> Code generators generate a no-operation (NOP) to align instructions.
> Examples of NOPs of different lengths in 32-bit mode are shown below:
>
> 1-byte: XCHG EAX, EAX
> 2-byte: 66 NOP
> 3-byte: LEA REG, 0 (REG) (8-bit displacement)
> 4-byte: NOP DWORD PTR [EAX + 0] (8-bit displacement)
> 5-byte: NOP DWORD PTR [EAX + EAX*1 + 0] (8-bit displacement)
> 6-byte: LEA REG, 0 (REG) (32-bit displacement)
> 7-byte: NOP DWORD PTR [EAX + 0] (32-bit displacement)
> 8-byte: NOP DWORD PTR [EAX + EAX*1 + 0] (32-bit displacement)
> 9-byte: NOP WORD PTR [EAX + EAX*1 + 0] (32-bit displacement)
>
> These are all true NOPs, having no effect on the state of the machine except
> to advance the EIP. Because NOPs require hardware resources to decode and
> execute, use the fewest number to achieve the desired padding.
>
> The one byte NOP:[XCHG EAX,EAX] has special hardware support. Although it
> still consumes a μop and its accompanying resources, the dependence upon the
> old value of EAX is removed. This μop can be executed at the earliest possible
> opportunity, reducing the number of outstanding instructions and is the lowest
> cost NOP.
>
> The other NOPs have no special hardware support. Their input and output
> registers are interpreted by the hardware. Therefore, a code generator should
> arrange to use the register containing the oldest value as input, so that the
> NOP will dispatch and release RS resources at the earliest possible
> opportunity.
>
> Try to observe the following NOP generation priority. Select:
> * the smallest number of NOPs and pseudo-NOPs to provide the desired padding.
> * NOPs that are least likely to execute on slower execution unit clusters.
> * the register arguments of NOPs to reduce dependencies.
>
> // end Intel Arch ---------------------------
>
>
> The code in assembler_x86.cpp however issues NOPs using:
>
> void Assembler::nop(int i) {
> #ifdef ASSERT
>   assert(i > 0, " ");
>   // The fancy nops aren't currently recognized by debuggers making it a
>   // pain to disassemble code while debugging. If asserts are on clearly
>   // speed is not an issue so simply use the single byte traditional nop
>   // to do alignment.
>
>   for (; i > 0; i--) emit_byte(0x90);
>   return;
>
> #endif // ASSERT
>
>   if (UseAddressNop && VM_Version::is_intel()) {
>     //
>     // Using multi-bytes nops "0x0F 0x1F [address]" for Intel
>     // 1: 0x90
>     // 2: 0x66 0x90
>     // 3: 0x66 0x66 0x90 (don't use "0x0F 0x1F 0x00" - need patching safe padding)
>     // 4: 0x0F 0x1F 0x40 0x00
>     // 5: 0x0F 0x1F 0x44 0x00 0x00
>     // 6: 0x66 0x0F 0x1F 0x44 0x00 0x00
>     // 7: 0x0F 0x1F 0x80 0x00 0x00 0x00 0x00
>     // 8: 0x0F 0x1F 0x84 0x00 0x00 0x00 0x00 0x00
>     // 9: 0x66 0x0F 0x1F 0x84 0x00 0x00 0x00 0x00 0x00
>     // 10: 0x66 0x66 0x0F 0x1F 0x84 0x00 0x00 0x00 0x00 0x00
>     // 11: 0x66 0x66 0x66 0x0F 0x1F 0x84 0x00 0x00 0x00 0x00 0x00
>
>     // The rest coding is Intel specific - don't use consecutive address nops
>
>     // 12: 0x0F 0x1F 0x84 0x00 0x00 0x00 0x00 0x00 0x66 0x66 0x66 0x90
>     // 13: 0x66 0x0F 0x1F 0x84 0x00 0x00 0x00 0x00 0x00 0x66 0x66 0x66 0x90
>     // 14: 0x66 0x66 0x0F 0x1F 0x84 0x00 0x00 0x00 0x00 0x00 0x66 0x66 0x66 0x90
>     // 15: 0x66 0x66 0x66 0x0F 0x1F 0x84 0x00 0x00 0x00 0x00 0x00 0x66 0x66 0x66 0x90
>
>     while (i >= 15) {
>       // For Intel don't generate consecutive address nops (mix with regular nops)
>       i -= 15;
>       emit_byte(0x66); // size prefix
>       emit_byte(0x66); // size prefix
>       emit_byte(0x66); // size prefix
>       addr_nop_8();
>       emit_byte(0x66); // size prefix
>       emit_byte(0x66); // size prefix
>       emit_byte(0x66); // size prefix
>       emit_byte(0x90); // nop
>     }
>     switch (i) {
>       case 14:
>         emit_byte(0x66); // size prefix
>       case 13:
>         emit_byte(0x66); // size prefix
>       case 12:
>         addr_nop_8();
>         emit_byte(0x66); // size prefix
>         emit_byte(0x66); // size prefix
>         emit_byte(0x66); // size prefix
>         emit_byte(0x90); // nop
>         break;
>       case 11:
>         emit_byte(0x66); // size prefix
>       case 10:
>         emit_byte(0x66); // size prefix
>       case 9:
>         emit_byte(0x66); // size prefix
>       case 8:
>         addr_nop_8();
>         break;
>       case 7:
>         addr_nop_7();
>         break;
>       case 6:
>         emit_byte(0x66); // size prefix
>       case 5:
>         addr_nop_5();
>         break;
>       case 4:
>         addr_nop_4();
>         break;
>       case 3:
>         // Don't use "0x0F 0x1F 0x00" - need patching safe padding
>         emit_byte(0x66); // size prefix
>       case 2:
>         emit_byte(0x66); // size prefix
>       case 1:
>         emit_byte(0x90); // nop
>         break;
>       default:
>         assert(i == 0, " ");
>     }
>     return;
>   }
>
>
> void Assembler::addr_nop_4() {
>   // 4 bytes: NOP DWORD PTR [EAX+0] 8-bits offset
>   emit_byte(0x0F);
>   emit_byte(0x1F);
>   emit_byte(0x40); // emit_rm(cbuf, 0x1, EAX_enc, EAX_enc);
>   emit_byte(0);    // 8-bits offset (1 byte)
> }
>
> void Assembler::addr_nop_5() {
>   // 5 bytes: NOP DWORD PTR [EAX+EAX*1+0] 8-bits offset
>   emit_byte(0x0F);
>   emit_byte(0x1F);
>   emit_byte(0x44); // emit_rm(cbuf, 0x1, EAX_enc, 0x4);
>   emit_byte(0x00); // emit_rm(cbuf, 0x0, EAX_enc, EAX_enc);
>   emit_byte(0);    // 8-bits offset (1 byte)
> }
>
> void Assembler::addr_nop_7() {
>   // 7 bytes: NOP DWORD PTR [EAX+0] 32-bits offset
>   emit_byte(0x0F);
>   emit_byte(0x1F);
>   emit_byte(0x80); // emit_rm(cbuf, 0x2, EAX_enc, EAX_enc);
>   emit_long(0);    // 32-bits offset (4 bytes)
> }
>
> void Assembler::addr_nop_8() {
>   // 8 bytes: NOP DWORD PTR [EAX+EAX*1+0] 32-bits offset
>   emit_byte(0x0F);
>   emit_byte(0x1F);
>   emit_byte(0x84); // emit_rm(cbuf, 0x2, EAX_enc, 0x4);
>   emit_byte(0x00); // emit_rm(cbuf, 0x0, EAX_enc, EAX_enc);
>   emit_long(0);    // 32-bits offset (4 bytes)
> }
>
>
>
>
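
The length selection in the quoted Assembler::nop(int) can be restated as a
standalone function, which makes it easier to check that each sequence has the
requested length. This is only a sketch under stated assumptions: it collects
bytes into a std::vector instead of calling emit_byte, the names pad_with_nops,
append, and prefixes are hypothetical, and an if/else chain stands in for the
original fall-through switch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: append a literal byte sequence.
void append(Bytes& out, std::initializer_list<std::uint8_t> code) {
  out.insert(out.end(), code.begin(), code.end());
}

// Hypothetical helper: append `count` operand-size prefixes (0x66).
void prefixes(Bytes& out, int count) {
  for (int k = 0; k < count; ++k) out.push_back(0x66);
}

// Address NOPs, byte for byte as in the quoted addr_nop_N() functions.
void addr_nop_4(Bytes& o) { append(o, {0x0F, 0x1F, 0x40, 0x00}); }
void addr_nop_5(Bytes& o) { append(o, {0x0F, 0x1F, 0x44, 0x00, 0x00}); }
void addr_nop_7(Bytes& o) { append(o, {0x0F, 0x1F, 0x80, 0, 0, 0, 0}); }
void addr_nop_8(Bytes& o) { append(o, {0x0F, 0x1F, 0x84, 0x00, 0, 0, 0, 0}); }

// Return n bytes of padding, following the Intel branch of the quoted code.
Bytes pad_with_nops(int n) {
  assert(n >= 0);
  Bytes out;
  // 15-byte chunks: prefixed 8-byte address NOP plus a prefixed 1-byte NOP,
  // so that two address NOPs are never adjacent.
  while (n >= 15) {
    n -= 15;
    prefixes(out, 3);
    addr_nop_8(out);
    prefixes(out, 3);
    out.push_back(0x90);
  }
  if (n >= 12) {            // 12..14: address NOP followed by 66 66 66 90
    prefixes(out, n - 12);
    addr_nop_8(out);
    prefixes(out, 3);
    out.push_back(0x90);
  } else if (n >= 8) {      // 8..11: prefixed 8-byte address NOP
    prefixes(out, n - 8);
    addr_nop_8(out);
  } else if (n == 7) {
    addr_nop_7(out);
  } else if (n >= 5) {      // 5..6: optionally prefixed 5-byte address NOP
    prefixes(out, n - 5);
    addr_nop_5(out);
  } else if (n == 4) {
    addr_nop_4(out);
  } else if (n >= 1) {      // 1..3: prefixed traditional NOP (patching safe)
    prefixes(out, n - 1);
    out.push_back(0x90);
  }
  return out;
}
```

With this sketch, pad_with_nops(6) yields 66 0F 1F 44 00 00, which matches
entry 6 in the comment table of the quoted code.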