Question: "backwards" addressing

Tue Sep 8 05:59:05 PDT 2009

Christian, thanks for remembering my question.

On 05.09.2009 11:20, Christian Thalinger wrote:
> Ulf Zibis wrote:
>   
>> Do you know of any reason for this?
>>     
>
> Let's see what code is generated (on 32-bit x86)...
>
>   
>>     static void loop1(int off, char in1, char in2, char[] out) {
>>         out[off+3] = in1;
>>         out[off+5] = in2;
>>         out[off+0] = in1;
>>         out[off+4] = in2;
>>         out[off+9] = in1;
>>         out[off+8] = in2;
>>         out[off+6] = in1;
>>         out[off+1] = in2;
>>         out[off+7] = in1;
>>         out[off+2] = in2;
>>     }
>>     
>
> 030   	MOV16  [EDI + #22 + ECX << #1],EBP
> 035   	MOV16  [EDI + #12 + ECX << #1],EDX
> 03a   	MOV16  [EDI + #20 + ECX << #1],EBP
> 03f   	MOV16  [EDI + #30 + ECX << #1],EDX
> 044   	MOV16  [EDI + #28 + ECX << #1],EBP
> 049   	MOV16  [EDI + #24 + ECX << #1],EDX
> 04e   	MOV16  [EDI + #14 + ECX << #1],EBP
> 053   	MOV16  [EDI + #26 + ECX << #1],EDX
> 058   	MOV16  [EDI + #16 + ECX << #1],EBP
>
>   
>>     static void loop2(int off, char in1, char in2, char[] out) {
>>         out[off++] = in1;
>>         out[off++] = in2;
>>         out[off++] = in1;
>>         out[off++] = in2;
>>         out[off++] = in1;
>>         out[off++] = in2;
>>         out[off++] = in1;
>>         out[off++] = in2;
>>         out[off++] = in1;
>>         out[off++] = in2;
>>     }
>>     
>
> 02e   	MOV16  [EBX + #14 + ECX << #1],EDI
> 033   	MOV16  [EBX + #16 + ECX << #1],EDX
> 038   	MOV16  [EBX + #18 + ECX << #1],EDI
> 03d   	MOV16  [EBX + #20 + ECX << #1],EDX
> 042   	MOV16  [EBX + #22 + ECX << #1],EDI
> 047   	MOV16  [EBX + #24 + ECX << #1],EDX
> 04c   	MOV16  [EBX + #26 + ECX << #1],EDI
> 051   	MOV16  [EBX + #28 + ECX << #1],EDX
> 056   	MOV16  [EBX + #30 + ECX << #1],EDI
>
> As you can see, the instruction sequences are almost the same except for
> the ordering.  Theoretically the second one should be faster, since
> sequential writes should perform better.
>   

Yes, that's what I'm wondering about too, because in my test it's even the
other way round.
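A rough harness for repeating that comparison might look like the sketch below. It is not the test mentioned above (that one isn't shown in this thread); the class name `LoopOrder`, the array size, the repetition counts and the fill characters are arbitrary assumptions. Hand-rolled timing with `System.nanoTime()` is sensitive to JIT warm-up, on-stack replacement and CPU frequency changes, so treat its numbers as indicative only.

```java
// Rough timing sketch (hypothetical harness, not from the original mail).
// Hand-rolled System.nanoTime() timing is sensitive to JIT warm-up and
// on-stack replacement, so the printed numbers are indicative only.
public class LoopOrder {

    // Copied from the question: ten stores in scrambled index order.
    static void loop1(int off, char in1, char in2, char[] out) {
        out[off+3] = in1;
        out[off+5] = in2;
        out[off+0] = in1;
        out[off+4] = in2;
        out[off+9] = in1;
        out[off+8] = in2;
        out[off+6] = in1;
        out[off+1] = in2;
        out[off+7] = in1;
        out[off+2] = in2;
    }

    // Copied from the question: ten stores in ascending index order.
    static void loop2(int off, char in1, char in2, char[] out) {
        out[off++] = in1;
        out[off++] = in2;
        out[off++] = in1;
        out[off++] = in2;
        out[off++] = in1;
        out[off++] = in2;
        out[off++] = in1;
        out[off++] = in2;
        out[off++] = in1;
        out[off++] = in2;
    }

    // Fills buf in blocks of 10 chars, 'reps' times; returns elapsed nanoseconds.
    static long time(boolean scattered, char[] buf, int reps) {
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            for (int off = 0; off + 10 <= buf.length; off += 10) {
                if (scattered) {
                    loop1(off, 'a', 'b', buf);
                } else {
                    loop2(off, 'a', 'b', buf);
                }
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        char[] buf = new char[10_000];   // arbitrary size, a multiple of 10
        // Warm-up rounds so both methods are likely JIT-compiled before measuring.
        for (int i = 0; i < 5; i++) {
            time(true, buf, 1_000);
            time(false, buf, 1_000);
        }
        long t1 = time(true, buf, 10_000);
        long t2 = time(false, buf, 10_000);
        System.out.println("loop1 (scattered):  " + t1 / 1_000_000 + " ms");
        System.out.println("loop2 (sequential): " + t2 / 1_000_000 + " ms");
    }
}
```

Note that the two methods do not store the same values at the same positions (loop1 writes `in1` at offsets 0, 3, 6, 7 and 9, loop2 at the even offsets), so only their timings, not their outputs, are comparable.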
Additionally, I don't understand:
- the additional shift by #1: doesn't that mean the address in loop2 is
incremented by 4 per store (or are parentheses missing around ECX << #1?),
even though this is a char[], not an int[]?
- Why doesn't HotSpot compile this to INC opcodes? I'd think CPUs wouldn't
have INC if it had no advantages.
- Each of these MOV opcodes with its complex addressing mode takes 5 bytes,
I guess; an INC should be shorter.
- Doesn't x86 have a combined MOV-and-INC opcode?
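One way to check how the operands in the listings fit together is to decode them. The sketch below assumes that the printout `[EDI + #22 + ECX << #1]` follows the usual x86 form base + displacement + index*scale, i.e. only ECX is shifted (scaled by 2, the size of a char); that ECX holds `off`; and that a char[] on 32-bit HotSpot has a 12-byte header before element 0 (inferred from the smallest displacement, #12, not stated in the thread). `DispDecode` and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical decoder (not from the thread) for the MOV16 operands quoted above.
// ASSUMPTIONS: 32-bit HotSpot with a 12-byte char[] header, and ECX holding 'off'.
public class DispDecode {

    static final int ASSUMED_HEADER_BYTES = 12; // assumption, inferred from the smallest disp #12
    static final int CHAR_BYTES = 2;            // a Java char is 16 bits

    // [base + #disp + index << #1]: only the index register is shifted (scaled by 2),
    // as in x86 base + index*scale + displacement addressing.
    static long effectiveAddress(long base, int disp, long index) {
        return base + disp + (index << 1);
    }

    // Maps each displacement #d back to k in out[off + k].
    static int[] indicesFor(int[] disps) {
        int[] ks = new int[disps.length];
        for (int i = 0; i < disps.length; i++) {
            ks[i] = (disps[i] - ASSUMED_HEADER_BYTES) / CHAR_BYTES;
        }
        return ks;
    }

    // Differences between consecutive listing offsets = encoded instruction lengths.
    static int[] instructionLengths(int[] pcs) {
        int[] lens = new int[pcs.length - 1];
        for (int i = 1; i < pcs.length; i++) {
            lens[i - 1] = pcs[i] - pcs[i - 1];
        }
        return lens;
    }

    public static void main(String[] args) {
        int[] loop1Disps = {22, 12, 20, 30, 28, 24, 14, 26, 16};
        int[] loop2Disps = {14, 16, 18, 20, 22, 24, 26, 28, 30};
        int[] loop2Pcs = {0x2e, 0x33, 0x38, 0x3d, 0x42, 0x47, 0x4c, 0x51, 0x56};

        System.out.println("loop1 k: " + Arrays.toString(indicesFor(loop1Disps)));
        System.out.println("loop2 k: " + Arrays.toString(indicesFor(loop2Disps)));
        // Byte distance between out[off+1] and out[off+2] for an arbitrary base and off.
        long step = effectiveAddress(0x1000, 16, 7) - effectiveAddress(0x1000, 14, 7);
        System.out.println("address step per element: " + step + " bytes");
        System.out.println("MOV16 lengths: " + Arrays.toString(instructionLengths(loop2Pcs)));
    }
}
```

Under these assumptions it prints k = [5, 0, 4, 9, 8, 6, 1, 7, 2] for loop1, which is loop1's source order after its first store (out[off+3], #18, is not in the quoted excerpt), and k = [1, 2, 3, 4, 5, 6, 7, 8, 9] for loop2 (out[off+0] is not shown). Consecutive elements come out 2 bytes apart, and each listed MOV16 occupies 5 bytes.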

Stupid questions? OK, it's been a long time since I programmed in
assembler, and modern x86 didn't exist back then.

-Ulf