Question: "backwards" addressing
Paul Hohensee
Paul.Hohensee at Sun.COM
Tue Sep 8 08:06:35 PDT 2009
The unit of memory access for current x86 designs is larger than a
single 2-byte word, being usually at least 8 bytes. The processors have
store combining buffers that merge stores to the same line if they
happen close enough together in time, so a small number of store
instructions such as in your example will all merge up into at most 3
store buffer entries, regardless of instruction ordering.
Paul
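
A rough way to picture the point above: group the byte addresses written
by the ten char stores into aligned chunks and compare the two orderings.
This is only a sketch. The 8-byte chunk size is an assumption taken from
"at least 8 bytes", and StoreChunks, chunksTouched and the example start
address are hypothetical names and values; real store buffers are more
involved. The number of chunks depends on where the first element
happens to sit, but the set of chunks does not depend on the order of
the stores.

```java
import java.util.TreeSet;

class StoreChunks {
    // Assumption: 8-byte merge unit, standing in for "at least 8 bytes".
    static final int CHUNK_BYTES = 8;

    // Store order of loop1 and loop2, as element indices relative to off.
    static final int[] LOOP1_ORDER = {3, 5, 0, 4, 9, 8, 6, 1, 7, 2};
    static final int[] LOOP2_ORDER = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

    // Returns the distinct aligned chunks touched by 2-byte stores to the
    // given char indices; firstElementAddr is the address of out[off].
    static TreeSet<Long> chunksTouched(long firstElementAddr, int[] charIndices) {
        TreeSet<Long> chunks = new TreeSet<>();
        for (int i : charIndices) {
            long addr = firstElementAddr + 2L * i;  // a char is 2 bytes
            chunks.add(addr / CHUNK_BYTES);         // chunk of the first byte
            chunks.add((addr + 1) / CHUNK_BYTES);   // chunk of the second byte
        }
        return chunks;
    }

    public static void main(String[] args) {
        long addr = 0x100CL;  // hypothetical address of out[off]
        System.out.println("loop1 chunks: " + chunksTouched(addr, LOOP1_ORDER));
        System.out.println("loop2 chunks: " + chunksTouched(addr, LOOP2_ORDER));
    }
}
```

Both calls return the same set, which is why the ordering of the stores
should matter little once they are merged.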
Ulf Zibis wrote:
> Christian, thanks for remembering my question.
>
>
> On 05.09.2009 11:20, Christian Thalinger wrote:
>> Ulf Zibis wrote:
>>
>>> Do you know any reason for this ?
>>>
>>
>> Let's see what code is generated (on 32-bit x86)...
>>
>>
>>> static void loop1(int off, char in1, char in2, char[] out) {
>>>     out[off+3] = in1;
>>>     out[off+5] = in2;
>>>     out[off+0] = in1;
>>>     out[off+4] = in2;
>>>     out[off+9] = in1;
>>>     out[off+8] = in2;
>>>     out[off+6] = in1;
>>>     out[off+1] = in2;
>>>     out[off+7] = in1;
>>>     out[off+2] = in2;
>>> }
>>>
>>
>> 030 MOV16 [EDI + #22 + ECX << #1],EBP
>> 035 MOV16 [EDI + #12 + ECX << #1],EDX
>> 03a MOV16 [EDI + #20 + ECX << #1],EBP
>> 03f MOV16 [EDI + #30 + ECX << #1],EDX
>> 044 MOV16 [EDI + #28 + ECX << #1],EBP
>> 049 MOV16 [EDI + #24 + ECX << #1],EDX
>> 04e MOV16 [EDI + #14 + ECX << #1],EBP
>> 053 MOV16 [EDI + #26 + ECX << #1],EDX
>> 058 MOV16 [EDI + #16 + ECX << #1],EBP
>>
>>
>>> static void loop2(int off, char in1, char in2, char[] out) {
>>>     out[off++] = in1;
>>>     out[off++] = in2;
>>>     out[off++] = in1;
>>>     out[off++] = in2;
>>>     out[off++] = in1;
>>>     out[off++] = in2;
>>>     out[off++] = in1;
>>>     out[off++] = in2;
>>>     out[off++] = in1;
>>>     out[off++] = in2;
>>> }
>>>
>>
>> 02e MOV16 [EBX + #14 + ECX << #1],EDI
>> 033 MOV16 [EBX + #16 + ECX << #1],EDX
>> 038 MOV16 [EBX + #18 + ECX << #1],EDI
>> 03d MOV16 [EBX + #20 + ECX << #1],EDX
>> 042 MOV16 [EBX + #22 + ECX << #1],EDI
>> 047 MOV16 [EBX + #24 + ECX << #1],EDX
>> 04c MOV16 [EBX + #26 + ECX << #1],EDI
>> 051 MOV16 [EBX + #28 + ECX << #1],EDX
>> 056 MOV16 [EBX + #30 + ECX << #1],EDI
>>
>> As you can see, the instruction sequence is almost the same, except
>> for the ordering. Theoretically the second one should be faster, as
>> contiguous writes should perform better.
>>
>
> Yes, that's what I'm wondering about too, because in my test it is
> actually the other way around.
> Additionally, I don't understand:
> - the additional shift by #1, so the address in loop2 would be
> incremented by 4 (or are parentheses missing around (ECX << #1)?),
> but we are in a char[], not an int[]
> - Why doesn't HotSpot compile to INC opcodes? I think CPUs wouldn't
> have INC if it had no advantages.
> - This MOV opcode with its complex addressing mode needs 5 bytes to be
> loaded each time; I guess INC should be shorter.
> - Doesn't x86 have a combined MOV_&_INC opcode?
>
> Stupid questions? OK, it has been a long time since I programmed in
> assembler, and modern x86 didn't exist back then.
>
> -Ulf
>
>
>
>
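
On the shift question quoted above: x86 addressing of this form is
computed as base + displacement + (index << scale), so the shift applies
to the index register (ECX) only and scales the char index to a byte
offset. The displacements visible in the listing are consistent with
12 + 2*k for out[off+k]. A small sketch of that arithmetic follows;
EffectiveAddress, effectiveAddress, displacementFor and ELEMENTS_OFFSET
are hypothetical names, and the value 12 is simply read off the listing.

```java
class EffectiveAddress {
    // Assumption: byte offset of element 0 within the array, inferred
    // from the displacements in the listing (12, 14, 16, ...).
    static final int ELEMENTS_OFFSET = 12;

    // Address for an operand printed as [base + #disp + index << #1].
    // In Java, + binds tighter than <<, so the parentheses are required.
    static long effectiveAddress(long base, int disp, int index) {
        return base + disp + ((long) index << 1);
    }

    // Displacement for out[off + k]: element offset plus 2 bytes per char.
    static int displacementFor(int k) {
        return ELEMENTS_OFFSET + 2 * k;
    }

    public static void main(String[] args) {
        long base = 0x1000L;  // hypothetical array address
        int off = 5;          // hypothetical value held in ECX
        for (int k = 0; k < 10; k++) {
            System.out.println("out[off+" + k + "] -> "
                + effectiveAddress(base, displacementFor(k), off));
        }
    }
}
```

Consecutive elements come out 2 bytes apart, as expected for a char[],
not 4.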