Question: "backwards" addressing
Christian Thalinger
Christian.Thalinger at Sun.COM
Sat Sep 5 02:20:22 PDT 2009
Ulf Zibis wrote:
> Do you know any reason for this ?
Let's see what code is generated (on 32-bit x86)...
> static void loop1(int off, char in1, char in2, char[] out) {
> out[off+3] = in1;
> out[off+5] = in2;
> out[off+0] = in1;
> out[off+4] = in2;
> out[off+9] = in1;
> out[off+8] = in2;
> out[off+6] = in1;
> out[off+1] = in2;
> out[off+7] = in1;
> out[off+2] = in2;
> }
030 MOV16 [EDI + #22 + ECX << #1],EBP
035 MOV16 [EDI + #12 + ECX << #1],EDX
03a MOV16 [EDI + #20 + ECX << #1],EBP
03f MOV16 [EDI + #30 + ECX << #1],EDX
044 MOV16 [EDI + #28 + ECX << #1],EBP
049 MOV16 [EDI + #24 + ECX << #1],EDX
04e MOV16 [EDI + #14 + ECX << #1],EBP
053 MOV16 [EDI + #26 + ECX << #1],EDX
058 MOV16 [EDI + #16 + ECX << #1],EBP
>
> static void loop2(int off, char in1, char in2, char[] out) {
> out[off++] = in1;
> out[off++] = in2;
> out[off++] = in1;
> out[off++] = in2;
> out[off++] = in1;
> out[off++] = in2;
> out[off++] = in1;
> out[off++] = in2;
> out[off++] = in1;
> out[off++] = in2;
> }
02e MOV16 [EBX + #14 + ECX << #1],EDI
033 MOV16 [EBX + #16 + ECX << #1],EDX
038 MOV16 [EBX + #18 + ECX << #1],EDI
03d MOV16 [EBX + #20 + ECX << #1],EDX
042 MOV16 [EBX + #22 + ECX << #1],EDI
047 MOV16 [EBX + #24 + ECX << #1],EDX
04c MOV16 [EBX + #26 + ECX << #1],EDI
051 MOV16 [EBX + #28 + ECX << #1],EDX
056 MOV16 [EBX + #30 + ECX << #1],EDI
As you can see, the two instruction sequences are almost the same except
for the ordering of the stores. Theoretically the second one should be
faster, as contiguous writes should perform better.
Additionally, in the final compiled version of main, where both loop1 and
loop2 are inlined, the loops are unrolled four times (4x).
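To check the theory instead of trusting it, one could time the two methods directly. What follows is only a rough sketch: the class name StoreOrderBench, the Store interface, the time helper and the iteration count are all made up for illustration, and a naive System.nanoTime loop like this is easily distorted by JIT warm-up, inlining and unrolling, so the numbers are indicative at best.

```java
public class StoreOrderBench {
    // Same store order as loop1 in the quoted code.
    static void loop1(int off, char in1, char in2, char[] out) {
        out[off + 3] = in1;
        out[off + 5] = in2;
        out[off + 0] = in1;
        out[off + 4] = in2;
        out[off + 9] = in1;
        out[off + 8] = in2;
        out[off + 6] = in1;
        out[off + 1] = in2;
        out[off + 7] = in1;
        out[off + 2] = in2;
    }

    // Same store order as loop2 in the quoted code.
    static void loop2(int off, char in1, char in2, char[] out) {
        out[off++] = in1;
        out[off++] = in2;
        out[off++] = in1;
        out[off++] = in2;
        out[off++] = in1;
        out[off++] = in2;
        out[off++] = in1;
        out[off++] = in2;
        out[off++] = in1;
        out[off++] = in2;
    }

    // Hypothetical helper type so both methods can be timed by the same loop.
    interface Store {
        void run(int off, char in1, char in2, char[] out);
    }

    // Runs the given method iters times and returns the elapsed nanoseconds.
    static long time(Store s, char[] out, int iters) {
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            s.run(0, 'a', 'b', out);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        char[] out = new char[10];
        int iters = 1_000_000; // arbitrary, chosen only to trigger compilation

        // Warm-up pass so both methods get compiled before measuring.
        time(StoreOrderBench::loop1, out, iters);
        time(StoreOrderBench::loop2, out, iters);

        long t1 = time(StoreOrderBench::loop1, out, iters);
        long t2 = time(StoreOrderBench::loop2, out, iters);
        System.out.println("loop1: " + t1 + " ns, loop2: " + t2 + " ns");
    }
}
```

For anything beyond a quick sanity check, a proper harness such as JMH would be the safer choice.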
> Hm, that's what I thought at first, so in the forward case the CPU should do:
> 1. base+index -> index register
> 2. move highSurrogate(accumulator), [index register]
> 3. increment index register
> 4. move lowSurrogate(accumulator), [index register]
>
> Backward case cpu should do:
> 1. base+index -> index register
> 2. move -> temp
> 3. increment index register
> 4. move lowSurrogate(accumulator), [index register]
> 5. load temp -> index register
> 6. move highSurrogate(accumulator), [index register]
>
> Please excuse my insisting, but I really don't understand why
> both should run in the same time.
> Can you explain once more?
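From the assembly above you can see that x86 instructions have complex
addressing modes: each MOV16 encodes base + constant displacement +
(index << 1) in a single instruction. The constant part of the subscript
(off+3, off+5, ...) is folded into the displacement at compile time, and
that's why there is no need for an index increment between the writes,
whatever their order.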
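As a small illustration of that folding, the sketch below recomputes the displacements seen in the listings from the Java subscripts. It assumes a 12-byte char[] header, which is only inferred from the "#12" used for out[off+0] in the 32-bit listing. The class and method names are made up for this example.

```java
import java.util.Arrays;

public class AddressingSketch {
    // Assumption: 12-byte array header, inferred from "#12" for out[off+0].
    static final int ARRAY_BASE = 12;
    // A Java char is 2 bytes, matching the "<< #1" scale in the listing.
    static final int CHAR_SIZE = 2;

    // Constant displacement folded into the store for out[off + k].
    static int displacement(int k) {
        return ARRAY_BASE + CHAR_SIZE * k;
    }

    // Displacements for a whole sequence of subscripts, in source order.
    static int[] displacements(int[] ks) {
        int[] d = new int[ks.length];
        for (int i = 0; i < ks.length; i++) {
            d[i] = displacement(ks[i]);
        }
        return d;
    }

    public static void main(String[] args) {
        int[] loop1 = {3, 5, 0, 4, 9, 8, 6, 1, 7, 2}; // subscript order in loop1
        int[] loop2 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; // off++ unrolled into constants

        // prints [18, 22, 12, 20, 30, 28, 24, 14, 26, 16]
        System.out.println(Arrays.toString(displacements(loop1)));
        // prints [12, 14, 16, 18, 20, 22, 24, 26, 28, 30]
        System.out.println(Arrays.toString(displacements(loop2)));
    }
}
```

Both orders yield the same set of constants, so the only difference left in the generated code is the order in which the stores are issued.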
-- Christian