Field access optimisations inside loops (question).

Ulf Zibis Ulf.Zibis at gmx.de
Fri Jan 8 01:30:35 PST 2010


1. I wonder why only a byte is moved (MOV8) to the 'value' 
variable instead of an int. (What is the element size in list?)

2. I agree, provided that:
- list is a private member of this class, and
- list is not reachable by other methods of this class from a concurrent 
thread.
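
To illustrate why these two conditions matter, here is a minimal sketch.
IntList, HoistSketch and the helper names are hypothetical stand-ins, not
the class from the original benchmark. A same-thread add() during the loop
is used as a deterministic substitute for a concurrent writer: the loop
that re-reads the fields sees the change, while the loop that caches
size and buffer keeps working on a stale snapshot.

```java
import java.util.Arrays;

// Hypothetical stand-in for the list class described in the quoted message.
final class IntList {
    int[] buffer = new int[4];
    int elementsCount;

    void add(int v) {
        if (elementsCount == buffer.length) {
            // Growing replaces the array object that the 'buffer' field refers to.
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[elementsCount++] = v;
    }

    int get(int index) { return buffer[index]; }

    int size() { return elementsCount; }
}

public class HoistSketch {
    // Four elements fill the initial capacity, so the next add() must grow.
    static IntList sampleList() {
        IntList list = new IntList();
        for (int v = 1; v <= 4; v++) {
            list.add(v);
        }
        return list;
    }

    // Shape of loop (1): fields are re-read through size() and get() each time.
    static int sumReReading(IntList list) {
        int sum = 0;
        for (int i = 0; i < list.size(); i++) {
            if (i == 0) {
                list.add(99); // stands in for a write by another thread or method
            }
            sum += list.get(i);
        }
        return sum;
    }

    // Shape of loop (2): size and buffer are cached in locals before the loop.
    static int sumHoisted(IntList list) {
        final int size = list.size();
        final int[] buffer = list.buffer;
        int sum = 0;
        for (int i = 0; i < size; i++) {
            if (i == 0) {
                list.add(99); // the cached 'size' and 'buffer' do not see this
            }
            sum += buffer[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("re-reading: " + sumReReading(sampleList())); // 109
        System.out.println("hoisted:    " + sumHoisted(sampleList()));   // 10
    }
}
```

If nothing else can modify list while the loop runs, both variants compute
the same result, which is the situation in which caching the fields would
be a safe transformation.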

For a similar problem, see Bug Id 6914095 
<http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6914095>.

-Ulf


On 08.01.2010 08:52, Dawid Weiss wrote:
> Hi everyone,
>
> I have been wondering about one thing and failed to find the answer so far.
>
> I observed the following behavior from the HotSpot compiler. Consider
> the following two loops:
>
> 1.
>        for (int i = 0; i < list.size(); i++)
>        {
>            value = list.get(i);
>        }
>
> 2.
>        final int size = list.size();
>        final int [] buffer = list.buffer;
>        for (int i = 0; i < size; i++)
>        {
>            value = buffer[i];
>        }
>
> The list variable's class is simple and the implementation of size()
> and get() is trivial, basically:
>
> get == return buffer[index];
> size == return elementsCount;
>
> (no boundary checks, inheritance or anything like this).
>
> What's interesting is that loop (2) is at least TWICE as fast as loop
> (1) (on various CPU architectures, multi and single-core systems). The "value"
> variable is a public static volatile to force the compiler to actually
> read the buffer's content and store it somewhere. Example timings:
>
> testSimpleGetLoop          : time.bench: 1.97, round: 0.20 [+- 0.00],
> round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
> testDirectBufferLoop       : time.bench: 0.57, round: 0.06 [+- 0.01],
> round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
>
> (note the "round" field above, in brackets are standard deviations
> from multiple test runs).
>
> I looked at the assembly dumped by HotSpot and from it I can
> conclude that in case of loop (1) the generated code always attempts
> to re-read the fields referenced through the list field (it is
> object-scoped, private, final,
> non-volatile, direct class access -- no interfaces). See listing (A)
> below. My understanding of the JMM was that
> in cases such as this one, the compiler can safely assume no side
> effects for the current thread and move field references to registers.
> This is exactly what happens in case (2) (see listing (B) below) --
> the references to
> local variables are moved to registers, with additional loop unrolling
> performed by the compiler.
>
> My question is if there is anything that prevents the compiler from
> caching the "list.buffer" pointer in
> version (1) of the loop? Note that get and size methods are inlined
> properly, but the field access is still
> performed on every loop iteration.
>
> I'd appreciate a hotspot code pointer where this situation is
> considered or a JVM reference pointer
> which makes the compiler behave as it does. Apologies in advance if
> I'm missing something
> naively simple.
>
> Dawid
>
>
> (A) simple get loop, pseudo-assembly dump (after warmup).
>
> 030   B4: #     B11 B5 <- B3 B8      Loop: B4-B8 inner  Freq: 220753
> 030     MOV    EDX,[EBP + #12] ! Field .buffer
> 033     NullCheck EBP
> 033
> 033   B5: #     B12 B6 <- B4  Freq: 220753
> 033     MOV    EBP,[EDX + #8]
> 036     NullCheck EDX
> 036
> 036   B6: #     B10 B7 <- B5  Freq: 220752
> 036     CMPu   EDI,EBP
> 038     Jnb,us B10  P=0.000001 C=-1.000000
> 038
> 03a   B7: #     B13 B8 <- B6  Freq: 220752
> 03a     MOVSX8 EAX,[EDX + #12 + EDI]    # byte
> 03f     MEMBAR-release ! (empty encoding)
> 03f     MOV8   [ECX + precise klass Benchmark: 0x048e80e8:Constant:exact *],EAX ! Field  Volatile value
> 045     MEMBAR-volatile (unnecessary so empty encoding)
> 045     LOCK ADDL [ESP + #0], 0 ! membar_volatile
> 04a     MOV    EBP,[EBX + #12] ! Field .list
> 04d     MOV    EDX,[EBP + #8]   # int ! Field .elementsCount
> 050     NullCheck EBP
> 050
> 050   B8: #     B4 B9 <- B7  Freq: 220752
> 050     INC    EDI
> 051     TSTL   #polladdr,EAX    ! Safepoint: poll for GC
> 057     CMP    EDI,EDX
> 059     Jl,s  B4  P=1.000000 C=179200.000000
>
>
> (B) direct buffer access loop (after warmup).
>
> 01a     MOV    ECX,[ECX + #12] ! Field .list
> 01d     MOV    EAX,[ECX + #8]   # int ! Field .elementsCount
> 020     NullCheck ECX
> 020
> 020   B2: #     B12 B3 <- B1  Freq: 0.999999
> 020     TEST   EAX,EAX
> 022     Jle    B12  P=0.000000 C=1.000000
> 022
> 028   B3: #     B4 <- B2  Freq: 0.999999
> 028     MOV    EDI,[ECX + #12] ! Field .buffer
> 02b     XOR    EBX,EBX
> 02d     MOV    EBP,#360
> 02d
> 032   B4: #     B14 B5 <- B3 B6      Loop: B4-B6 inner stride: not constant pre of N158 Freq: 1.99999
> 032     MOV    EDX,[EDI + #8]
> 035     NullCheck EDI
> 035
> 035   B5: #     B13 B6 <- B4  Freq: 1.99999
> 035     CMPu   EBX,EDX
> 037     Jnb,u  B13  P=0.000001 C=-1.000000
> 037
> 03d   B6: #     B4 B7 <- B5  Freq: 1.99999
> 03d     MOVSX8 ECX,[EDI + #12 + EBX]    # byte
> 042     MEMBAR-release ! (empty encoding)
> 042     MOV8   [EBP + precise klass : 0x04161d30:Constant:exact *],ECX ! Field  Volatile.value
> 048     MEMBAR-volatile (unnecessary so empty encoding)
> 048     LOCK ADDL [ESP + #0], 0 ! membar_volatile
> 04d     INC    EBX
> 04e     CMP    EBX,#1
> 051     Jl,s  B4        # Loop end  P=0.500000 C=334848.000000
>
> 053   B7: #     B9 B8 <- B6  Freq: 0.999994
> 053     MOV    ESI,EAX
> 055     MIN    ESI,EDX
> 05b     SUB    ESI,EBX
> 05d     AND    ESI,#-4
> 060     ADD    ESI,EBX
> 062     CMP    EBX,ESI
> 064     Jge,s  B9  P=0.000001 C=-1.000000
>        NOP     # 10 bytes pad for loops and calls
>
> 070   B8: #     B8 B9 <- B7 B8       Loop: B8-B8 inner stride: not constant main of N116 Freq: 999993
> 070     MOVSX8 ECX,[EDI + #12 + EBX]    # byte
> 075     MEMBAR-release ! (empty encoding)
> 075     MOV8   [EBP + precise klass : 0x04161d30:Constant:exact *],ECX ! Field  Volatile.value
> 07b     MEMBAR-volatile (unnecessary so empty encoding)
> 07b     MEMBAR-volatile (unnecessary so empty encoding)
> 07b     MOVSX8 ECX,[EDI + #13 + EBX]    # byte
> 080     MEMBAR-release ! (empty encoding)
> 080     MOV8   [EBP + precise klass : 0x04161d30:Constant:exact *],ECX ! Field  Volatile.value
> 086     MEMBAR-volatile (unnecessary so empty encoding)
> 086     MEMBAR-volatile (unnecessary so empty encoding)
>
> ... (loop further unrolled here) ...
>
>
>   


More information about the hotspot-compiler-dev mailing list