Field access optimisations inside loops (question).
Dawid Weiss
dawid.weiss at gmail.com
Thu Jan 7 23:52:02 PST 2010
Hi everyone,
I have been wondering about one thing and failed to find the answer so far.
I observed the following behavior from the HotSpot compiler. Consider
the following two loops:
1.
for (int i = 0; i < list.size(); i++)
{
value = list.get(i);
}
2.
final int size = list.size();
final int [] buffer = list.buffer;
for (int i = 0; i < size; i++)
{
value = buffer[i];
}
The list variable's class is simple and the implementation of size()
and get() is trivial, basically:
get == return buffer[index];
size == return elementsCount;
(no boundary checks, inheritance or anything like this).
What's interesting is that loop (2) is at least TWICE as fast as loop
(1) (on various CPU architectures, multi and single-core systems). The "value"
variable is a public static volatile to force the compiler to actually
read the buffer's content and store is somewhere. Example timings:
testSimpleGetLoop : time.bench: 1.97, round: 0.20 [+- 0.00],
round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
testDirectBufferLoop : time.bench: 0.57, round: 0.06 [+- 0.01],
round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00
(note the "round" field above, in brackets are standard deviations
from multiple test runs).
I looked at the assembly dumped by the HotSpot and from it I can
conclude that in case of loop (1) the generated code always attempts
to re-read the fields referenced through the list field (it is
object-scoped, private, final,
non-volatile, direct class access -- no interfaces). See listing (A)
below. My understanding of the JMM was that
in cases such as this one, the compiler can safely assume no side
effects for the current thread and move field references to registers.
This is exactly what happens in case (2) (see listing (B) below) --
the references to
local variables are moved to regiters, with additional loop unrolling
performed by the compiler.
My question is if there is anything that prevents the compiler from
caching the "list.buffer" pointer in
version (1) of the loop? Note that get and size methods are inlined
properly, but the field access is still
performed on every loop iteration.
I'd appreciate a hotspot code pointer where this situation is
considered or a JVM reference pointer
which makes the compiler behave as it does. Apologies in advance if
I'm missing something
naively simple.
Dawid
(A) simple get loop, pseudo-assembly dump (after warmup).
030 B4: # B11 B5 <- B3 B8 Loop: B4-B8 inner Freq: 220753
030 MOV EDX,[EBP + #12] ! Field .buffer
033 NullCheck EBP
033
033 B5: # B12 B6 <- B4 Freq: 220753
033 MOV EBP,[EDX + #8]
036 NullCheck EDX
036
036 B6: # B10 B7 <- B5 Freq: 220752
036 CMPu EDI,EBP
038 Jnb,us B10 P=0.000001 C=-1.000000
038
03a B7: # B13 B8 <- B6 Freq: 220752
03a MOVSX8 EAX,[EDX + #12 + EDI] # byte
03f MEMBAR-release ! (empty encoding)
03f MOV8 [ECX + precise klass Benchmark:
0x048e80e8:Constant:exact *],EAX ! Field Volatile value
045 MEMBAR-volatile (unnecessary so empty encoding)
045 LOCK ADDL [ESP + #0], 0 ! membar_volatile
04a MOV EBP,[EBX + #12] ! Field .list
04d MOV EDX,[EBP + #8] # int ! Field .elementsCount
050 NullCheck EBP
050
050 B8: # B4 B9 <- B7 Freq: 220752
050 INC EDI
051 TSTL #polladdr,EAX ! Safepoint: poll for GC
057 CMP EDI,EDX
059 Jl,s B4 P=1.000000 C=179200.000000
(B) direct buffer access loop (after warmup).
01a MOV ECX,[ECX + #12] ! Field .list
01d MOV EAX,[ECX + #8] # int ! Field .elementsCount
020 NullCheck ECX
020
020 B2: # B12 B3 <- B1 Freq: 0.999999
020 TEST EAX,EAX
022 Jle B12 P=0.000000 C=1.000000
022
028 B3: # B4 <- B2 Freq: 0.999999
028 MOV EDI,[ECX + #12] ! Field .buffer
02b XOR EBX,EBX
02d MOV EBP,#360
02d
032 B4: # B14 B5 <- B3 B6 Loop: B4-B6 inner stride: not constant
pre of N158 Freq: 1.99999
032 MOV EDX,[EDI + #8]
035 NullCheck EDI
035
035 B5: # B13 B6 <- B4 Freq: 1.99999
035 CMPu EBX,EDX
037 Jnb,u B13 P=0.000001 C=-1.000000
037
03d B6: # B4 B7 <- B5 Freq: 1.99999
03d MOVSX8 ECX,[EDI + #12 + EBX] # byte
042 MEMBAR-release ! (empty encoding)
042 MOV8 [EBP + precise klass : 0x04161d30:Constant:exact *],ECX
! Field Volatile.value
048 MEMBAR-volatile (unnecessary so empty encoding)
048 LOCK ADDL [ESP + #0], 0 ! membar_volatile
04d INC EBX
04e CMP EBX,#1
051 Jl,s B4 # Loop end P=0.500000 C=334848.000000
053 B7: # B9 B8 <- B6 Freq: 0.999994
053 MOV ESI,EAX
055 MIN ESI,EDX
05b SUB ESI,EBX
05d AND ESI,#-4
060 ADD ESI,EBX
062 CMP EBX,ESI
064 Jge,s B9 P=0.000001 C=-1.000000
NOP # 10 bytes pad for loops and calls
070 B8: # B8 B9 <- B7 B8 Loop: B8-B8 inner stride: not constant
main of N116 Freq: 999993
070 MOVSX8 ECX,[EDI + #12 + EBX] # byte
075 MEMBAR-release ! (empty encoding)
075 MOV8 [EBP + precise klass : 0x04161d30:Constant:exact *],ECX
! Field Volatile.value
07b MEMBAR-volatile (unnecessary so empty encoding)
07b MEMBAR-volatile (unnecessary so empty encoding)
07b MOVSX8 ECX,[EDI + #13 + EBX] # byte
080 MEMBAR-release ! (empty encoding)
080 MOV8 [EBP + precise klass : 0x04161d30:Constant:exact *],ECX
! Field Volatile.value
086 MEMBAR-volatile (unnecessary so empty encoding)
086 MEMBAR-volatile (unnecessary so empty encoding)
... (loop further unrolled here) ...
More information about the hotspot-compiler-dev
mailing list