Project Lambda: Java Language Specification draft

Fri Jan 29 05:15:56 PST 2010

Even a rough look at this asm code is interesting: the iterator version of
__test__ is ~48KB, versus 40KB for the indexed version. (Bigger compiled
code is sometimes better - superior inlining wins, all else being equal -
but that doesn't seem to be the case here.)

I think the creation of strings produces too much noise, so I removed
'count' and changed increment() to just update a public static long field
(accum *= value). In my experience this is always sufficient to prevent
dead-code elimination. I also added an outer 1000X loop in __test__() to
get a larger number of iterations for more precise timing without needing
a much bigger ArrayList (which would turn this into an FSB, i.e.
memory-bus-bound, benchmark); as a bonus, the outer loop makes loop
optimizations harder. Results in nanoseconds per iteration:

JDK 5_22:
Indexed / Server: 4.39ns
Iterator / Server: 9.45ns (2.15 X Indexed / Server)
Indexed / Client: 13.40ns (3.05 X Indexed / Server)
Iterator / Client: 31.30ns (2.33 X Indexed / Client, 3.31 X Iterator / Server)

JDK 6u18:
Indexed / Server: 4.49ns
Iterator / Server: 4.62ns (1.02 X Indexed / Server)
Indexed / Client: 11.60ns (2.58 X Indexed / Server)
Iterator / Client: 42ns (3.62 X Indexed / Client, 9.09 X Iterator / Server)

JDK 7-b81:
Indexed / Server:  4.30ns
Iterator / Server: 5.27ns (1.22 X Indexed / Server)
Indexed / Client:  6.66ns (1.54 X Indexed / Server)
Iterator / Client: 17.50ns (2.62 X Indexed / Client, 3.32 X Iterator / Server)

JDK 7-b81 -XX:+DoEscapeAnalysis:
Indexed / Server:  4.30ns
Iterator / Server: 5.14ns (1.19 X Indexed / Server)

The numbers speak for themselves. Iterating over an ArrayList by index
instead of with an iterator can give large gains on the weaker HotSpot
Client VMs (2.3X to 3.6X in these runs); even on Server, we can observe
significant gains. The 2% advantage on 6u18 seems tiny, but because the
iteration itself is accompanied by a lot of benchmark overhead, ANY
statistically significant advantage is important - this is usually the
case for microbenchmarks of extremely simple operations.
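
For anyone re-deriving the ratios in parentheses: they are plain quotients
of the per-iteration times, and the posted figures appear to be truncated
(not rounded) to two decimals. A small sketch, where the class and method
names are only illustrative:

```java
class Ratios {
    // Quotient of two timings, truncated (not rounded) to two decimals.
    // Truncation is an assumption inferred from the posted figures.
    static double truncatedRatio(double ns, double baselineNs) {
        return Math.floor(ns / baselineNs * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // JDK 6u18 figures taken from the results above.
        System.out.println("Iterator/Server vs Indexed/Server: "
                + truncatedRatio(4.62, 4.49) + " X");
        System.out.println("Iterator/Client vs Indexed/Client: "
                + truncatedRatio(42.0, 11.60) + " X");
        System.out.println("Iterator/Client vs Iterator/Server: "
                + truncatedRatio(42.0, 4.62) + " X");
    }
}
```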

The Escape Analysis / scalar replacement optimization of bleeding-edge JDK7
gives another speedup for the iterator case (5.27ns to 5.14ns), but clearly
not enough to remove all Iterator overhead (probably due to the factors in
Rémi's analysis). This is not definitive, though, as the Iterator / Server
test case appears to suffer from a regression in JDK 7 (scores are worse
than on 6u18).
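
For anyone who wants to reproduce the comparison, here is a minimal sketch
of the modified benchmark as described earlier in this message (a public
static long sink updated with accum *= value, plus an outer loop around the
list traversal). It is not Rémi's original code: the class name, the list
size, the odd element values (chosen so the product never becomes zero) and
the warm-up counts are all my assumptions.

```java
import java.util.ArrayList;

class LoopBench {
    // Public static sink: writing to it keeps the JIT from discarding the loop body.
    public static long accum = 1;

    static void increment(int value) {
        accum *= value;
    }

    // Indexed traversal: one get(i) call per element.
    static void testIndexed(ArrayList<Integer> list, int outer) {
        for (int n = 0; n < outer; n++) {
            for (int i = 0, size = list.size(); i < size; i++) {
                increment(list.get(i));
            }
        }
    }

    // Iterator traversal: the for-each loop calls hasNext() and next() per element.
    static void testIterator(ArrayList<Integer> list, int outer) {
        for (int n = 0; n < outer; n++) {
            for (Integer value : list) {
                increment(value);
            }
        }
    }

    // Average time per visited element, in nanoseconds.
    static double nanosPerElement(boolean indexed, ArrayList<Integer> list, int outer) {
        long start = System.nanoTime();
        if (indexed) {
            testIndexed(list, outer);
        } else {
            testIterator(list, outer);
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / ((long) outer * list.size());
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < 10000; i++) {
            list.add(Integer.valueOf(2 * i + 1)); // odd values only (assumption)
        }
        // Warm-up so that the timed runs execute compiled code.
        for (int i = 0; i < 20; i++) {
            nanosPerElement(true, list, 100);
            nanosPerElement(false, list, 100);
        }
        System.out.printf("Indexed:  %.2f ns%n", nanosPerElement(true, list, 1000));
        System.out.printf("Iterator: %.2f ns%n", nanosPerElement(false, list, 1000));
        System.out.println("accum = " + accum); // use the sink so it stays live
    }
}
```

A single timed run like this is much cruder than the measurements above, so
treat its output as a sanity check only; numbers will vary with the VM and
its flags.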

A+
Osvaldo

2010/1/29 Rémi Forax <forax at univ-mlv.fr>

> On 29/01/2010 06:16, Neal Gafter wrote:
> > On Thu, Jan 28, 2010 at 4:05 PM, Rémi Forax <forax at univ-mlv.fr> wrote:
> >> On 28/01/2010 20:11, Neal Gafter wrote:
> >>> By the way, looping through an ArrayList using indexing happens to be
> >>> faster than looping through using an iterator because the latter
> >>> requires two method calls per element, while the former requires only
> >>> one.  It's not hard to verify this experimentally.
> >> This is not true if the code is hot.
> > Have you run experiments to back up that assertion?
>
> Yes,
> I did a similar experiment a week ago when testing method handles.
> I updated it this morning to remove the method handle parts.
> You can find the two programs and the generated assembly code here:
> http://cr.openjdk.java.net/~forax/assembly-loop/loop-assembly.zip
>
> Look for a method named __test__ and take the second one;
> the first one is generated before HotSpot decides to inline the
> iterator's methods.
>
> The increment method does parsing/toString to avoid being inlined;
> __test__ is not inlined either.
> I've also tried using the Iterator of a LinkedList during the warm-up
> so that the code is not too easy for CHA (class hierarchy analysis),
> but it doesn't change the generated code.
>
> Rémi
>
>