Performance of locally copied members ?

Osvaldo Doederlein opinali at gmail.com
Mon May 3 20:13:44 UTC 2010


2010/5/3 Martin Buchholz <martinrb at google.com>

> It's a coding style made popular by Doug Lea.
> It's an extreme optimization that probably isn't necessary;
> you can expect the JIT to make the same optimizations.
>

It certainly is necessary - unfortunately. I tested my particle/octree-based
3D renderer without this manual optimization (printing the FPS every
100 frames, starting from the 10th sample after startup):

JDK 6u21-b03, HotSpot Client:
159.4896331738437fps
161.29032258064515fps
158.73015873015873fps
160.0fps
159.23566878980893fps

JDK 6u21-b03, HotSpot Server:
197.23865877712032fps
204.91803278688525fps
196.07843137254903fps
200.40080160320642fps
198.01980198019803fps

Now let's cache 8 instance variables into local variables (most final, a
couple non-final ones too):

JDK 6u21-b03, HotSpot Client:
169.4915254237288fps
172.1170395869191fps
168.63406408094434fps
168.0672268907563fps
170.64846416382252fps

JDK 6u21-b03, HotSpot Server:
197.62845849802372fps
200.40080160320642fps
196.8503937007874fps
199.6007984031936fps
203.2520325203252fps

So the manual optimization makes no difference for HotSpot Server, but hell,
it does for Client: about 6% better performance in this test. And the test is
not only the complex, deeply nested rendering loops that use those cacheable
variables to read the input data and update the output pixel and Z buffers;
there is also other code that burns significant CPU and doesn't use these
variables, notably the buffer filling and copying steps. This means the
speedup in the optimized code itself should be much higher than 6%; I only
measured and reported the application's global performance.
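
For anyone unfamiliar with the idiom, here is a minimal sketch of what
"caching instance variables into local variables" looks like. The class,
fields and methods below (LocalCopyDemo, zBuffer, width, height) are
hypothetical and are not taken from the renderer; they only show the shape of
the change, assuming a nested loop over a buffer held in final fields.

```java
// Hypothetical sketch: all names are illustrative, not from the renderer.
final class LocalCopyDemo {
    private final float[] zBuffer;
    private final int width;
    private final int height;

    LocalCopyDemo(int width, int height) {
        this.width = width;
        this.height = height;
        this.zBuffer = new float[width * height];
        // Deterministic fill pattern (0, 1, 2, 3, 0, 1, ...) so results are checkable.
        for (int i = 0; i < zBuffer.length; i++) {
            zBuffer[i] = i % 4;
        }
    }

    // Plain version: every field access inside the loops is a getfield on
    // 'this' in the bytecode; hoisting them is left to the JIT.
    int countFartherThanUsingFields(float z) {
        int count = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (zBuffer[y * width + x] > z) {
                    count++;
                }
            }
        }
        return count;
    }

    // Manual version: each field is read once into a local before the loops,
    // so the loop body only touches local variables.
    int countFartherThanUsingLocals(float z) {
        final float[] zb = zBuffer;
        final int w = width;
        final int h = height;
        int count = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (zb[y * w + x] > z) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        LocalCopyDemo demo = new LocalCopyDemo(8, 4);
        // Both calls compute the same count; only the generated code differs.
        System.out.println(demo.countFartherThanUsingFields(1.5f));
        System.out.println(demo.countFartherThanUsingLocals(1.5f));
    }
}
```

The two methods are functionally identical; whether the second one is
actually faster depends on the JIT in use, as the numbers above suggest.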

We'll need to deal with HotSpot Client for years to come, not to mention
smaller platforms (JavaME, JavaFX Mobile & TV) whose JIT compilers are even
weaker than JavaSE's C1. Tuned bytecode is also faster to interpret, which
benefits warm-up time too. Please keep your dirty purist hands off the API
code that Doug and others micro-optimized; it is necessary. :)

And my +1 to adding the same optimizations to other perf-critical APIs. This
is even more important for java.nio since, under C1, it doesn't currently
benefit from intrinsic compilation of critical DirectBuffer methods.

A+
Osvaldo



> (you can try to check the machine code yourself!)
> Nevertheless, copying to locals produces the smallest
> bytecode, and for low-level code it's nice to write code
> that's a little closer to the machine.
>
> Also, optimizations of finals (can cache even across volatile
> reads) could be better.  John Rose is working on that.
>
> For some algorithms in j.u.c,
> copying to a local is necessary for correctness.
>
> Martin
>
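
A minimal sketch of the correctness case mentioned above, assuming a volatile
field that another thread may set to null at any time. The class SnapshotDemo
and its members are hypothetical and are not actual j.u.c code; the point is
only that reading the field once into a local guarantees that the null check
and the later use see the same value.

```java
// Hypothetical sketch: not actual java.util.concurrent code.
final class SnapshotDemo {
    // Assumed to be cleared by another thread at an arbitrary moment.
    private volatile int[] data = {1, 2, 3};

    void clear() {
        data = null;
    }

    // Racy: 'data' is read twice. If clear() runs between the null check
    // and the loop, the second read sees null and the loop throws a
    // NullPointerException.
    int sumRacy() {
        if (data == null) {
            return 0;
        }
        int sum = 0;
        for (int v : data) {
            sum += v;
        }
        return sum;
    }

    // Safe: the field is read exactly once; the check and the loop both
    // work on the same snapshot, whatever other threads do afterwards.
    int sumSnapshot() {
        final int[] d = data;
        if (d == null) {
            return 0;
        }
        int sum = 0;
        for (int v : d) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        SnapshotDemo demo = new SnapshotDemo();
        System.out.println(demo.sumSnapshot());
        demo.clear();
        System.out.println(demo.sumSnapshot());
    }
}
```

Here the local copy is not a performance trick at all: with a volatile field
the JVM must perform every read, so only the single-read version is immune to
a concurrent clear().
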
> On Mon, May 3, 2010 at 04:40, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
> > Hi,
> >
> > in class String I often see member variables copied to local variables.
> > In java.nio.Buffer I don't see that (e.g. for "position" in
> > nextPutIndex(int nb)).
> > Now I'm wondering.
> >
> > From JMM (Java-Memory-Model) I learned, that jvm can hold non-volatile
> > variables in a cache for each thread, so e.g. even in CPU register for
> > few ones.
> > From this knowing, I don't understand, why doing the local caching
> > manually in String (and many other classes), instead trusting on the JVM.
> >
> > Can anybody help me in understanding this ?
> >
> > -Ulf
> >
> >
> >
>