Performance of locally copied members ?

David Holmes David.Holmes at Oracle.Com
Mon May 3 22:27:03 UTC 2010


I've forwarded this to hotspot-compiler-dev.

I know Doug introduced this for final fields because at the time the 
compiler was not optimizing their use, but I had thought that issue was 
long since resolved at least in C2. If C1 is lagging then we need to see 
that it catches up.

There should not be a need to code this way at the Java level. (Note, as 
Martin says, sometimes you must copy a field to a local for correctness: 
the field might change value but the current code must not see that. 
But that's not the case we're concerned with here.)
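
To make the correctness case concrete, here is a minimal sketch. The class, 
field, and method names are hypothetical and not taken from any JDK source; 
it only illustrates why a single read into a local can matter when another 
thread may replace the field.

```java
// Hypothetical example; names are illustrative and not from the JDK.
final class LocalCopyForCorrectness {
    // Another thread may replace this array at any time (assumed non-null).
    private volatile int[] items = {1, 2, 3};

    // Racy: the field is read several times, so the array whose length was
    // checked may not be the array that is finally indexed. If the field is
    // swapped to an empty array in between, the index becomes -1.
    int lastRacy() {
        return items.length == 0 ? -1 : items[items.length - 1];
    }

    // Safe: the field is read once; every later use sees the same array.
    int lastSafe() {
        final int[] a = items;
        final int n = a.length;
        return n == 0 ? -1 : a[n - 1];
    }

    void replace(int[] newItems) {
        items = newItems;
    }

    public static void main(String[] args) {
        LocalCopyForCorrectness c = new LocalCopyForCorrectness();
        System.out.println(c.lastSafe()); // prints 3
        c.replace(new int[0]);
        System.out.println(c.lastSafe()); // prints -1
    }
}
```

In single-threaded use both methods return the same value; the difference 
only shows up when replace() runs concurrently with lastRacy().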

Cheers,
David Holmes

Osvaldo Doederlein said the following on 05/04/10 06:13:
> 2010/5/3 Martin Buchholz <martinrb at google.com>
> 
>     It's a coding style made popular by Doug Lea.
>     It's an extreme optimization that probably isn't necessary;
>     you can expect the JIT to make the same optimizations.
> 
> 
> It certainly is necessary - unfortunately. Testing my 
> particle/octree-based 3D renderer without this manual optimization 
> (dumping FPS every 100 frames, beginning at the 10th score after 
> startup):
> 
> JDK 6u21-b03, Hotspot Client:
> 159.4896331738437fps
> 161.29032258064515fps
> 158.73015873015873fps
> 160.0fps
> 159.23566878980893fps
> 
> JDK 6u21-b03, Hotspot Server:
> 197.23865877712032fps
> 204.91803278688525fps
> 196.07843137254903fps
> 200.40080160320642fps
> 198.01980198019803fps
> 
> Now let's cache 8 instance variables into local variables (most final, a 
> couple non-final ones too):
> 
> JDK 6u21-b03, Hotspot Client:
> 169.4915254237288fps
> 172.1170395869191fps
> 168.63406408094434fps
> 168.0672268907563fps
> 170.64846416382252fps
> 
> JDK 6u21-b03, Hotspot Server:
> 197.62845849802372fps
> 200.40080160320642fps
> 196.8503937007874fps
> 199.6007984031936fps
> 203.2520325203252fps
> 
> So, the manual optimization makes no difference for Hotspot Server; but 
> hell it does for Client - about 6% better performance in this test. And 
> the test is not only the complex, deeply nested rendering loops that use 
> those cacheable variables to read the input data and update the output 
> pixel and Z buffers - there's also other code that burns significant CPU 
> and doesn't use these variables, notably the buffer filling and copying 
> steps. This means the speedup in the optimized code itself should be 
> much higher than 6%; I only measured and reported the application's 
> global performance.
> 
> We'll need to deal with HotSpot Client for years to come, not to mention 
> smaller platforms (JavaME, JavaFX Mobile&TV) whose JIT compilers are 
> even weaker than JavaSE's C1. Tuned bytecode is also faster to 
> interpret, which benefits warm-up time too. Please keep your dirty 
> purist hands off the API code that Doug and others micro-optimized; it 
> is necessary. :)
> 
> And my +1 to add the same opts to other perf-critical APIs. This is even 
> more important for java.nio, since under C1 it doesn't currently benefit 
> from intrinsic compilation of critical DirectBuffer methods.
> 
> A+
> Osvaldo
> 
>  
> 
>     (you can try to check the machine code yourself!)
>     Nevertheless, copying to locals produces the smallest
>     bytecode, and for low-level code it's nice to write code
>     that's a little closer to the machine.
> 
>     Also, optimizations of finals (can cache even across volatile
>     reads) could be better.  John Rose is working on that.
> 
>     For some algorithms in j.u.c,
>     copying to a local is necessary for correctness.
> 
>     Martin
> 
>     On Mon, May 3, 2010 at 04:40, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>      > Hi,
>      >
>      > in class String I often see member variables copied to local
>      > variables.
>      > In java.nio.Buffer I don't see that (e.g. for "position" in
>      > nextPutIndex(int nb)).
>      > Now I'm wondering.
>      >
>      > From the JMM (Java Memory Model) I learned that the JVM can hold
>      > non-volatile variables in a per-thread cache, so for a few of
>      > them even in CPU registers.
>      > Knowing this, I don't understand why the local caching is done
>      > manually in String (and many other classes) instead of trusting
>      > the JVM.
>      >
>      > Can anybody help me understand this?
>      >
>      > -Ulf
>      >
>      >
>      >
> 
> 
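
For readers unfamiliar with the idiom discussed in this thread, the following 
is a minimal sketch of what "caching instance variables into local variables" 
looks like. It is not the renderer code from the quoted message; the class, 
field, and method names are hypothetical, and no performance claim is made 
for it. Both methods compute the same result and differ only in how the 
fields are accessed.

```java
// Hypothetical sketch; names are illustrative and not from the quoted code.
final class DepthBuffer {
    private final float[] z;
    private final int width;
    private final int height;

    DepthBuffer(int width, int height) {
        this.width = width;
        this.height = height;
        this.z = new float[width * height];
    }

    // Straightforward version: every access goes through a field of 'this'.
    void fillUsingFields(float depth) {
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                z[y * width + x] = depth;
            }
        }
    }

    // Manually optimized version: each field is read once into a local,
    // and the loops touch only local variables.
    void fillUsingLocals(float depth) {
        final float[] zb = this.z;
        final int w = this.width;
        final int h = this.height;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                zb[y * w + x] = depth;
            }
        }
    }

    float get(int x, int y) {
        return z[y * width + x];
    }

    public static void main(String[] args) {
        DepthBuffer a = new DepthBuffer(4, 3);
        DepthBuffer b = new DepthBuffer(4, 3);
        a.fillUsingFields(0.5f);
        b.fillUsingLocals(0.5f);
        System.out.println(a.get(3, 2) == b.get(3, 2)); // prints true
    }
}
```

Compiling this class and running javap -c on it is one way to compare the 
bytecode of the two methods. Whether the two versions differ in speed 
depends on the JIT compiler in use, which is the point under discussion 
above.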



More information about the core-libs-dev mailing list