Performance under jdk8

Sun Feb 10 06:26:08 PST 2013

Okay, I've been able to test our stuff (both micro-benchmarks and a full application) under jdk8, and the performance is looking good in both cases, though it's still slower than 7 on initial bootstrap.

What is still concerning is memory usage, which has increased significantly with LambdaForms due to the Java objects themselves, the generated classes, and the extra code cache needed for all that byte code.

Part of this is because we've always generated our arity-adapting method handles at call sites rather than storing them on the function objects or method tables. But even if we move those, we still have to generate method handles at the call sites so they can be invalidated if a method table is mutated at runtime, and to build up GWT (guardWithTest) chains. On a large code base (well over 100K method call sites, plus global accessors, literals and so forth) this produces a lot of LambdaForms. It was ~800K, and I've now knocked that down to under 700K by rejigging various bits of constant and global variable access. There are still more areas to attack, and I think I'll want to completely rejig how we do closures internally, because binding environments in breaks the LF caching if you do it enough. We'd originally gone with binding environments in because of some of the issues with NoClassDefFoundError, but now that that's fixed I can revisit that whole area.
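For anyone less familiar with the pattern, here's a minimal sketch of a call site that caches targets in a GWT chain keyed on the receiver's method table, with each cache entry also guarded by a SwitchPoint that is invalidated when that table is mutated. It is not our actual implementation: MethodTable, Obj, InlineCacheSite and MAX_DEPTH are hypothetical names, and restarting the chain at a fixed depth is just one arbitrary way to bound it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

// Hypothetical method table; redefine() models a method being replaced at runtime.
final class MethodTable {
    volatile SwitchPoint switchPoint = new SwitchPoint();
    volatile MethodHandle body; // type (Obj)Object

    MethodTable(MethodHandle body) { this.body = body; }

    void redefine(MethodHandle newBody) {
        SwitchPoint old = switchPoint;
        body = newBody;
        switchPoint = new SwitchPoint();
        // Every cache entry guarded by the old SwitchPoint now takes its miss path.
        SwitchPoint.invalidateAll(new SwitchPoint[] { old });
    }

    // Builds a (Obj)Object handle that ignores the receiver and returns value.
    static MethodHandle constantBody(Object value) {
        return MethodHandles.dropArguments(
                MethodHandles.constant(Object.class, value), 0, Obj.class);
    }
}

// Hypothetical receiver that points at its method table.
final class Obj {
    final MethodTable table;
    Obj(MethodTable table) { this.table = table; }
}

final class InlineCacheSite extends MutableCallSite {
    static final int MAX_DEPTH = 4; // start a fresh chain once it gets this long
    private static final MethodType TYPE = MethodType.methodType(Object.class, Obj.class);
    private static final MethodHandle CHECK_TABLE;
    private static final MethodHandle MISS;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            CHECK_TABLE = lookup.findStatic(InlineCacheSite.class, "checkTable",
                    MethodType.methodType(boolean.class, MethodTable.class, Obj.class));
            MISS = lookup.findVirtual(InlineCacheSite.class, "miss", TYPE);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final MethodHandle invoker;
    private int depth;

    InlineCacheSite() {
        super(TYPE);
        setTarget(MISS.bindTo(this)); // empty cache: every call misses
        invoker = dynamicInvoker();
    }

    static boolean checkTable(MethodTable expected, Obj receiver) {
        return receiver.table == expected;
    }

    // Cache miss: look the method up and prepend a guarded entry to the GWT chain.
    synchronized Object miss(Obj receiver) throws Throwable {
        MethodTable table = receiver.table;
        SwitchPoint sp = table.switchPoint; // read before body; redefine() writes body first
        MethodHandle body = table.body;
        MethodHandle missPath = MISS.bindTo(this);
        MethodHandle rest = getTarget();
        if (++depth > MAX_DEPTH) {
            rest = missPath;
            depth = 1;
        }
        MethodHandle entry = MethodHandles.guardWithTest(
                CHECK_TABLE.bindTo(table), sp.guardWithTest(body, missPath), rest);
        setTarget(entry);
        return body.invokeExact(receiver);
    }

    // Convenience wrapper so callers don't have to deal with Throwable.
    Object call(Obj receiver) {
        try {
            return (Object) invoker.invokeExact(receiver);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Each entry only carries its own table's SwitchPoint, so mutating one method table doesn't knock out cached entries for other receivers; a dead entry just routes to the miss path, and the fresh entry is prepended ahead of it.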

From picking through the implementation, I think I can get a handle on what will and won't cause a new LambdaForm to be generated (and hence a new bit of byte code), but what I haven't managed to pick apart yet is exactly what ends up holding references to the anonymous classes, and therefore whether they can ever actually be unloaded.

The main bits of refactoring I'm doing to our code base to try to reduce the memory footprint under 7u14 and 8 are:

  1.  Ensure that all code to be executed immediately (there's a lot of it during bootstrap) is run in a disposable ClassLoader, so that the classes and call sites can be garbage collected afterwards.
  2.  Move adapter method handle creation from call sites to the function objects themselves. While doing so we might well revisit our call site invalidation strategy.
  3.  Compile methods into individual classes and load them as late as possible (i.e. on first call), as the instrumentation we put in place shows that only 1/3 of methods are ever called in the application we've tested with.
  4.  Rework how we handle closures (this kind of goes along with 2, but will require some more work, and probably a different decision depending on whether the closure has been bound into a method table and so is likely to be around for the long term; at that point binding the environment in makes sense).
  5.  Ensure that literal and global access can reuse method handles as much as possible. Symbol literal instantiation was especially bad for breaking the LF caching.
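Item 2 can be sketched roughly as follows: the function object caches one arity adapter per arity, so every call site calling it with, say, two arguments reuses the same MethodHandle instead of building its own. FunctionObject, join and joining are hypothetical, and the assumption that compiled bodies take their arguments as a single Object[] is only for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical function object whose compiled body takes its arguments as one Object[].
final class FunctionObject {
    private final MethodHandle body; // type (Object[])Object
    // One adapter per arity, shared by every call site that calls this function.
    private final ConcurrentMap<Integer, MethodHandle> adapters = new ConcurrentHashMap<>();

    FunctionObject(MethodHandle body) { this.body = body; }

    MethodHandle adapterFor(int arity) {
        // asCollector turns (Object[])Object into (Object, ..., Object)Object.
        return adapters.computeIfAbsent(arity, n -> body.asCollector(Object[].class, n));
    }

    // Convenience for the example: call through the shared adapter for args.length.
    Object call(Object... args) {
        try {
            return adapterFor(args.length).invokeWithArguments(args);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Example body standing in for compiled code.
    static Object join(Object[] args) { return Arrays.toString(args); }

    static FunctionObject joining() {
        try {
            return new FunctionObject(MethodHandles.lookup().findStatic(FunctionObject.class,
                    "join", MethodType.methodType(Object.class, Object[].class)));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A call site would then link to fn.adapterFor(n) (behind whatever guard it uses) rather than calling asCollector itself, so the number of adapter handles scales with functions times arities used rather than with call sites.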

The first of these is only really going to help if LambdaForms and their anonymous classes can be garbage collected (I need to pick through the caching code to check this fully), and the third depends on the memory overhead of classes in the new implementation (what's the overhead of a class containing a single static method and a constant pool?). Items 2, 4 and 5 are just unadulterated wins as far as I can see, or at least they are so far.
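For the third item, the "load on first call" part can be sketched with a MutableCallSite whose initial target is a linking stub: the first call compiles the method, relinks the site to the result and then runs it, and later calls go straight to the compiled handle. LazyMethod is hypothetical, and the Supplier<MethodHandle> merely stands in for whatever actually generates and loads the per-method class.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.function.Supplier;

// Hypothetical method slot whose body is only compiled and loaded when first called.
final class LazyMethod {
    private static final MethodType TYPE = MethodType.methodType(Object.class, Object.class);
    private static final MethodHandle LINK;
    static {
        try {
            LINK = MethodHandles.lookup().findVirtual(LazyMethod.class, "linkAndCall", TYPE);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final MutableCallSite site = new MutableCallSite(TYPE);
    private final MethodHandle invoker = site.dynamicInvoker();
    // Stands in for "generate the method's class, load it, look up its entry point".
    private final Supplier<MethodHandle> compiler;
    private MethodHandle compiled; // guarded by this

    LazyMethod(Supplier<MethodHandle> compiler) {
        this.compiler = compiler;
        site.setTarget(LINK.bindTo(this)); // until the first call, the target is the stub
    }

    // Only reached via the stub; racing first callers are serialized by the lock.
    synchronized Object linkAndCall(Object arg) throws Throwable {
        if (compiled == null) {
            compiled = compiler.get();
            site.setTarget(compiled);
            MutableCallSite.syncAll(new MutableCallSite[] { site });
        }
        return compiled.invokeExact(arg);
    }

    Object call(Object arg) {
        try {
            return (Object) invoker.invokeExact(arg);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Methods that are never called never get past the stub, so under this sketch their classes are never generated or loaded at all.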

Oh, and the trick I'd done for handling perform() (our equivalent of Ruby's send) is going to need some serious retooling: its performance has dropped considerably with the move to LFs, and it is unlikely to play nicely with the caching in the long term. Oh well.

Sorry I've been slow in giving feedback on this; other commitments have kept me from working on it as much as I would have liked.

Duncan.