Dynamic G1 barrier elision for C2 in young

Sun Jun 14 00:53:59 UTC 2015

Hi Vitaly,

Thanks, glad you like the idea! I focused on young objects because consistency is easy to reason about there, but for applications like the ones you describe, even old objects could in principle be optimized. That would require more care, though, and I don't know how common such applications are, given that Java is widely marketed as having cheap allocation.

Anyway, the approach could be refined much further if plugged into a points-to analysis engine. Such an analysis gives the set of possible allocation points for an object reference, which in our case could be exploited when emitting reference store code. If objects from different allocation points are then given different Klass pointers (soft handles) at runtime, the approach could find more opportunities for barrier elision. When the same class is used in different parts of the code, sometimes for temporary objects and sometimes for objects that get old, the difference would be identified: barriers would be elided in code dealing with the always-temporary variants of the class, but kept in code dealing with potentially old objects of the same class. I'm not sure how low-hanging that fruit is, though. I think C2 already does some of this for lock elision using escape analysis, although (if I understand it correctly) not inter-procedurally yet. I could have a look when I have time, if anyone is interested in this kind of stuff.
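A minimal sketch of the allocation-site idea, in C++ since that is what HotSpot is written in. It assumes that each allocation site gets its own id (standing in for the per-site Klass pointer), that a points-to analysis yields the set of sites the object being stored into may come from, and that the GC records whether any instance from a site was ever promoted. All names here are hypothetical; this is not C2's escape analysis API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical: one id per allocation site, standing in for a per-site Klass.
using AllocSiteId = int;

struct SiteState {
  bool ever_promoted = false;  // set once the GC promotes an instance from this site
};

// A reference store may drop its G1 barriers only if every allocation site
// the stored-into object may come from has never had an instance promoted.
bool can_elide_barriers(const std::set<AllocSiteId>& points_to,
                        const std::vector<SiteState>& sites) {
  for (AllocSiteId id : points_to) {
    assert(id >= 0 && static_cast<std::size_t>(id) < sites.size());
    if (sites[id].ever_promoted) return false;  // may be old: keep barriers
  }
  return true;
}
```

With this split, a store whose base may come from any site that has ever been promoted keeps its barriers, so the same class can get barrier-free code in one method and barriered code in another.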

Thanks,
/Erik

> On 11 Jun 2015, at 06:26, Vitaly Davidovich <vitalyd at gmail.com> wrote:
> 
> Hi Erik,
> 
> I think this is a neat idea. There's a class of Java programs that trigger no GCs at all while running (or only very few, at specific points in time). Typically these are the same kinds of apps that pool objects on free lists, and thus pay write barrier overhead for no actual gain. FWIW, I'd welcome such elision, but would also like to see it for the parallel collector :).
> 
> Thanks
> 
> sent from my phone
> 
> On Jun 8, 2015 6:00 AM, "Erik Österlund" <erik.osterlund at lnu.se> wrote:
> Hi,
> 
> Since this concerns the compiler too, I decided to CC hotspot-dev.
> 
> Thanks,
> /Erik
> 
> On 06/06/15 02:44, Erik Österlund <erik.osterlund at lnu.se> wrote:
> 
> >Hi guys,
> >
> >Making G1 run faster on GC-tuned applications that are designed to only
> >rarely spill objects into old seems like an interesting and important
> >optimization goal at the moment.
> >
> >Today I tried an interesting experiment. I sample garbage during the
> >sweeping phase (phase 2) of System.gc() (G1MarkSweep), which walks
> >through garbage anyway, hoping to find classes whose instances are used
> >all the time but /never/ make it into old. Then I deoptimize the nmethods
> >that depend on these classes and recompile them with the G1 write
> >barriers elided (in C2). If the GC eventually needs to promote any of
> >these objects to old, I just deoptimize again and recompile with the G1
> >barriers turned back on.
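A rough model of the bookkeeping behind this policy, as a C++ sketch. ClassInfo, BarrierElisionPolicy and their methods are hypothetical and not taken from the patch; compiled methods are represented by name strings instead of nmethods, and the GC is assumed to report every promotion from VM start.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical bookkeeping for dynamic barrier elision (not HotSpot code).
struct ClassInfo {
  bool young_only = false;           // compiled code may elide barriers for this class
  bool ever_promoted = false;        // sticky: an instance has reached old
  std::set<std::string> dependents;  // compiled methods relying on young_only
};

class BarrierElisionPolicy {
 public:
  // Called for a dead instance of `klass` found while the full GC walks the heap.
  // Returns true if the class just became young-only, i.e. code depending on it
  // should be deoptimized and recompiled without G1 barriers.
  bool on_dead_instance_sampled(const std::string& klass) {
    ClassInfo& info = classes_[klass];
    if (info.young_only || info.ever_promoted) return false;
    info.young_only = true;
    return true;
  }

  bool may_elide(const std::string& klass) const {
    auto it = classes_.find(klass);
    return it != classes_.end() && it->second.young_only;
  }

  // Called by the compiler when method `nm` elides barriers because of `klass`.
  void record_dependency(const std::string& klass, const std::string& nm) {
    ClassInfo& info = classes_[klass];
    assert(info.young_only);
    info.dependents.insert(nm);
  }

  // Called when the GC is about to promote an instance of `klass` to old.
  // Returns the methods to deoptimize so they get recompiled with barriers.
  std::vector<std::string> on_promotion(const std::string& klass) {
    ClassInfo& info = classes_[klass];
    info.ever_promoted = true;
    std::vector<std::string> to_deopt;
    if (info.young_only) {
      info.young_only = false;
      to_deopt.assign(info.dependents.begin(), info.dependents.end());
      info.dependents.clear();
    }
    return to_deopt;
  }

 private:
  std::map<std::string, ClassInfo> classes_;
};
```

This sketch keeps a class barriered for good once one of its instances has been promoted, to avoid repeated deoptimize/recompile cycles; the prototype may handle that differently. In a real VM the deoptimization would presumably have to take effect before mutator threads resume, since a barrier-free store into a freshly promoted object would skip the card mark and leave an old-to-young reference out of the remembered sets.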
> >
> >On a few DaCapo benchmarks that presumably use many temporary objects,
> >it paid off very well:
> >fop: -9.2% time <- this one was brutal!!
> >xalan: -6.9% time
> >jython: -5.9% time
> >
> >Each result is the average of the 10 iterations following 40 warmup
> >iterations (50 iterations in total). Class unloading was turned off
> >(using my own patch to make -Xnoclassgc work, since it currently seems
> >to be broken), and the heap size was 512M.
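The arithmetic behind such numbers, as a minimal C++ sketch: average the 10 measured iterations after the 40 warmup iterations, then express the difference against a baseline run as a percentage. The function names are hypothetical, and the baseline is assumed to be the same benchmark on an unmodified VM.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean over the `measured` iterations that follow `warmup` iterations
// (defaults match the 40 + 10 methodology described above).
double measured_mean(const std::vector<double>& times_ms,
                     std::size_t warmup = 40, std::size_t measured = 10) {
  assert(measured > 0 && times_ms.size() >= warmup + measured);
  auto first = times_ms.begin() + static_cast<std::ptrdiff_t>(warmup);
  auto last = first + static_cast<std::ptrdiff_t>(measured);
  return std::accumulate(first, last, 0.0) / static_cast<double>(measured);
}

// Relative change in percent; negative means the modified VM was faster.
double percent_change(double baseline_ms, double modified_ms) {
  return (modified_ms - baseline_ms) / baseline_ms * 100.0;
}
```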
> >
> >
> >The G1 barriers are already optimized to be faster for young objects, but
> >if the GC finds out that certain types of objects /never/ get old, telling
> >the compiler so allows complete elision of both the pre- and post-barriers
> >from the generated code, which is nice.
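To make the pre- and post-barriers concrete, here is a simplified, single-threaded C++ model of a G1 reference store, following the JDK 8/9-era barrier logic: the SATB pre-barrier logs the overwritten value while concurrent marking is active, and the post-barrier filters same-region stores, null stores and stores into young cards before dirtying and enqueuing a card. The young-card filter is the existing fast path for young objects; the elided store is all that remains when the compiler knows the holder is young. Addresses are plain integers, the 1 MB region size is an assumption, and G1Model and its members are hypothetical, not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, single-threaded model of a G1 reference store (not HotSpot code).
constexpr unsigned kLogRegionSize = 20;  // assume 1 MB regions
constexpr unsigned kLogCardSize = 9;     // 512-byte cards, as in G1

enum class Card : std::uint8_t { kClean, kYoung, kDirty };

struct G1Model {
  bool marking_active = false;                                // concurrent marking running?
  std::unordered_map<std::uintptr_t, std::uintptr_t> memory;  // field address -> ref (0 = null)
  std::vector<Card> cards;                                    // one entry per card
  std::vector<std::uintptr_t> satb_queue;                     // overwritten values for marking
  std::vector<std::size_t> dirty_card_queue;                  // cards handed to refinement

  explicit G1Model(std::size_t heap_bytes)
      : cards(heap_bytes >> kLogCardSize, Card::kClean) {}

  void store_with_barriers(std::uintptr_t field, std::uintptr_t new_val) {
    // Pre-barrier (SATB): while marking, log the non-null value being overwritten.
    std::uintptr_t& slot = memory[field];
    if (marking_active && slot != 0) satb_queue.push_back(slot);

    slot = new_val;  // the actual store

    // Post-barrier: cheap filters first.
    if (((field ^ new_val) >> kLogRegionSize) == 0) return;  // same region
    if (new_val == 0) return;                                 // storing null
    std::size_t card = field >> kLogCardSize;
    assert(card < cards.size());
    if (cards[card] == Card::kYoung) return;  // young holder: no remembered-set update
    // Real G1 issues a StoreLoad fence here before re-reading the card.
    if (cards[card] == Card::kDirty) return;  // already queued
    cards[card] = Card::kDirty;
    dirty_card_queue.push_back(card);
  }

  // With elision, compiled code for a young-only class is just the store.
  void store_elided(std::uintptr_t field, std::uintptr_t new_val) {
    memory[field] = new_val;
  }
};
```

Even the fast young path costs loads and compare-and-branch sequences per store, which is what full elision removes.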
> >
> >Are we conceptually interested in such a solution, potentially accompanied
> >by a flag like -XX:+G1DynamicallyOptimizeYoung? I thought I'd check whether
> >I can get some feedback before going too far with this.
> >
> >Here is the code I used.
> >
> >Patch 1: -Xnoclassgc
> >http://cr.openjdk.java.net/~eosterlund/g1_experiments/noclassgc/webrev.00/
> >
> >This just fixes an issue where -Xnoclassgc doesn't work properly with G1
> >(unfortunately I have yet to get the bug system working so I can report
> >it...). With this flag, the JVM should not unload classes. I had to run my
> >experiments without class unloading because it killed the optimized
> >nmethods for the almost-always-dead objects I wanted to optimize in
> >DaCapo, presumably because DaCapo does not retain their class loaders.
> >
> >Patch 2: Dynamic G1 barrier elision
> >http://cr.openjdk.java.net/~eosterlund/g1_experiments/dynamic_barrier_elision/webrev.00/
> >
> >This is where the interesting stuff went, if anyone is interested. It is
> >just a very basic prototype/concept to check whether the approach seems
> >interesting to you guys. You would probably also want to deoptimize less
> >(only when there are fields to actually optimize/deoptimize, tracking that
> >more accurately), sample garbage outside of System.gc() (using System.gc()
> >was just a convenience for now), and be more accurate about which class
> >declared a field rather than using the canonical class, etc.
> >
> >Any comments are welcome.
> >
> >Thanks,
> >/Erik
> >
> >
>