Submitted JEP 189: Shenandoah: An Ultra-Low-Pause-Time Garbage Collector
Christine Flood
chf at redhat.com
Mon May 16 14:33:23 UTC 2016
Let me start by saying that I believe all GC algorithms have their place.
The serial collector is great if you are running many, many small JVMs on a
server and you want to limit resource consumption.
I can't honestly say that Shenandoah is better or worse than C4 for a
particular application. The best way to answer that question is to
implement them both in OpenJDK and do an apples to apples comparison.
We never would have gotten G1 if we hadn't first implemented the
parallel collector and CMS. If we limit ourselves by not experimenting
with new algorithms, how will OpenJDK evolve?
The real risk of Shenandoah is adding read barriers to OpenJDK; however,
one of the very real benefits of Shenandoah is also adding read barriers
to OpenJDK. Those barriers may then be used to support other algorithms,
including C4.
Christine
----- Original Message -----
> From: "Andrew Dinn" <adinn at redhat.com>
> To: "Vitaly Davidovich" <vitalyd at gmail.com>
> Cc: hotspot-gc-dev at openjdk.java.net
> Sent: Monday, May 16, 2016 4:46:37 AM
> Subject: Re: Submitted JEP 189: Shenandoah: An Ultra-Low-Pause-Time Garbage Collector
>
> On 14/05/16 14:55, Vitaly Davidovich wrote:
> > I'd also be interested in a comparison with Azul's C4 design.
>
> That's an interesting question. Perhaps Christine/Roman and Gil Tene
> might want to comment on this. I have talked to Gil quite a bit about C4
> and I also provided some early input to the design of Shenandoah. So,
> I'll mention some of the background to the choice of Brooks pointers and
> how that choice leads to differences from Azul. However, they may well
> want to reconcile this story with their experience of actually
> implementing the GCs.
>
> One of the things which led Red Hat to go the way of Brooks pointers and
> away from the sideband data that Azul uses to track relocations is that
> we wanted to do relocation only in write barriers. Here is why.
>
> Like Shenandoah, after scanning and determining which regions are most
> profitable to relocate, Azul's GC marks a victim region as in need of
> relocation and then starts moving objects. However, unlike Shenandoah, it
> writes the new locations into a separate relocation table rather than
> using a pointer slot adjacent to the object. So, Azul's reloc info is
> not stored in the region being reclaimed.
>
> With both GCs a very fast test can be used to detect whether a pointer
> addresses a victim region and, if that test is negative, reads and
> writes just go straight through with almost no cost. With Azul's GC, if a
> mutator thread traps, i.e. identifies a load address as belonging to a
> victim region, the mutator has to look in the relocation table for a new
> copy. If a copy already exists the mutator just uses it. If not then the
> mutator races with the GC to copy the relevant object to new space
> before using the copy.
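In rough C pseudocode, the Azul-style load path might look like the sketch below. All names are illustrative, not real C4 internals, and the per-object region field stands in for deriving the region from the object's address:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGIONS 4

/* a heap object; "region" stands in for address-derived region lookup */
typedef struct obj { int region; int payload; } obj;

/* one flag per heap region: is it currently a relocation victim? */
static int victim[NUM_REGIONS];

/* sideband relocation table: old address -> new address */
typedef struct { obj *from; obj *to; } reloc_entry;
static reloc_entry reloc_table[64];
static int reloc_count;

static obj *reloc_lookup(obj *p) {
    for (int i = 0; i < reloc_count; i++)
        if (reloc_table[i].from == p) return reloc_table[i].to;
    return NULL;
}

/* the check every mutator load passes through */
obj *azul_read_barrier(obj *p) {
    if (!victim[p->region])        /* fast path: not in a victim region */
        return p;
    obj *copy = reloc_lookup(p);   /* slow path: consult the sideband table */
    if (copy != NULL)
        return copy;
    /* no copy yet: the mutator would now race the GC to copy p into
       to-space and install a reloc entry (elided in this sketch) */
    return p;
}
```

The fast path is a single flag test; only references into victim regions pay for the table lookup and possible copy.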
>
> The difference from Shenandoah is that with Azul's GC this copy must
> occur under both a read barrier and a write barrier -- i.e. both
> barriers can incur copy overheads -- a new copy must be established
> before data can either be read or written. With Shenandoah readers
> simply indirect through the Brooks pointer and do a read using whatever
> data is indirectly linked, either the old or new copy. A Brooks pointer
> always gets updated before any new values can be written to the copy.
> This is all that is needed to ensure the necessary invariant that a
> write to new space cannot be missed by a reader loading from old space
> (at least not in cases where such a mutator write-before-read race
> already exists). So, with Shenandoah only writers have to check whether
> the object is in new space and, potentially, incur copy overheads.
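A minimal sketch of the Brooks-pointer discipline, with made-up names (this is not HotSpot's actual barrier code):

```c
#include <assert.h>

/* Every object carries a forwarding word that normally points back at
   the object itself; after evacuation it points at the new copy. */
typedef struct obj {
    struct obj *fwd;   /* the Brooks pointer */
    int value;
} obj;

/* read barrier: always follow the forwarding pointer, old copy or new */
static int read_field(obj *p) {
    return p->fwd->value;
}

/* write path: writes also go through the forwarding pointer; in the
   real collector the write barrier would first ensure the object is
   evacuated, so writes only ever land on the new copy */
static void write_field(obj *p, int v) {
    p->fwd->value = v;
}

/* evacuation: copy the object, then flip the old copy's Brooks pointer
   so every later access via an old reference reaches the new copy */
static void evacuate(obj *from, obj *to) {
    to->fwd = to;
    to->value = from->value;
    from->fwd = to;
}
```

Because the Brooks pointer is flipped before any write to the new copy, a reader holding a stale reference still observes the write through the indirection.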
>
> One advantage of Azul's GC is that reloc may finish quicker -- a
> simple outcome of the fact that mutator threads co-operate in the
> copying at more frequent points during execution.
>
> Another advantage is that, in some cases, the reloc can be
> self-repairing -- the relocating thread can overwrite the location
> containing the originating old space reference with the new space
> reference (whether it was in register, on stack or in an instance/class
> field) avoiding subsequent barrier traps. This can significantly lower
> the costs involved in having to check the sideband reloc table because
> it lowers the frequency of barrier traps.
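The in-place repair can be sketched like this. The moved/new_loc fields are a stand-in for the sideband table lookup (in C4 the mapping lives outside the object) and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef struct obj {
    int moved;            /* stand-in: has this object been relocated? */
    struct obj *new_loc;  /* stand-in for the sideband table entry */
    int value;
} obj;

/* where does p live now? (models the reloc-table lookup) */
static obj *resolve(obj *p) { return p->moved ? p->new_loc : p; }

/* self-repairing barrier: on a trap, write the new location back into
   the slot the reference was loaded from, so that slot never traps again */
static obj *healing_barrier(obj **slot) {
    obj *copy = resolve(*slot);
    if (copy != *slot)
        *slot = copy;   /* repair the reference in place */
    return copy;
}
```

After the first trap the slot holds the new-space reference, so subsequent loads from that slot take the fast path.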
>
> A further advantage is that once all objects in a victim region are
> relocated Azul can immediately unmap the physical pages for the region
> and reuse them to map new heap regions. That's safe because mutators
> which encounter an old space reference will find the copy in the
> sideband table so will never indirect through the victim region.
>
> Brilliant as this scheme is, its disadvantage is that it shifts
> potential copy overheads from writes alone to both writes and reads -- a
> major change from the cost balance in most GCs. Since writes are generally far
> less frequent than reads this makes for some interesting potential
> consequences for the distribution of the costs. Of course, on average it
> amortizes the cost of GC much more broadly but the interesting question
> is what does it do for the worst case.
>
> The danger is that copy-at-read can cause long delays to threads which
> are essentially read-only -- these are known as read storms. Imagine a
> mutator searching down a linked list of 1000 elements allocated in 1000
> consecutive cons cells. Each read might require copying the cons cell.
> Since the copy overhead will be significantly larger than testing a car
> (left link) and indirecting through a cdr (right link) this could mean
> that a mostly read-only operation such as a rewrite of a list element or a
> list append could take much longer than normal, potentially causing a
> significant hiccup in the execution time of the mutator.
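A toy model of that read storm, counting one copy per cons cell. The counter stands in for the real copy cost and all names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

typedef struct cell {
    int car;
    struct cell *cdr;
    int in_victim;   /* stand-in for "this cell's region is a victim" */
} cell;

static int copies_performed;

/* copy-on-read barrier: the first load of a victim-region cell must
   copy it before the mutator can proceed (counter models the copy) */
static cell *read_barrier(cell *p) {
    if (p != NULL && p->in_victim) {
        copies_performed++;
        p->in_victim = 0;   /* pretend the copy is now installed */
    }
    return p;
}

/* a read-only traversal that nevertheless pays one copy per cell */
static int list_length(cell *head) {
    int n = 0;
    for (cell *p = read_barrier(head); p != NULL; p = read_barrier(p->cdr))
        n++;
    return n;
}
```

Walking 1000 victim-region cells triggers 1000 copies even though the mutator never writes anything.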
>
> So, let's contrast that with Shenandoah. What's the cost of avoiding read storms?
>
> Where Shenandoah has to allocate an indirect pointer word per object,
> Azul only requires reloc data for the live objects in victim regions.
> That may save a significant amount of space. Note, though, that the
> reloc tables' space has to come from somewhere, and there needs to be
> enough space guaranteed to ensure that you can always create the reloc
> entries you need.
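A back-of-envelope comparison under assumed figures (1 GiB heap, 64-byte mean object, 8-byte word, 16-byte reloc entry, 10% of the heap in victim regions at 50% live occupancy -- none of these numbers come from the thread):

```c
#include <assert.h>

/* Brooks scheme: one extra pointer word per object, all the time */
static long brooks_overhead_bytes(long num_objects, long word_bytes) {
    return num_objects * word_bytes;
}

/* sideband scheme: one (from, to) entry per live object in the
   regions currently being evacuated */
static long reloc_overhead_bytes(long live_in_victims, long entry_bytes) {
    return live_in_victims * entry_bytes;
}
```

Under these assumptions the Brooks words cost 128 MiB continuously, while the sideband table only needs space proportional to the live data being evacuated at any one time.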
>
> Where Shenandoah always has to indirect through the Brooks pointer at
> read, Azul can sometimes remove repeated trap costs by repairing a
> reference in place during use. Note, however, that in most cases the
> Shenandoah indirection has a very low bound and usually costs little or
> nothing because the indirection mostly returns to the original object
> rather than a new version. In consequence, Shenandoah's read barrier
> costs ought not to add up to much. I don't know the details of
> Azul's sideband data lookup cost, but it would be hard to make it
> as low as the Shenandoah cost -- hence the importance of the in-place
> repair.
>
> Where Azul can re-use physical pages early Shenandoah has to wait for
> the end of the next mark to recycle victim regions -- by that point all
> references to victim regions will have been updated to the new copies so
> there is no danger of any thread trying to indirect through a Brooks
> pointer in the victim region. Essentially, all other factors being
> equal, this allows Azul to get away with less headroom than Shenandoah
> before it needs to start collecting because free space comes on stream
> earlier.
>
> n.b. I have no doubt that this is only a partial account and that I have
> missed some important wrinkles -- on both Azul's and Shenandoah's side.
> I'll be interested to hear what the implementors have to say.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
>