RFR: Generational support for weak roots and references

Fri Jul 23 18:52:30 UTC 2021

On Thu, 22 Jul 2021 22:53:56 GMT, William Kemper <wkemper at openjdk.org> wrote:

> ### Summary
> The load reference barrier (LRB) for non-strong references is modified to permit resurrection of objects outside the generation being collected. In other words, resurrection is only blocked for unmarked objects in the generation being collected.
> 
> Each `ShenandoahGeneration` has its own reference processor instance. In some cases, a reference from the old generation may end up on the young generation discovered list if the reference points to a young referent (this would happen if the old reference is in the remembered set). However, young references that point to referents in the old generation are _not_ discovered. This has the effect of strongly marking the old generation referent. This also avoids the case of having young references on the old generation discovered list being evacuated/relocated while they wait for old generation reference processing (although we believe this case would be handled correctly by the existing update references code).

Yes, of course; there are many scenarios to consider here. We did consider a solution in which the collector would process references only during _young_ and _global_ collections, but our ultimate goal is to eliminate _global_ collections from generational mode.

The need for a reference processor per generation comes from the requirement that all references to a given referent be cleared _atomically_ by the collector. A single reference processor would necessarily be shared by the young and old collection cycles, and this would break the atomic-clearing requirement: young collection cycles are allowed to interrupt old collections, and a young cycle that interrupts old marking could not process references, because old marking (and therefore the discovery of all references) would not yet be complete[1]. So we have two separate discovered lists, partitioned by the generation of their _referents_. Since a young collection only collects young _referents_, we put old and young _references_ alike on the same _young_ discovered list whenever their referent is young.
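As a rough illustration of the routing rule described above, here is a minimal C++ sketch. All names (`Generation`, `Object`, `Reference`, `ReferenceProcessor`, `discover`) are hypothetical and do not correspond to HotSpot's actual reference-processing code. It models the young-collection case the text describes, assumes old collections route old referents to the old list symmetrically, and ignores how an old `Reference` is reached in the first place (via the remembered set).

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: every heap object lives in exactly one generation.
enum class Generation { Young, Old };

struct Object {
  Generation gen;
};

// Hypothetical stand-in for a java.lang.ref.Reference: the Reference object
// itself and its referent may live in different generations.
struct Reference {
  Object* self;
  Object* referent;
};

// One discovered list per generation (one reference processor each).
struct ReferenceProcessor {
  std::vector<Reference*> discovered;
};

enum class Discovery { Young, Old, NotDiscovered };

// Called (in this model) when marking for `collecting` encounters `ref`.
// The list is chosen by the generation of the *referent*, not of `ref->self`.
Discovery discover(Reference* ref, Generation collecting,
                   ReferenceProcessor& young_rp, ReferenceProcessor& old_rp) {
  if (ref->referent->gen != collecting) {
    // Referent lies outside the generation being collected: not discovered.
    // During a young collection this means an old referent is marked
    // as if it were strongly reachable through this reference.
    return Discovery::NotDiscovered;
  }
  if (collecting == Generation::Young) {
    // Old and young References alike land here if their referent is young.
    young_rp.discovered.push_back(ref);
    return Discovery::Young;
  }
  // Assumption: old collections put References with old referents on the old list.
  old_rp.discovered.push_back(ref);
  return Discovery::Old;
}
```

Keying the list on the referent's generation is what lets a young cycle clear every reference to a young referent in one step, regardless of which generation the `Reference` objects themselves live in.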

This also motivates the changes in the LRB; I probably did not explain that well in the summary. The LRB blocks resurrection only when `concurrent_weak_root_in_progress` is set, on the premise that weak references/roots pointing to _unmarked_ objects are destined to be cleared and so can no longer be accessed (resurrected). With two generations, it now matters which one is being collected. The weak roots are off heap[2], so they are not themselves distinguishable by generation; however, they do hold references to objects in the heap. At the beginning of old marking, the mark bitmaps are cleared. If a young collection occurs _before_ old marking is complete, it will block resurrection while `concurrent_weak_root_in_progress` is true; however, the mark bitmap for the old generation is incomplete, so it should _not_ block resurrection of old objects. In other words, this change in the LRB effectively gives us `old_concurrent_weak_root_in_progress` and `young_concurrent_weak_root_in_progress` without having to introduce the complexity of additional states to the collector's state machine.
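The generational barrier check can be sketched in the same spirit. This is a hypothetical C++ model, not Shenandoah's barrier code: `CollectorState`, `collecting`, `blocks_resurrection`, and the hash set standing in for the mark bitmap are illustrative assumptions; only `concurrent_weak_root_in_progress` comes from the text above. Global collections are not modeled.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical model: every heap object lives in exactly one generation.
enum class Generation { Young, Old };

struct Object {
  Generation gen;
};

// Hypothetical snapshot of the collector state relevant to the weak LRB.
struct CollectorState {
  bool concurrent_weak_root_in_progress;     // flag named in the text
  Generation collecting;                     // generation whose weak roots are processed
  std::unordered_set<const Object*> marked;  // stand-in for the mark bitmap
};

// Returns true when the barrier must refuse to hand out `obj`; in this model
// the barrier would then return null to the caller instead of the object.
// Besides the weak-root flag and the mark check, the object must belong to
// the generation being collected, because the other generation's marking
// may be incomplete (e.g. the old bitmap is cleared when old marking starts).
bool blocks_resurrection(const CollectorState& s, const Object* obj) {
  if (obj == nullptr || !s.concurrent_weak_root_in_progress) {
    return false;  // no weak-root processing: always allow access
  }
  if (obj->gen != s.collecting) {
    return false;  // outside the collected generation: allow resurrection
  }
  return s.marked.count(obj) == 0;  // unmarked in collected generation: block
}
```

With `collecting` set to young, an unmarked old object passes through the barrier even while the flag is set, which is the implicit split into per-generation weak-root phases described above.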

[1] Thinking about this further, we could consider a design in which young cycles that interrupt old marking _defer_ reference processing to the old generation cycle, but we would need to be careful about dangling referents (that is, referents which were _not_ strongly reachable at the end of the young cycle would be collected, but the references to them would not yet be cleared). It doesn't seem like this would be safe.
[2] This is certainly true for _some_ weak roots, but I'm not sure it's true for _all_ weak roots.

-------------

PR: https://git.openjdk.java.net/shenandoah/pull/53