Architectural Comparison with C4/Pauseless?
per.liden at oracle.com
Thu Dec 14 14:31:57 UTC 2017
On 2017-12-13 07:19, Gil Tene wrote:
> To Per and the rest of ZGC team: Congratulations on having the ZGC project and sources up! It is great to see another concurrent compacting collector being actively developed.
> As I start digging into some of the ZGC details, it seems (at least at first pass) that we are looking at a main mechanism that is very similar to C4 (2011 ISMM paper here: https://www.azul.com/files/c4_paper_acm.pdf), or the single-generation Pauseless collector that preceded it (2005 VEE/Usenix paper here: https://www.usenix.org/legacy/events/vee05/full_papers/p46-click.pdf). This suggests that much of the JVM infrastructure and related design needs will end up being similar as well, and we can both benefit from understanding those similarities and comparing notes.
> Can we go through a quick stab at mapping the mechanisms and terminologies?
> Some high level similarities I've noted so far:
> - The use of a barrier-at-reference-load that determines whether or not an action is required purely based on the contents of the reference being loaded and some expected values for those contents (as opposed to considering data that would require de-referencing through the pointer): This seems equivalent to what the C4 LVB does [are we looking at the same actions? i.e.: queue to collector if not-yet-marked-through, fixup/remap to point to actual target address if points-to-relocated-object, and relocate object if points-to-needs-to-be-relocated-but-not-yet-relocated object?].
In ZGC we currently have the following main reasons why some action
needs to be taken:
1) During marking - "Points to an object that is not known to be
2) Between end of marking and start of relocation - "Points to a
3) Between end of marking and end of concurrent reference processing -
"Attempt to load weak/phantom oop pointing to an unmarked object".
4) During relocation - "Points to an object that is not known to not be
part of the collection set".
There are a number of different actions that can then follow, depending
on the above reason, the oop state, etc. For example:
- Mark an unmarked object as strongly-reachable
- Mark a final-reachable object as strongly-reachable
- Remap pointer and mark an unmarked object as strongly-reachable
- Remap pointer and mark a final-reachable object as strongly-reachable
- Resurrect an unmarked or finalizable-marked object pointed to by a
"weak" or "phantom" oop
- Prevent resurrection of an unmarked or finalizable-marked object
pointed to by a "weak" or "phantom" oop
- Remap pointer pointing to an object that is not part of the collection set
- Remap pointer pointing to an object that is part of the collection set
- Remap pointer and relocate an object that is part of the collection set
In ZGC terminology, a "remapped" pointer means it has the "remapped"
metadata bit set, which in turn means we know it doesn't point into the
collection set. "Finalizable-marked" means it's been marked via the
referent in a Finalizer object.
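The fast-path check shared by all of the cases above (only leave the fast path if the pointer's metadata doesn't match the global expected state) can be sketched as a single mask test. This is a minimal Python model under assumed names and bit positions. Expressing "doesn't match the expected state" as one AND against a "bad mask" is an assumption about how such a test might be implemented, not a description of zBarrier.*, and the "finalizable" bit is left out.

```python
# Hypothetical colored-pointer layout, for illustration only: the low
# 42 bits hold the address and three metadata bits sit above them. The
# "finalizable" bit and ZGC's real bit positions are not modeled.
ADDRESS_BITS = 42
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
MARKED0 = 1 << 42
MARKED1 = 1 << 43
REMAPPED = 1 << 44
METADATA_MASK = MARKED0 | MARKED1 | REMAPPED


def color(address, metadata):
    """Build a colored pointer from a plain address and one metadata bit."""
    return (address & ADDRESS_MASK) | metadata


def bad_mask(good_color):
    """Metadata bits whose presence sends a load to the slow path."""
    return METADATA_MASK & ~good_color


def needs_slow_path(ptr, good_color):
    """The shared fast-path test: a single AND against the bad mask.

    Only the pointer value is examined; the object it points to is not
    dereferenced. A null pointer (0) always takes the fast path.
    """
    return (ptr & bad_mask(good_color)) != 0
```

During marking the good color would be MARKED0, so a reference still carrying REMAPPED takes the slow path. Once relocation starts and the good color flips to REMAPPED, a MARKED0 reference takes the slow path instead. Flipping one global value re-arms the same test for the next phase.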
> - Colored pointers in ZGC (which appear to encode metadata in the pointer): These seem similar to the concept of using metadata information in the reference in C4 and Pauseless [A combination of NMT state and page numbers or ranges], including the use of similar triggering reasons (not-yet-marked-through, points-to-relocated-object, and points-to-object-that-needs-to-be-relocated) and their use in marking, compaction, and eventual fixup.
> [Do you see fundamental differences here that go beyond representation of the metadata in the pointer field? Are there any key differences in the triggering reasons or in how they are used to create the concurrent mark, relocate, and eventual fixup passes (or their equivalent terms in ZGC if there is a logical match)? ]
Given that C4 isn't available to study in detail, my understanding of
how it works is limited. I'm thinking that once you've had time to study
the ZGC source, you will be in a better position than me to call out the
> - Use of a common barrier test for both concurrent marking and concurrent compaction in ZGC: Same as C4 LVB. [Basically the "only leave the fast path if metadata shows that there is something to be done" test].
> - "In-place compaction" in ZGC: Same as C4's Quick Release. (Basically releases and recycles compacted-from pages[/regions] before fixing up references to objects in those pages, by maintaining forwarding information outside of the object body and page)
It's my understanding (correct me if I'm wrong) that C4's notion of
"Quick Release" means quick release/reclaim/reuse of physical memory
pages, but not quick release/reclaim/reuse of the virtual address space
those pages occupy. In ZGC, the physical memory pages and the address
space they occupy have the same life-cycle and both can be immediately
released/reclaimed/reused as a unit. From an "in-place compaction" point
of view, the end result is the same, except that ZGC doesn't need to
> These obviously lead me into a trap of assumptions (since I keep thinking in C4 terms).
> Question about expected invariants and reference fixup:
> - Do the colored pointers and the related read barrier provide invariants similar to those described in C4 (section 2.1)? Can similar assumptions be made about mutator visible references?
ZGC has a strong "to-space" invariant. What other invariants you get
depends on which barrier type was applied (we call them strong and weak
barriers, where weak maps to AS_NO_KEEPALIVE in the new Access API), and
what reference type is being accessed (strong/weak/phantom oop, which
maps to ON_STRONG_OOP_REF/ON_WEAK_OOP_REF/ON_PHANTOM_OOP_REF). Please see
zBarrier.* for more details.
> - Does ZGC fold a "fixup phase" (aka "remap phase" in C4) of references to compacted pages into the next Mark phase (delaying the complete fixup until then)? Or does it perform a separate fixup pass after relocation?
ZGC does "lazy fixup", in the sense that whoever loads an oop after
relocation will do the fixup. It could be a mutator or a GC worker doing
that. From an algorithm point of view, we can choose to fold or not to fold.
> - "Self healing": I didn't see mention of a C4/Pauseless "self healing" equivalent thus far in text, and have not followed the ZGC code far enough to determine if there is one. Does "ZGC" perform self-healing on references that were found to need attention by the barrier?
A load barrier in ZGC can heal/repair an oop, but if, when and how it
happens again depends on the barrier type used and the reference type
being accessed. Looking at the code in zBarrier.* should help if you're
interested in the details here. For example, a weak barrier never heals
the oop to the "marked" state, and a barrier on a weak/phantom oop never
heals if "resurrection" is blocked.
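What healing buys can be shown with a minimal, single-threaded Python sketch: the slow path writes the repaired reference back into the field it was loaded from, so the next load of that field takes the fast path. Everything here (Heap, load, the heal flag, the tuple representation of a colored pointer) is hypothetical. The heal flag only stands in for the point above that whether healing happens depends on barrier and reference type; it does not model those rules. A real concurrent barrier would also need a compare-and-swap so it doesn't overwrite a racing store to the same field.

```python
class Heap:
    """Toy heap state: a colored pointer is an (address, color) tuple."""

    def __init__(self, good_color):
        self.good_color = good_color  # global "expected metadata bits"
        self.forwarding = {}          # old address -> new address
        self.slow_path_calls = 0

    def slow_path(self, ref):
        """Return the good version of ref, remapping it if it was moved."""
        self.slow_path_calls += 1
        address, _ = ref
        address = self.forwarding.get(address, address)
        return (address, self.good_color)

    def load(self, field, heal=True):
        """Load barrier on a field (a one-element list holding a ref)."""
        ref = field[0]
        if ref is None or ref[1] == self.good_color:
            return ref                # fast path: nothing to do
        good = self.slow_path(ref)
        if heal:
            # Self-heal: store the repaired ref back. A real concurrent
            # barrier would use a compare-and-swap here instead.
            field[0] = good
        return good


def demo():
    heap = Heap(good_color="remapped")
    heap.forwarding[0x1000] = 0x8000  # the object was relocated
    healed = [(0x1000, "marked0")]
    unhealed = [(0x1000, "marked0")]
    for _ in range(2):
        heap.load(healed)
    calls_with_healing = heap.slow_path_calls
    for _ in range(2):
        heap.load(unhealed, heal=False)
    calls_without_healing = heap.slow_path_calls - calls_with_healing
    return calls_with_healing, calls_without_healing, healed[0], unhealed[0]
```

Here demo() returns (1, 2, (0x8000, "remapped"), (0x1000, "marked0")). Two loads of the healed field cost one slow-path call, two loads of the unhealed field cost two, and the unhealed field still holds the stale pointer afterwards.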
> Questions about pages/regions/boundaries/mapping:
> - Is ZGC "regional" in the sense that except for large objects (that span multiple dedicated regions) objects cannot span region boundaries? [This is the case in C4, Shenandoah, and G1], or does it handle the heap in some more fluid way?
ZGC is regional, in the sense that the heap is represented by a number
of ZPages, where each ZPage represents some contiguous chunk of memory.
Objects (large or small) never span across ZPage boundaries. A ZPage is
sized such that this never happens. If we, for example, allocate a 100M
object, then we create a single 100M ZPage for that. When that ZPage is
later reclaimed, we can decide to reuse that ZPage as is (e.g. if we
have a new request to allocate another 100M object), or we can throw
away that ZPage and just reuse the heap memory it represented to back
other ZPages with different size configurations.
> - Does ZGC use regions of fixed size ("ZPage"?) when those regions are not dedicated to single (large) object?
A ZPage can have any size (subject to alignment requirements). Some
ZPages only contain a single object (typically a very large object), and
some contain many objects (typically smaller objects). A ZPage also
belongs to a size group (currently we have three groups,
small/medium/large), where each group has different sizes and alignments.
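Choosing a page for an allocation can be sketched as follows. The concrete numbers aren't given above, so the thresholds (2 MiB small pages for objects up to 256 KiB, 32 MiB medium pages for objects up to 4 MiB, and a dedicated large page rounded up to a 2 MiB granule otherwise) are assumptions for illustration, as are all the names. With these values the 100M example above gets a single 100M page, since 100M is already a multiple of the assumed granule.

```python
KiB = 1024
MiB = 1024 * KiB

# Assumed, illustrative values; not taken from the email.
GRANULE = 2 * MiB
SMALL_PAGE, SMALL_LIMIT = 2 * MiB, 256 * KiB
MEDIUM_PAGE, MEDIUM_LIMIT = 32 * MiB, 4 * MiB


def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def page_for(object_size):
    """Return (size group, page size) for an allocation of object_size bytes."""
    if object_size <= SMALL_LIMIT:
        return "small", SMALL_PAGE      # shared with many small objects
    if object_size <= MEDIUM_LIMIT:
        return "medium", MEDIUM_PAGE    # shared with other medium objects
    # One object per page, and the page is sized to fit it, so the
    # object can never span a ZPage boundary.
    return "large", align_up(object_size, GRANULE)
```

For example, page_for(5 * MiB) gives ("large", 6 * MiB) under these assumptions. Headers and other per-object overhead are ignored.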
> - Does ZGC use virtual memory remapping to relocate "large" (larger than a single page) objects?
> - Does ZGC use virtual memory remapping on "normal" (non-large-object) regions as part of preparing or performing marking? Compaction?
Virtual memory remapping is not part of the marking or compaction
mechanism in ZGC. And we typically don't relocate large objects at all
when compacting. Large objects might in the future be relocated for some
other reason, like move to "cold-storage".
The only type of virtual memory mapping trick we do is mapping the heap
in multiple locations on platforms that don't have support for "Virtual
Address Masking/Tagging". x86 would be an example of such a platform. On
platforms that do support this (e.g. SPARC and AArch64) we map the heap
in only one location. (Note that we don't have an AArch64 port of ZGC at
this time; it's just used as an example here.)
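The aliasing that multi-mapping relies on can be demonstrated from user space. This is an analogy, not ZGC code: it assumes a POSIX-like system, uses a temporary file as the backing memory and Python's mmap module to map it three times (shared mappings), and the view names are hypothetical. Unlike ZGC, it cannot choose where each view lands in the address space, so it only shows the key property: a store through one view is visible through all the others, which is what lets a pointer in any color be dereferenced directly without masking.

```python
import mmap
import tempfile

# Hypothetical view names standing in for the per-color heap mappings.
HEAP_SIZE = mmap.PAGESIZE
VIEWS = ("marked0", "marked1", "remapped")


def map_heap_views(size=HEAP_SIZE):
    """Map one backing file at several virtual addresses.

    mmap.mmap(fileno, length) creates a shared mapping by default, so
    every view aliases the same underlying memory.
    """
    backing = tempfile.TemporaryFile()
    backing.truncate(size)            # the file must be at least `size` bytes
    views = {name: mmap.mmap(backing.fileno(), size) for name in VIEWS}
    return backing, views


def demo(offset=128, payload=b"object"):
    backing, views = map_heap_views()
    try:
        # Store through the "marked0" view...
        views["marked0"][offset:offset + len(payload)] = payload
        # ...and read the same bytes back through every view.
        return {name: bytes(view[offset:offset + len(payload)])
                for name, view in views.items()}
    finally:
        for view in views.values():
            view.close()
        backing.close()
```

Here demo() returns b"object" for all three views, even though the store went through only one of them.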
> Question about object lifecycle and pipeline:
> In gaining an understanding for pretty much any collector, I usually find that gaining an understanding for what happens to an object and references to it in two main (fairly simple) scenarios really helps. To help with that, can you describe the high level pipeline in these two simple cases [let's focus on "non-large" objects, assuming larger-than-one-page objects differ in some way (which may be a wrong assumption on my part)]
> - For a "stays alive through one collection" object: From allocation (presumably in a TLAB), through preparation for marking and marking, and then through preparation for relocation and the relocation, and then through having references to the object fixed up.
> - For a "died quickly and gets allocation in the next collection" object: From allocation (presumably in a TLAB), [assume death here] through preparation for marking and marking, and then through preparation for relocation of other objects allocated in the same region, the relocation of those objects, and the eventual "release" [freeing/recycling/whatever] of the region that the dead object used to be in.
Here is a quick example, which I think answers both scenarios above.
1) Global "expected metadata bits" is set to "remapped" state.
2) Object gets allocated in a TLAB. That TLAB is in turn allocated in
some suitable ZPage.
3) The returned reference inherits the current global "expected metadata
bits", i.e. "remapped state" in this example.
4) GC cycle starts. TLABs and ZPages containing TLABs are retired, making
them candidates for compaction. Global "expected metadata bits" is set
to "marked0" state.
5) During marking, either a mutator or a GC worker stumbles on the
reference, detects that its state (remapped state) doesn't match the
global expected state (marked0), and adds the reference to the queue of
references to mark. If it's in the "marked1" state (previous mark state),
a check is made to see if this object was part of the previous collection
set. If so, the forwarding information is looked up and the reference is
updated with the new location. Finally, the reference is adjusted to have
the "marked0" state.
6) Marking ends.
7) A collection set (a set of ZPages) is selected.
8) Relocation starts. Global "expected metadata bits" is set to
"remapped" state.
9) During relocation, GC workers walk through the collection set and
relocate objects, freeing/releasing ZPages as they become empty and
making them immediately reusable. Relocation/forwarding information is
installed in an off-heap table. If a mutator loads an oop, it will detect
that it's not in the "remapped" state and check whether the object is
part of the collection set. It then adjusts the reference to have the
"remapped" state and, if needed, to point to the new location (helping
to relocate the object if that hasn't already been done).
10) GC cycle ends.
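The steps for the surviving object can be traced in the toy Python model below. All names, the tuple encoding and the to-space addresses are hypothetical; marking-through and the GC workers' own relocation pass are not modeled, and a single forwarding table stands in for the off-heap tables of both cycles. It is a sketch of the sequence described above, not of ZGC's implementation.

```python
MARK_COLORS = ("marked0", "marked1")


class Collector:
    """Toy, single-threaded model; a ref is an (address, color) tuple."""

    def __init__(self):
        self.good = "remapped"        # step 1: global expected bits
        self.marked = set()
        self.mark_queue = []          # refs queued to be marked through
        self.cset = set()             # current collection set (addresses)
        self.previous_cset = set()    # collection set of the last cycle
        self.forwarding = {}          # old address -> new address
        self.next_free = 0x9000       # hypothetical to-space cursor

    def barrier(self, ref):
        address, color = ref
        if color == self.good:
            return ref                                   # fast path
        if self.good in MARK_COLORS:                     # step 5: marking
            previous = "marked1" if self.good == "marked0" else "marked0"
            if color == previous and address in self.previous_cset:
                # Stale ref into the last collection set: remap it now.
                address = self.forwarding.get(address, address)
            if address not in self.marked:
                self.marked.add(address)
                self.mark_queue.append(address)
            return (address, self.good)
        # Step 9: relocation, where the good color is "remapped".
        if address in self.cset:
            if address not in self.forwarding:           # help relocate
                self.forwarding[address] = self.next_free
                self.next_free += 0x100
            address = self.forwarding[address]
        return (address, "remapped")


def demo():
    gc = Collector()
    obj = 0x1000
    ref = (obj, gc.good)              # steps 2-3: inherits "remapped"
    gc.good = "marked0"               # step 4: cycle starts
    ref = gc.barrier(ref)             # step 5: first load during marking
    after_marking = ref
    gc.cset = {obj}                   # steps 6-7: its ZPage is selected
    gc.good = "remapped"              # step 8: relocation starts
    ref = gc.barrier(ref)             # step 9: remap, helping relocate
    return after_marking, ref, gc.mark_queue
```

Here demo() returns ((0x1000, "marked0"), (0x9000, "remapped"), [0x1000]). For the "died quickly" case the object is unreachable, so no barrier ever sees a reference to it: it never enters the mark queue, is not copied in step 9, and its memory is reclaimed when its ZPage is freed.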
> Anyway, the above is plenty (probably too much) for a single mailing list thread, so I'll stop there.
> — Gil.