Architectural Comparison with C4/Pauseless?

Thu Dec 14 14:31:57 UTC 2017

Hi Gil,

On 2017-12-13 07:19, Gil Tene wrote:
> To Per and the rest of the ZGC team: Congratulations on having the ZGC project and sources up! It is great to see another concurrent compacting collector being actively developed.

Thanks!

>
> As I start digging into some of the ZGC details, it seems (at least at first pass) that we are looking at a main mechanism that is very similar to C4 (2011 ISMM paper here: https://www.azul.com/files/c4_paper_acm.pdf), or the single-generation Pauseless collector that preceded it (2005 VEE/Usenix paper here: https://www.usenix.org/legacy/events/vee05/full_papers/p46-click.pdf). This suggests that much of the JVM infrastructure and related design needs will end up being similar as well, and we can both benefit from understanding those similarities and comparing notes.
>
> Can we go through a quick stab at mapping the mechanisms and terminologies?
>
> Some high level similarities I've noted so far:
>
> - The use of a barrier-at-reference-load that determines whether or not an action is required purely based on the contents of the reference being loaded and some expected values for those contents (as opposed to considering data that would require de-referencing through the pointer): This seems equivalent to what the C4 LVB does [are we looking at the same actions? i.e.: queue to collector if not-yet-marked-through, fixup/remap to point to actual target address if points-to-relocated-object, and relocate object if points-to-needs-to-be-relocated-but-not-yet-relocated object?].

In ZGC we currently have the following main reasons why some action 
needs to be taken:

1) During marking - "Points to an object that is not known to be 
strongly marked".

2) Between end of marking and start of relocation - "Points to a 
final-reachable object".

3) Between end of marking and end of concurrent reference processing - 
"Attempt to load weak/phantom oop pointing to an unmarked object".

4) During relocation - "Points to an object that is not known to not be 
part of the collection set".

There are a number of different actions that can then follow, depending 
on the above reason, the oop state, etc. For example:
- Mark an unmarked object as strongly-reachable
- Mark a final-reachable object as strongly-reachable
- Remap pointer and mark an unmarked object as strongly-reachable
- Remap pointer and mark a final-reachable object as strongly-reachable
- Resurrect an unmarked or finalizable-marked object pointed to by a 
"weak" or "phantom" oop
- Prevent resurrection of an unmarked or finalizable-marked object 
pointed to by a "weak" or "phantom" oop
- Remap pointer pointing to an object that is not part of the collection set
- Remap pointer pointing to an object that is part of the collection set
- Remap pointer and relocate an object that is part of the collection set

In ZGC terminology, a "remapped" pointer means it has the "remapped" 
metadata bit set, which in turn means we know it doesn't point into the 
collection set. "Finalizable-marked" means it's been marked via the 
referent field of a Finalizer object.
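The "is any action needed?" test can be sketched as below. This is a minimal illustration, not ZGC code: the bit positions (a 42-bit heap offset followed by marked0, marked1, remapped and finalizable bits) are assumptions loosely modeled on an x86-style layout, and `bad_mask`/`needs_slow_path` are hypothetical names. The point is that the decision only inspects bits in the loaded pointer, never the object it points to.

```python
# Hypothetical colored-pointer layout (illustrative only):
# bits 0-41 hold the heap offset, bits 42-45 hold metadata.
OFFSET_MASK = (1 << 42) - 1
MARKED0 = 1 << 42
MARKED1 = 1 << 43
REMAPPED = 1 << 44
FINALIZABLE = 1 << 45
METADATA_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE


def bad_mask(good_mask):
    """Every metadata bit except the currently expected ("good") one."""
    return METADATA_MASK ^ good_mask


def needs_slow_path(ptr, good_mask):
    """Fast-path check: looks only at the pointer bits, never at the object."""
    return (ptr & bad_mask(good_mask)) != 0


# In this model the good color is marked0 while marking and remapped while relocating.
MARK_PHASE_GOOD = MARKED0
RELOCATE_PHASE_GOOD = REMAPPED
```

A null reference has no metadata bits set, so in this model it passes the same test without a separate null check.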

>
> - Colored pointers in ZGC (which appear to encode metadata in the pointer): These seem similar to the concept of using metadata information in the reference in C4 and Pauseless [A combination of NMT state and page numbers or ranges], including the use of similar triggering reasons (not-yet-marked-through, points-to-relocated-object, and points-to-object-that-needs-to-be-relocated) and their use in marking, compaction, and eventual fixup.
>
> [Do you see fundamental differences here that go beyond representation of the metadata in the pointer field? Are there any key differences in the triggering reasons or in how they are used to create the concurrent mark, relocate, and eventual fixup passes (or their equivalent terms in ZGC if there is a logical match)? ]

Given that C4 isn't available to study in detail, my understanding of 
how it works is limited. I'm thinking that once you've had time to study 
the ZGC source, you will be in a better position than me to call out the 
differences.

>
> - Use of a common barrier test for both concurrent marking and concurrent compaction in ZGC: Same as C4 LVB. [Basically the "only leave the fast path if metadata shows that there is something to be done" test].
>
> - "In-place compaction" in ZGC: Same as C4's Quick Release. (Basically releases and recycles compacted-from pages[/regions] before fixing up references to objects in those pages, by maintaining forwarding information outside of the object body and page)

It's my understanding (correct me if I'm wrong) that C4's notion of 
"Quick Release" means quick release/reclaim/reuse of physical memory 
pages, but not quick release/reclaim/reuse of the virtual address space 
those pages occupy. In ZGC, the physical memory pages and the address 
space they occupy have the same life-cycle and both can be immediately 
released/reclaimed/reused as a unit. From an "in-place compaction" point
of view, the end result is the same, except that ZGC doesn't need to 
remap memory.
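Why a page can be released before any reference to it is fixed up can be shown with a toy model. It rests on stated assumptions: addresses are plain integers, `PAGE_SIZE` and the fixed 16-byte `OBJ_SIZE` are hypothetical, and `Heap`, `relocate_page` and `remap` are illustrative names. What matters is that the forwarding table lives outside the from-page, so the page itself is free the moment its live objects have been copied.

```python
PAGE_SIZE = 4096   # hypothetical page size
OBJ_SIZE = 16      # hypothetical fixed object size


class Heap:
    def __init__(self):
        self.objects = {}       # address -> payload; stands in for page contents
        self.forwarding = {}    # old address -> new address; kept OUTSIDE the pages
        self.free_pages = set()

    def relocate_page(self, from_page, to_page, live):
        """Copy live objects out of from_page, then release from_page immediately."""
        assert from_page != to_page
        base = from_page * PAGE_SIZE
        cursor = to_page * PAGE_SIZE
        for old in sorted(live):
            self.objects[cursor] = self.objects.pop(old)
            self.forwarding[old] = cursor
            cursor += OBJ_SIZE
        # Dead objects simply disappear together with the page.
        for addr in [a for a in self.objects if base <= a < base + PAGE_SIZE]:
            del self.objects[addr]
        self.free_pages.add(from_page)  # reusable before any reference is fixed up

    def remap(self, addr):
        """Fixup of a stale reference: consults only the off-page forwarding table."""
        return self.forwarding.get(addr, addr)
```

In a real collector the pointer's metadata is what tells a stale reference apart from a new reference into the reused page; this model leaves that out.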

>
> These obviously lead me into a trap of assumptions (since I keep thinking in C4 terms).
>
> Question about expected invariants and reference fixup:
>
> - Do the colored pointers and the related read barrier provide invariants similar to those described in C4 (section 2.1)? Can similar assumptions be made about mutator visible references?

ZGC has a strong "to-space" invariant. What other invariants you get 
depends on which barrier type was applied (we call them strong and weak 
barriers, where weak maps to AS_NO_KEEPALIVE in the new Access API), and 
what reference type is being accessed (strong/weak/phantom oop, which 
maps to ON_STRONG_OOP_REF, ON_WEAK_OOP_REF and ON_PHANTOM_OOP_REF). 
Please see zBarrier.* for more
details here.

>
> - Does ZGC fold a "fixup phase" (aka "remap phase" in C4) of references to compacted pages into the next Mark phase (delaying the complete fixup until then)? Or does it perform a separate fixup pass after relocation?

ZGC does "lazy fixup", in the sense that whoever loads an oop after 
relocation will do the fixup. It could be a mutator or a GC worker doing 
that. From an algorithm point of view, we can choose to fold or not to fold.
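Lazy fixup can be sketched as a load barrier that heals the field it loaded from. This is a minimal model, assuming a hypothetical bit layout, a dict standing in for the forwarding information and a Python list slot standing in for an oop field; the comparison before the store stands in for a CAS. It only covers the remap action, with no marking and no helping to relocate an object that has not been copied yet.

```python
# Hypothetical layout: 42-bit heap offset plus three color bits (illustrative only).
OFFSET_MASK = (1 << 42) - 1
MARKED0, MARKED1, REMAPPED = 1 << 42, 1 << 43, 1 << 44
COLOR_MASK = MARKED0 | MARKED1 | REMAPPED


def load_barrier(field, index, good_mask, forwarding):
    """Load field[index]; on a stale color, remap and heal the field in place."""
    ptr = field[index]
    if (ptr & (COLOR_MASK ^ good_mask)) == 0:
        return ptr                   # fast path: nothing to do (also covers null)
    offset = ptr & OFFSET_MASK
    healed = forwarding.get(offset, offset) | good_mask
    if field[index] == ptr:          # stand-in for a compare-and-swap on the field
        field[index] = healed        # heal: later loads of this field take the fast path
    return healed
```

The function does not care whether the caller is a mutator or a GC worker; in this model, whoever loads the field first pays for the fixup.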

>
> - "Self healing": I didn't see mention of a C4/Pauseless "self healing" equivalent thus far in text, and have not followed the ZGC code far enough to determine if there is one. Does "ZGC" perform self-healing on references that were found to need attention by the barrier?

A load barrier in ZGC can heal/repair an oop, but, again, whether, when 
and how that happens depends on the barrier type used and the reference type
being accessed. Looking at the code in zBarrier.* should help if you're 
interested in the details here. For example, a weak barrier never heals 
the oop to the "marked" state, and a barrier on a weak/phantom oop never 
heals if "resurrection" is blocked.
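The difference between healing policies can be illustrated with a small model of the marking phase. All names are hypothetical, the bit layout is an assumption, forwarding is left out, and the weak barrier is assumed to heal only to the "remapped" state, which is one way to honor "never heals to the marked state"; zBarrier.* remains the reference for what ZGC actually does.

```python
# Hypothetical layout: 42-bit heap offset plus three color bits (illustrative only).
OFFSET_MASK = (1 << 42) - 1
MARKED0, MARKED1, REMAPPED = 1 << 42, 1 << 43, 1 << 44
COLOR_MASK = MARKED0 | MARKED1 | REMAPPED


class BarrierModel:
    def __init__(self, good_mask):
        self.good = good_mask              # expected color, e.g. MARKED0 while marking
        self.marked = set()                # offsets known to be strongly marked
        self.resurrection_blocked = False  # set during concurrent reference processing

    def strong_load(self, field, i):
        """Keep-alive load: marks the target and heals the field to the good color."""
        ptr = field[i]
        if (ptr & (COLOR_MASK ^ self.good)) == 0:
            return ptr
        offset = ptr & OFFSET_MASK
        self.marked.add(offset)
        field[i] = offset | self.good
        return field[i]

    def weak_load(self, field, i):
        """No-keepalive load (cf. AS_NO_KEEPALIVE): never heals to the marked color."""
        ptr = field[i]
        if (ptr & (COLOR_MASK ^ (self.good | REMAPPED))) == 0:
            return ptr                     # good or remapped color is acceptable here
        field[i] = (ptr & OFFSET_MASK) | REMAPPED
        return field[i]

    def load_weak_oop(self, field, i):
        """Load of a weak/phantom oop; refused if resurrection is currently blocked."""
        ptr = field[i]
        if ptr and self.resurrection_blocked and (ptr & OFFSET_MASK) not in self.marked:
            return 0                       # no resurrection, and the field is not healed
        return self.strong_load(field, i)
```

A field healed by `weak_load` still fails the strong fast-path test later, so a subsequent keep-alive load will mark the object.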

>
> Questions about pages/regions/boundaries/mapping:
>
> - Is ZGC "regional" in the sense that except for large objects (that span multiple dedicated regions) objects cannot span region boundaries? [This is the case in C4, Shenandoah, and G1], or does it handle the heap in some more fluid way?

ZGC is regional, in the sense that the heap is represented by a number 
of ZPages, where each ZPage represents some contiguous chunk of memory. 
Objects (large or small) never span across ZPage boundaries. A ZPage is 
sized such that this never happens. If we, for example, allocate a 100M 
object, then we create a single 100M ZPage for that. When that ZPage is 
later reclaimed, we can decide to reuse that ZPage as is (e.g. if we 
have a new request to allocate another 100M object), or we can throw 
away that ZPage and just reuse the heap memory it represented to back 
other ZPages with different size configurations.
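Reusing a reclaimed ZPage as-is, or throwing it away and reusing its memory for pages of other sizes, can be sketched as a tiny page allocator. This is an illustration only: `PageAllocator`, its first-fit policy and its coalescing step are assumptions rather than ZGC's page cache, and sizes are abstract units.

```python
class PageAllocator:
    """Toy page allocator: reuse cached pages as-is, else carve pages from raw memory."""

    def __init__(self, heap_size):
        self.free_ranges = [(0, heap_size)]  # raw memory not backing any page: (start, length)
        self.page_cache = {}                 # page size -> starts of reclaimed pages

    def free_page(self, page):
        start, size = page
        self.page_cache.setdefault(size, []).append(start)

    def _carve(self, size):
        for i, (start, length) in enumerate(self.free_ranges):
            if length >= size:               # first fit
                self.free_ranges[i] = (start + size, length - size)
                return (start, size)
        return None

    def _flush_cache(self):
        # Throw away cached pages and return their memory to the raw pool, merging neighbours.
        for size, starts in self.page_cache.items():
            self.free_ranges.extend((s, size) for s in starts)
        self.page_cache.clear()
        merged = []
        for start, length in sorted(self.free_ranges):
            if length == 0:
                continue
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((start, length))
        self.free_ranges = merged

    def alloc_page(self, size):
        cached = self.page_cache.get(size)
        if cached:
            return (cached.pop(), size)      # reuse a reclaimed page of the same size as-is
        page = self._carve(size)
        if page is None:
            self._flush_cache()              # reuse memory behind differently sized pages
            page = self._carve(size)
        if page is None:
            raise MemoryError(f"no room for a page of size {size}")
        return page
```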

>
> - Does ZGC use regions of fixed size ("ZPage"?) when those regions are not dedicated to single (large) object?

A ZPage can have any size (subject to alignment requirements). Some 
ZPages only contain a single object (typically a very large object), and 
some contain many objects (typically smaller objects). A ZPage also 
belongs to a size group (currently we have three groups, 
small/medium/large), where each group has different size and alignment 
requirements.
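Picking a size group per allocation, so that no object ever crosses a ZPage boundary, might look like the sketch below. The numbers (2M small pages, 32M medium pages, objects up to 1/8 of the page size per group, large pages rounded up to a 2M granule) are illustrative values and should be treated as assumptions; per-group object alignment is left out.

```python
KB = 1024
MB = 1024 * KB
SMALL_PAGE = 2 * MB      # assumed small page size
MEDIUM_PAGE = 32 * MB    # assumed medium page size
LARGE_GRANULE = 2 * MB   # assumed size granule for large pages


def align_up(n, alignment):
    return (n + alignment - 1) // alignment * alignment


def page_for_object(obj_size):
    """Return (group, page_size) for an allocation of obj_size bytes."""
    if obj_size <= SMALL_PAGE // 8:
        return ("small", SMALL_PAGE)
    if obj_size <= MEDIUM_PAGE // 8:
        return ("medium", MEDIUM_PAGE)
    # A dedicated page sized to fit the object, e.g. a 100M object gets a 100M page.
    return ("large", align_up(obj_size, LARGE_GRANULE))
```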

>
> - Does ZGC use virtual memory remapping to relocate "large" (larger than a single page) objects?
>
> - Does ZGC use virtual memory remapping on "normal" (non-large-object) regions as part of preparing or performing marking? Compaction?

Virtual memory remapping is not part of the marking or compaction 
mechanism in ZGC. And we typically don't relocate large objects at all 
when compacting. Large objects might in the future be relocated for some 
other reason, like move to "cold-storage".

The only type of virtual memory mapping trick we do is mapping the heap 
in multiple locations on platforms that don't have support for "Virtual 
Address Masking/Tagging". x86 would be an example of such a platform. On 
platforms which do support this (e.g. SPARC and AArch64) we map the heap 
in only one location. (Note that we don't have an AArch64 port of ZGC at 
this time; it's just used as an example here.)
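Multi-mapping can be modeled as one backing store reachable through several virtual "views", one per color. In a real VM this would be something like mapping the same shared-memory file several times with mmap(); here a dict entry per view stands in for each mapping. The bit layout and all names are assumptions. The `masked_load` helper models a platform whose hardware ignores the metadata bits, where a single mapping is enough.

```python
# Hypothetical layout: 42-bit heap offset plus three color bits (illustrative only).
OFFSET_MASK = (1 << 42) - 1
MARKED0, MARKED1, REMAPPED = 1 << 42, 1 << 43, 1 << 44


class MultiMappedHeap:
    """One "physical" heap reachable at three virtual views, one per color."""

    def __init__(self, size):
        backing = bytearray(size)
        # Three views of the SAME memory; a write through one is visible through all.
        self.views = {MARKED0: backing, MARKED1: backing, REMAPPED: backing}

    def _resolve(self, colored_addr):
        view = self.views[colored_addr & ~OFFSET_MASK]  # KeyError plays the role of a fault
        return view, colored_addr & OFFSET_MASK

    def load(self, colored_addr):
        view, offset = self._resolve(colored_addr)
        return view[offset]

    def store(self, colored_addr, value):
        view, offset = self._resolve(colored_addr)
        view[offset] = value


def masked_load(memory, colored_addr):
    """Hardware address masking: metadata bits are ignored, one mapping suffices."""
    return memory[colored_addr & OFFSET_MASK]
```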

>
> Question about object lifecycle and pipeline:
>
> In gaining an understanding of pretty much any collector, I usually find that gaining an understanding of what happens to an object and references to it in two main (fairly simple) scenarios really helps. To help with that, can you describe the high-level pipeline in these two simple cases [let's focus on "non-large" objects, assuming larger-than-one-page objects differ in some way (which may be a wrong assumption on my part)]?
>
> - For a "stays alive through one collection" object: From allocation (presumably in a TLAB), through preparation for marking and marking, and then through preparation for relocation and the relocation, and then through having references to the object fixed up.
>
> - For a "died quickly and gets collected in the next collection" object: From allocation (presumably in a TLAB), [assume death here] through preparation for marking and marking, and then through preparation for relocation of other objects allocated in the same region, the relocation of those objects, and the eventual "release" [freeing/recycling/whatever] of the region that the dead object used to be in.

Quick example, which I think should answer for both scenarios above.

1) Global "expected metadata bits" is set to the "remapped" state.
2) Object gets allocated in a TLAB. That TLAB is in turn allocated in 
some suitable ZPage.
3) The returned reference inherits the current global "expected metadata 
bits", i.e. "remapped state" in this example.
4) GC cycle starts. TLABs, and the ZPages containing them, are retired, 
making them candidates for compaction. Global "expected metadata bits" 
is set to the "marked0" state.
5) During marking, either a mutator or a GC worker stumbles on the 
reference and detects that its state (remapped) doesn't match the 
global expected state (marked0). The reference is added to the queue of 
references to mark. If the reference is in the "marked1" state (the 
previous mark state), a check is also made to see if the object was part 
of the previous collection set. If so, the forwarding information is 
looked up and the reference is updated with the new location. Finally, 
the reference is adjusted to have the "marked0" state.
6) Marking ends.
7) A collection set (a set of ZPages) is selected.
8) Relocation starts. Global "expected metadata bits" is set to 
"remapped" state.
9) During relocation, GC workers walk through the collection set and 
relocate objects, freeing/releasing ZPages as they become empty and 
making them immediately reusable. Relocation/forwarding information is 
installed in an off-heap table. If a mutator loads an oop, it detects 
that the oop is not in the "remapped" state and checks whether the 
object is part of the collection set. It then adjusts the reference to 
have the "remapped" state and, if needed, adjusts it to point to the new 
location (helping to relocate the object if that hasn't been done yet).
10) GC cycle ends.
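The steps above can be traced with a small state machine. This is a sketch, not ZGC code: the bit layout, the alternation between marked0 and marked1 from one cycle to the next, and all names (`Collector`, `mark_barrier`, `relocate_barrier`) are assumptions, and relocation is modeled as a precomputed old-to-new offset map rather than actual copying.

```python
# Hypothetical color bits on top of a 42-bit heap offset (illustrative only).
OFFSET_MASK = (1 << 42) - 1
MARKED0, MARKED1, REMAPPED = 1 << 42, 1 << 43, 1 << 44


class Collector:
    def __init__(self):
        self.good = REMAPPED        # step 1: global expected metadata bits
        self.next_mark = MARKED0    # mark color to use in the next cycle
        self.mark_queue = []
        self.forwarding = {}        # old offset -> new offset, from the last relocation

    def allocate(self, offset):
        # Steps 2-3: a new reference inherits the current expected bits.
        return offset | self.good

    def start_mark(self):
        # Step 4: flip the expected bits to this cycle's mark color.
        self.good = self.next_mark
        self.next_mark = MARKED1 if self.good == MARKED0 else MARKED0

    def mark_barrier(self, ptr):
        # Step 5: slow path only if the color differs from the expected one.
        if ptr == 0 or (ptr & ~OFFSET_MASK) == self.good:
            return ptr
        offset = ptr & OFFSET_MASK
        if ptr & (MARKED0 | MARKED1):
            # Previous mark color: the target may have been in the last collection set.
            offset = self.forwarding.get(offset, offset)
        self.mark_queue.append(offset)
        return offset | self.good

    def start_relocate(self, moves):
        # Steps 7-8: select a collection set and flip the expected bits to remapped.
        # This model drops the previous table here, assuming marking has already
        # remapped every live reference that still needed it.
        self.good = REMAPPED
        self.forwarding = dict(moves)

    def relocate_barrier(self, ptr):
        # Step 9: a reference not yet remapped is checked against the forwarding table.
        if ptr == 0 or ptr & REMAPPED:
            return ptr
        offset = ptr & OFFSET_MASK
        return self.forwarding.get(offset, offset) | REMAPPED
```

A stale copy of a reference that nobody loaded during relocation still carries the old mark color, which is exactly what sends it through the forwarding lookup in the next cycle's marking.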

cheers,
Per

>
> Anyway, the above is plenty (probably too much) for a single mailing list thread, so I'll stop there.
>
> — Gil.
>