Brief ZGC Overview (from digging up the source code)

Thu Dec 6 18:45:10 UTC 2018

Hi,
I've been digging up the source code of ZGC to get some understanding
of how it works. I've summarized how I think it works, and would
appreciate any feedback or corrections. I'm sure i missed a lot of
details (e.g. I don't talk about relocation at all), but I tried to
make my description as clear as possible. Here it is:

There are four threads that are involved in the main Garbage
Collection (GC) tasks: ZDirector, ZDriver, VMThread, and ZWorker(s).

The ZDirector thread continuously runs and checks for “causes” to run
the GC. Here, I’m only concerned about the “allocation rate” cause.
ZDirector calculates the object allocation rate and once the rate is
above a specified value it will trigger a GC cycle by signalling the
ZDriver thread which is waiting to be signalled. The ZDirector is one
path that can start the GC cycle. The other path is through mutator
threads when they allocate new objects through the “new” bytecode. A
GC cycle might be started in the case where there isn’t any available
heap space. The ZDriver is the thread responsible for going through
the whole GC cycle step by step. For some stages of GC, such as Mark
Start and Mark End, mutator threads have to come to a safepoint.
ZDriver accomplishes this by signalling the VMthread. When the
VMThread runs, a call is made to ensure that all mutators come to a
stop. The VMthread then performs the “stop-the-world” necessary
operations for these stages of GC. Once this is done, ZDriver will
enter the main concurrent marking cycle. The concurrent GC work is
delegated to worker threads (ZWorker). The number of worker threads is
specified at the the launch of the Java command.

These worker threads run concurrently with mutators. The heart of the
work is in ZMark:drain() where the worker threads will pop object
entries from their local stacks and mark_and_follow() these entries.
Work will continue until there are no more entries to pop. Workers pop
entries according to the “stripe” that these entries belong to. ZGC
uses a total number of stripes that is equal to the specified number
of workers up to a maximum number of 16 stripes. If there are more
than 16 workers, some workers will work on shared stripes. For
instance, in the case that we have 16 worker threads there will be 16
stripes and each thread will pop entries in its corresponding stripe.
So, thread 0 will pop entries from its local stack for stripe 0,
thread 1 from stripe 1 and so on. Stripe numbers merely represent
different segments in the heap. In ZGC, objects belong to ZPages that
can either be small, medium, or large. For example, small ZPages are
2MB in size, so to know which stripe an object belongs to, the object
address is right-shifted 21 bits to get the ZPage starting address (21
bits is 2MB). Then, the shifted address is anded with the total number
of stripes to get the stripe number that an entry corresponds to. The
main purpose of having stripes is to reduce the probability that two
workers have to mark the same objects which can be costly since the
actual final step in marking is done atomically as will be explained
next.

Popping is done on stacks that correspond to stripes. So, there is a
maximum of 16 stacks corresponding to the maximum of 16 stripes. Each
stack is a placeholder for a maximum of 254 entries (it has 254
“slots”). Once the slots are full (entries are pushed to stacks during
following objects in the marking phase, so slots may become full), the
stack is added to a list of “published” stacks. These published stacks
might be used later if a worker thread has no entries to follow in its
local stack and needs to steal entries from other stacks.

Once an object entry is popped, mark_and_follow() is called. This
function contains the bulk of the concurrent marking phase. It is
essentially divided into two parts: try_mark_object(), and
follow_object(). In try_mark_object(), the object will be marked. This
is done by obtaining the ZPage that the object belongs to, then the
“index” of this object in the Zpage is calculated. Finally, the object
is marked by setting a bit in a “bitmap” inside a “livemap” which
contains liveness information about all objects in the object graph.
The livemap is essentially the object graph with liveness information
. Of course, the setting of the bit is done atomically since it is
possible that the bit will be set from another worker thread that
might have followed its object path and reached the same object.

The second half of marking is done in follow_object(), where the
recently marked object will be followed to add all of its reference
objects into the corresponding marking stacks. Each Java instance
class object contains a map (or a number of maps) of reference objects
(these are called “oops”; an oop (Ordinary Object Pointer) is a
representation of the object itself in addition to metadata and
functions to manipulate object address bits). These maps contain a
number of oops (depending on how many reference objects the instance
class contains) that are iterated and pushed onto their
stripe-corresponding marking stack where they will be marked later in
the try_mark_object(). The whole process of mark_and_follow() is
repeated until all of the object graph is traversed. Once this is
complete, all of the worker stacks will be flushed and prepared for
the next GC cycle.

Thanks,
Mohammad