RFC: Throughput barriers for G1
Erik Helin
erik.helin at oracle.com
Wed Nov 9 15:11:18 UTC 2016
Hi all,
one of the concerns with using G1 has been the throughput reductions due
to the costly (post-)barriers and refinement.
This RFC proposes using the same write post-barrier for G1 as for
the other collectors, and disabling concurrent refinement. This improves
throughput at the cost of predictability, as the refinement work then
needs to be performed in a GC pause, like in the other collectors.
Background:
The G1 write-barrier consists of two parts, the pre-write barrier and
the post-write barrier. For a mental model, the barrier (vastly
simplified) looks like the following in "pseudo-C++":
// Barrier for a write like: o.x = y;

// The pre-write barrier due to SATB
if (conc_mark_is_active) {
  if (o.x != NULL) {
    add_to_satb_queue(&o.x);
  }
}

// The actual write
o.x = y;

// The post-write barrier to keep track of pointers between regions
if (region(o) != region(y)) {
  if (y != NULL) {
    if (card(o.x) != Young) {
      StoreLoad();
      if (card(o.x) != Dirty) {
        card(o.x) = Dirty;
        add_to_refinement_queue(card(o.x));
      }
    }
  }
}
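To make the mental model above a bit more concrete, here is a small toy version of the post-write barrier as compilable C++. Everything in it is an assumption made for illustration: the 512-byte card and 1 MB region sizes, the ToyHeap struct, and the names post_write_barrier and refinement_queue are hypothetical and are not HotSpot code. Addresses are plain integers (offsets into the toy heap), 0 stands for NULL, and the region check uses the address of the written field rather than the object start.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: sizes and names are assumptions, not HotSpot's.
constexpr unsigned kCardShift   = 9;   // 512-byte cards (assumed)
constexpr unsigned kRegionShift = 20;  // 1 MB regions (assumed)

enum CardValue : std::uint8_t { Clean, Dirty, Young };

struct ToyHeap {
    std::vector<std::uint8_t> cards;             // one byte per card
    std::vector<std::size_t>  refinement_queue;  // indices of cards to refine

    explicit ToyHeap(std::size_t heap_bytes)
        : cards(heap_bytes >> kCardShift, Clean) {}

    // field_addr: address of the field o.x that was just written.
    // new_value:  the value y stored into it (0 means NULL).
    void post_write_barrier(std::uintptr_t field_addr,
                            std::uintptr_t new_value) {
        // Stores within one region need no remembered-set entry.
        if ((field_addr >> kRegionShift) == (new_value >> kRegionShift)) return;
        if (new_value == 0) return;

        std::size_t idx = field_addr >> kCardShift;
        assert(idx < cards.size());
        if (cards[idx] == Young) return;

        // Stand-in for the StoreLoad fence in the pseudocode.
        std::atomic_thread_fence(std::memory_order_seq_cst);

        if (cards[idx] != Dirty) {
            cards[idx] = Dirty;
            refinement_queue.push_back(idx);  // the enqueue step
        }
    }
};
```

Even in this toy form the barrier has four branches, a fence and an enqueue on the slow path, which is the size and logic cost discussed in this mail.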
As far as we know (based on performance runs) the pre-write part of the
barrier is rarely a throughput problem (but we would of course
appreciate it if others could confirm this). The post-write barrier
causes two problems for throughput:
- the sheer size of the post-write barrier (the number of assembly
  instructions)
- the logic of the post-write barrier (the branches, and also adding
  a card to the refinement queue)
The responsibility of the post-write barrier is to queue up pointers
between regions so that the concurrent refinement threads can update
the remembered sets concurrently. If you were to give this up, an
alternative post-write barrier could look like:
if (y != NULL) {
  if (card(o.x) == Clean) {
    card(o.x) = Dirty;
  }
}
The above post-write barrier will result in better throughput because
the barrier consists of fewer instructions, fewer branches and (in
particular) no enqueuing. The concurrent refinement threads will also
be turned off with this kind of post-write barrier, which will further
increase throughput.
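As a rough illustration of the simplified barrier, the following C++ sketch marks cards without any queue and leaves it to a later table scan to find the dirty cards. The 512-byte card size and the names SimpleCardTable, post_write_barrier and collect_dirty_cards are assumptions made for this example, not HotSpot code; addresses are plain integer offsets and 0 stands for NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned kCardShift = 9;  // 512-byte cards (assumed)

enum CardValue : std::uint8_t { Clean, Dirty, Young };

struct SimpleCardTable {
    std::vector<std::uint8_t> cards;  // one byte per card

    explicit SimpleCardTable(std::size_t heap_bytes)
        : cards(heap_bytes >> kCardShift, Clean) {}

    // Simplified post-write barrier: no region check, no fence, no queue.
    void post_write_barrier(std::uintptr_t field_addr,
                            std::uintptr_t new_value) {
        if (new_value == 0) return;
        std::size_t idx = field_addr >> kCardShift;
        assert(idx < cards.size());
        if (cards[idx] == Clean) {
            cards[idx] = Dirty;
        }
    }

    // Work that moves into the pause: scan the whole table for dirty
    // cards, returning their indices and resetting them to Clean.
    std::vector<std::size_t> collect_dirty_cards() {
        std::vector<std::size_t> dirty;
        for (std::size_t i = 0; i < cards.size(); ++i) {
            if (cards[i] == Dirty) {
                dirty.push_back(i);
                cards[i] = Clean;
            }
        }
        return dirty;
    }
};
```

Note that collect_dirty_cards walks every card in the table, so in this sketch the cost of finding the dirty cards grows with the heap size and not only with the number of writes.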
However, this is a trade-off. The cards will now have to be refined
during a STW collection pause, which will increase the time of the
pause. For certain kinds of applications, this trade-off might be worth
it, especially if the heap size isn't too big (the size of the card
table scales with the heap size). G1 would still be able to
incrementally compact the heap in order to avoid Full GCs.
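To give a feel for how the card table grows with the heap, here is a small C++ calculation. It assumes 512-byte cards and one card-table byte per card; the helper name card_table_bytes is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: each card covers 512 bytes of heap and takes one byte
// in the card table.
constexpr std::uint64_t kCardBytes = 512;

// Number of card-table bytes needed to cover heap_bytes of heap,
// rounded up to whole cards.
constexpr std::uint64_t card_table_bytes(std::uint64_t heap_bytes) {
    return (heap_bytes + kCardBytes - 1) / kCardBytes;
}

// A 4 GB heap needs an 8 MB card table under these assumptions.
static_assert(card_table_bytes(4ull << 30) == (8ull << 20),
              "4 GB heap -> 8 MB card table");
```

Under the same assumptions a 64 GB heap gives a 128 MB card table, which is the amount of card table a pause would have to look through if every card had to be checked.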
One may still add the cross-region check in the barrier to decrease the
number of cards to process in the GC pause.
One optimization to the refinement in the GC pause may be to delay
refinement for cards that do not contain references into the collection
set until after the pause.
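A minimal sketch of that split, in C++, could look like the following. It is only an illustration under strong assumptions: the DirtyCard and PauseWork types and the split_cards function are hypothetical, and each dirty card is assumed to already carry the list of regions its references point into (in a real collector that information would come from scanning the card).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical view of a dirty card after scanning it.
struct DirtyCard {
    std::size_t index;                        // card index
    std::vector<std::size_t> target_regions;  // regions it points into
};

struct PauseWork {
    std::vector<std::size_t> refine_now;    // must be handled in the pause
    std::vector<std::size_t> refine_later;  // can wait until after the pause
};

// Cards with at least one reference into the collection set are refined
// in the pause; all other cards are deferred.
PauseWork split_cards(const std::vector<DirtyCard>& dirty,
                      const std::unordered_set<std::size_t>& collection_set) {
    PauseWork work;
    for (const DirtyCard& card : dirty) {
        bool points_into_cset = false;
        for (std::size_t region : card.target_regions) {
            if (collection_set.count(region) != 0) {
                points_into_cset = true;
                break;
            }
        }
        if (points_into_cset) {
            work.refine_now.push_back(card.index);
        } else {
            work.refine_later.push_back(card.index);
        }
    }
    return work;
}
```

The pause then only pays for the refine_now cards, while the refine_later cards can be processed once the application threads are running again.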
An alternative or addition could be to work on decreasing the overhead
of the post-barrier by improving the compiler to decrease code size, and
by reconsidering ideas to remove the StoreLoad, as suggested earlier (see
e.g.
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2014-December/011666.html).
Thanks,
Erik