RFR: Evac reserve: make sure GC has untouchable space to move the objects into

Aleksey Shipilev shade at redhat.com
Thu Aug 30 17:26:46 UTC 2018


http://cr.openjdk.java.net/~shade/shenandoah/evac-reserve/webrev.01/

We have been approaching this for a while now, and current adopter reports seem to tell us we
finally need it done.

Problem: GC success is predicated on having enough space to evacuate the CSet into. There is a race
against the application, which tries to claim the free space during normal operation. The heuristics
and pacing code try to keep some free space available for evacs, and it _almost_ works. Until it
does not, and the whole thing spirals down to OOM.

It seems more straightforward to reserve a part of the heap for GC use only, which solves this
problem at its core. We have >80% of it done already in FreeSet: there is code that reserves space
for the "GC view". So, during FreeSet reconstruction, we can bite out the reserved space for GC use.
This means we do not need a static set of regions for the evac reserve; instead we can designate
some of them during FreeSet rebuild. This leaves the option of making the evac-reserve size
flexible, if we need that later.
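
For illustration only, here is a minimal, self-contained sketch of that idea. The names
(ToyFreeSet, ToyRegion) are made up for this email, this is not the actual ShenandoahFreeSet code:
at rebuild time we walk the free regions and mark just enough of them as collector-only to cover
the requested reserve, everything else stays visible to the mutator.

----- 8< -----
// Illustrative toy model only, not HotSpot code.
#include <cstddef>
#include <cstdio>
#include <vector>

struct ToyRegion {
  size_t free_bytes;
  bool   collector_reserved = false;
};

class ToyFreeSet {
  size_t _mutator_free   = 0;  // application-visible free space
  size_t _collector_free = 0;  // evacuation reserve, GC use only

public:
  // Rebuild: bite out regions for GC use until the requested reserve is covered.
  void rebuild(std::vector<ToyRegion>& free_regions, size_t evac_reserve_bytes) {
    _mutator_free = _collector_free = 0;
    for (ToyRegion& r : free_regions) {
      if (_collector_free < evac_reserve_bytes) {
        r.collector_reserved = true;
        _collector_free += r.free_bytes;
      } else {
        r.collector_reserved = false;
        _mutator_free += r.free_bytes;
      }
    }
  }

  size_t mutator_free()   const { return _mutator_free; }
  size_t collector_free() const { return _collector_free; }
};

int main() {
  std::vector<ToyRegion> regions(160, ToyRegion{32u << 20});  // 160 x 32M free regions
  ToyFreeSet fs;
  fs.rebuild(regions, 1u << 30);  // ask for a 1G evacuation reserve
  std::printf("mutator: %zuM, reserve: %zuM\n",
              fs.mutator_free() >> 20, fs.collector_free() >> 20);
  return 0;
}
----- 8< -----

Because the reserve is chosen anew on every rebuild rather than pinned to fixed regions, changing
the reserve size later is just a matter of passing a different number here.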

Then we need to hook it up properly to heuristics, pacer, etc. (see the sketch after this list):
 a) FreeSet::{available,capacity} now only report metrics about application allocations, which makes
the rest of the code account for the evac reserve automatically;
 b) Allocation notifications need to be issued only for application allocs, to keep the invariants
matching because of (a). It also makes the pacer more precise, because it is not tainted by GC allocs;
 c) Some heuristics, notably "aggressive", which do not look back at the available reserved space,
need to be allowed to overflow the evac reserve. Other heuristics should be content with the
reserved space, and overflowing it is a heuristics bug we want to catch and fix;
 d) Pacer-for-evac can now ignore the CSet; it only needs to know about application allocs.
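
Again for illustration only (hypothetical names, not the actual Shenandoah code), a toy model of
(a)-(d): available() reports only the application-visible space, the pacer is notified only for
application allocs, and only a heuristic that explicitly opts in may spill evacuation allocs past
the reserve.

----- 8< -----
// Illustrative toy model only, not HotSpot code.
#include <cstddef>
#include <cstdio>

struct ToyHeap {
  size_t mutator_free;    // application-visible free space
  size_t collector_free;  // evacuation reserve, GC-only by default
  size_t paced_bytes = 0; // what the pacer has been told about

  size_t available() const { return mutator_free; }   // (a): application view only

  enum Kind { MUTATOR, GC_EVAC };

  bool allocate(size_t bytes, Kind kind, bool may_overflow_reserve) {
    if (kind == MUTATOR) {
      if (mutator_free < bytes) return false;  // the reserve stays untouchable
      mutator_free -= bytes;
      paced_bytes  += bytes;  // (b), (d): pacer sees application allocs only
      return true;
    }
    // GC evacuation: prefer the reserve; an opted-in heuristic (c) may spill
    // the overflow into application-visible space.
    if (collector_free >= bytes) {
      collector_free -= bytes;
      return true;
    }
    if (may_overflow_reserve && mutator_free >= bytes) {
      mutator_free -= bytes;
      return true;
    }
    return false;
  }
};

int main() {
  ToyHeap heap{/* mutator_free */ 96u << 20, /* collector_free */ 32u << 20};
  heap.allocate( 8u << 20, ToyHeap::MUTATOR, /* may_overflow_reserve */ false);
  heap.allocate(16u << 20, ToyHeap::GC_EVAC, /* may_overflow_reserve */ false);
  std::printf("available to app: %zuM, paced: %zuM\n",
              heap.available() >> 20, heap.paced_bytes >> 20);
  return 0;
}
----- 8< -----

In this toy model, any evac overflow without the opt-in simply fails, which is the kind of
heuristics bug we want to surface rather than paper over.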

This makes Shenandoah much more stable on LRUFragger with >90% LDS, without any heuristics touchups.
For example, here is the stable GC log from that run, where it settles into evacuating ~5G each
cycle against about as much in allocations:

----- 8< ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----

Trigger: Free (4497M) is below minimum threshold (10240M)
Free: 4497M (141 regions), [...]
GC(152) Pause Init Mark 0.224ms
GC(152) Concurrent marking 92782M->93646M(102400M) 952.171ms
GC(152) Adaptive CSet Selection. Target Free: 10240M, Actual Free: 8640M, Max CSet: 4654M, Min Garbage: 1600M
GC(152) Collectable Garbage: 1599M (12% of total), 4032M CSet, 176 CSet regions
GC(152) Immediate Garbage: 0M (0% of total), 0 regions
GC(152) Pause Final Mark 0.868ms
GC(152) Concurrent cleanup 93678M->93678M(102400M) 0.283ms
GC(152) Free: 3569M (112 regions), [...]
GC(152) Evacuation Reserve: 5119M (160 regions), Max regular: 32768K
GC(152) Concurrent evacuation 93678M->97966M(102400M) 163.549ms
GC(152) Pause Init Update Refs 0.079ms
GC(152) Concurrent update references 97966M->98286M(102400M) 268.687ms
GC(152) Pause Final Update Refs 0.170ms
GC(152) Concurrent cleanup 98286M->92654M(102400M) 29.123ms
Free: 4593M (144 regions), [...]
Evacuation Reserve: 5120M (160 regions), Max regular: 32768K

----- 8< ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----

Note, this also opens up the way to implement degen-evac and degen-traversal properly: once the
application gets an Allocation Failure, we dive into Degen, and can complete it with some room to
actually do the work!

Testing: tier3_gc_shenandoah, eyeballing ad-hoc benchmarks, specjbb, specjvm (running)

Thanks,
-Aleksey


