Metaspace Threshold nullifying warmup cycles and configuration questions

Per Liden per.liden at oracle.com
Fri Sep 27 12:04:43 UTC 2019


Hi Pierre,

On 9/26/19 12:32 PM, Pierre Mevel wrote:
> Good Morning,
> 
> I have been trying out ZGC on our application for a few months now, and
> have a few questions. For the record, I work on an in-memory OLAP database,
> and we are very interested in ZGC's promise of handling multi-terabyte heaps.
> 
> The following observations were made on an E64_v3 Azure server with 64
> vCPUs and 432 GB of RAM, split into 210 GB for the heap and 210 GB of
> off-heap memory, which is almost entirely filled with the database
> content. ConcGCThreads is set higher than the default, to 20.
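> (In terms of flags, that is roughly java -XX:+UnlockExperimentalVMOptions
> -XX:+UseZGC -Xmx210g -XX:ConcGCThreads=20 ..., with the off-heap part
> allocated outside the JVM heap.)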
> 
> 
> First, we very quickly fill the Metaspace, which triggers the first three
> GCs on the application (with cause "Metaspace GC Threshold"). These three
> arrive very quickly, and unfortunately they also use up the warm-up
> cycles, i.e. the rule that would otherwise trigger GCs at 10%, 20% and 30%
> heap usage.
> 
> As the application doesn't do much at first (distributed OLAP cubes are
> discovering each other), the next GC cycle is triggered by the proactive
> rule of 5 minutes. At this point, the GC cycle statistics give:
> [info ] GC(3)                       Mark Start          Mark End      Relocate Start      Relocate End           High               Low
> [info ] GC(3)  Capacity:   215040M (100%)     215040M (100%)     215040M (100%)     215040M (100%)     215040M (100%)    215040M (100%)
> [info ] GC(3)   Reserve:       110M (0%)          110M (0%)          110M (0%)          110M (0%)          110M (0%)         110M (0%)
> [info ] GC(3)      Free:    213860M (99%)      213854M (99%)      213850M (99%)      214778M (100%)     214778M (100%)    213742M (99%)
> [info ] GC(3)      Used:      1070M (0%)         1076M (1%)         1080M (1%)          152M (0%)         1188M (1%)        152M (0%)
> 
> These cycles have a weird effect: MaxDurationOfGC ends up very low (i.e.
> very fast), and there are no warm-up cycles on the application.
> 
> After this very fast cycle (GC(3)), the application runs its course. The
> allocation rate averaged over 10 minutes (although less time than that had
> actually elapsed) was around 100 MB/s, and it rises quite significantly
> afterwards, but we hit another proactive cycle after 5 minutes. That will
> be the last proactive one.
> 
> [info ] GC(4)                       Mark Start          Mark End      Relocate Start      Relocate End           High               Low
> [info ] GC(4)  Capacity:   215040M (100%)     215040M (100%)     215040M (100%)     215040M (100%)     215040M (100%)    215040M (100%)
> [info ] GC(4)   Reserve:       110M (0%)          110M (0%)          110M (0%)          110M (0%)          110M (0%)         110M (0%)
> [info ] GC(4)      Free:    193266M (90%)      192580M (90%)      185320M (86%)      203730M (95%)      203734M (95%)     185180M (86%)
> [info ] GC(4)      Used:     21664M (10%)       22350M (10%)       29610M (14%)       11200M (5%)        29750M (14%)      11196M (5%)
> 
> The allocation rate averaged over 10 s is now over 1000 MB/s, and
> MaxDurationOfGC is 16.8 s. And now the issues arrive. Because
> MaxDurationOfGC is "low", the calculated TimeUntilGC is artificially bumped
> up, and the next cycle happens like this:
> 
> [info ] GC(5)                       Mark Start          Mark End      Relocate Start      Relocate End           High               Low
> [info ] GC(5)  Capacity:   215040M (100%)     215040M (100%)     215040M (100%)     215040M (100%)     215040M (100%)    215040M (100%)
> [info ] GC(5)   Reserve:       110M (0%)          110M (0%)          110M (0%)          110M (0%)          110M (0%)          32M (0%)
> [info ] GC(5)      Free:     45490M (21%)       39798M (19%)           0M (0%)       140128M (65%)      140152M (65%)         0M (0%)
> [info ] GC(5)      Used:    169440M (79%)      175132M (81%)      214930M (100%)      74802M (35%)      215008M (100%)    74778M (35%)
> 
> GC(5)'s cause is Allocation Rate, as is the cause of every cycle after
> this. It also hits "[info ] GC(5) Relocation: Incomplete", bumps
> MaxDurationOfGC to 107 s, and gets Allocation Stalls across all threads
> during the Concurrent Relocate phase.
> 
> When it finished, the next debug log lines read:
> [debug] Allocation Rate: 1300.000MB/s, Avg: 1178.000(+/-15.369)MB/s
> [debug] Rule: Allocation Rate, MaxAllocRate: 3348.971MB/s, Free: 140102MB, MaxDurationOfGC: 107.356s, TimeUntilGC: -65.622s
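> (If I read the rule correctly, those numbers are consistent with taking the
> free memory divided by the max allocation rate, 140102 MB / 3348.971 MB/s,
> which is about 41.8 s until the heap would be full, and subtracting the
> 107.356 s MaxDurationOfGC, giving roughly the -65.6 s TimeUntilGC, i.e. the
> rule thinks the cycle should already have started.)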
> 
> I feel like these very long GC cycles at the beginning of the application's
> run are just a consequence of the Metaspace GC cycles running first.
> I don't really know how quickly Metaspace fills up for other applications. I
> solved this by setting the -XX:MetaspaceSize flag to a little over what we
> use during a run, but I wanted to bring this to light, in case it is not
> intended behavior.
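> (Concretely, that amounts to starting the JVM with something like
> -XX:MetaspaceSize=256m, where 256m is just an illustrative placeholder for
> whatever the application's steady-state Metaspace usage actually is.)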

Thanks for the feedback. It indeed sounds like the early Metaspace GC 
cycles fool the heuristics here. Am I interpreting your last paragraph 
above correctly in that adjusting MetaspaceSize solves this part of the 
problem for you?

If so, there are a few ways in which we could improve this on our side. 
For example, we might not want to do Metaspace GCs before warmup has 
completed, or only do Metaspace GCs when we fail to expand Metaspace.

> 
> 
> During the run, on a particularly intensive workload, I can get lines such
> as:
> [debug] Allocation Rate: 15420.000MB/s, Avg: 3670.000(+/-1195.062)MB/s
> [debug] Rule: Allocation Rate, MaxAllocRate: 14208.385MB/s, Free: 81780MB, MaxDurationOfGC: 11.849s, TimeUntilGC: -6.193s
> 
> right after a GC, with the Allocation Rate higher than the MaxAllocRate.
> This is problematic, because the average allocation rate is artificially
> lowered by the Allocation Stalls that happened during the preceding cycle.

The sampling window for the allocation rate is 1 second (with 10 
samples). So it can take up to 1 second before a phase shift is clearly 
reflected in the average. This is to avoid making the GC too nervous in 
case a spike in allocation rate is not persistent.
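As a rough, made-up illustration: with 10 samples of 100 ms each, if the 
allocation rate jumps from 1000 MB/s to 5000 MB/s, then after 500 ms the 
window still averages about (5 x 1000 + 5 x 5000) / 10 = 3000 MB/s, and 
only after a full second does the average fully reflect the new rate.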

> 
> I am unsure how to configure the GC to get better results. I am currently
> trying to increase both the number of concurrent GC threads and the
> ZAllocationSpikeTolerance, to trigger GC cycles earlier and have them
> complete faster, at the expense of application speed.
> 
> Is this a correct way to proceed?

It depends a bit on what the situation looks like when you get the 
allocation stalls. If the GC is running back-to-back and you still get 
allocation stalls, then your only option is to increase the heap size 
and/or concurrent GC threads.
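For example (illustrative values only, not a recommendation for your 
workload), that would mean bumping the relevant flags along the lines of:

   java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx256g -XX:ConcGCThreads=24 ...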

However, if the system is not doing GCs back-to-back and you still get 
allocation stalls, then it sounds more like a heuristics issue. If 
you're on JDK 13, a new option you could play with is 
-XX:SoftMaxHeapSize. Setting this to e.g. 75% of -Xmx will have the 
effect of GC starting to collect garbage earlier, increasing the safety 
margin to allocation stalls and making it more resilient to the 
heuristics issue. This option can be particularly useful in situations 
where the allocation rate fluctuates a lot, which can sometimes fool the 
heuristics.
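For instance, with your 210 GB heap that would be something like (the 
exact value is just a starting point to experiment with):

   java ... -Xmx210g -XX:SoftMaxHeapSize=157g ...

SoftMaxHeapSize is also a manageable flag, so it can be changed at 
runtime without restarting the JVM (e.g. using jcmd's VM.set_flag 
command).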

Increasing the ZAllocationSpikeTolerance will also give the GC a larger 
safety margin. However, this tolerance is relative to the current 
allocation rate, so it might not work that well if the allocation rate 
is fluctuating a lot.
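As a rough illustration (this is a simplification; the real rule also 
accounts for variance in the measured rate): with a measured average 
allocation rate of 1000 MB/s, a tolerance of 2 makes the GC plan for 
roughly 2000 MB/s, while a tolerance of 4 makes it plan for roughly 
4000 MB/s, so cycles start correspondingly earlier. But if the average 
itself is skewed (for example, artificially lowered by allocation 
stalls), scaling it only helps so much.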

cheers,
Per

> 
> 
> 
> Thank you very much in advance for your time and answers.
> Best Regards,
> 
> Pierre Mével
> pierre.mevel at activeviam.com
> ActiveViam
> 46 rue de l'arbre sec, 75001 Paris
> 

