Metaspace Threshold nullifying warmup cycles and configuration questions

Mon Sep 30 12:25:38 UTC 2019

On 9/30/19 9:32 AM, Pierre Mevel wrote:
>> Thanks for the feedback. It indeed sounds like the early Metaspace GC
>> cycles fool the heuristics here. Am I interpreting your last paragraph
>> above correctly in that adjusting MetaspaceSize solves this part of the
>> problem for you?
>> If so, there are a few ways in which we could improve this on our side.
>> For example, we might not want to do Metaspace GCs before warmup has
>> completed, or only do Metaspace GCs when we fail to expand Metaspace.
> 
> 
> Indeed, I checked tonight's runs and they get the three warm up cycles as I
> expected. So the only prerequisite to fixing this issue was knowing how
> much metaspace we typically fill, and setting the initial size higher than
> this amount.
> The metaspace GCs were extremely quick, as they happened at the beginning
> of the application's run, not much to collect, so in my use case I think I
> would not do these cycles before warm up has been completed.

Ok, thanks for confirming.
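
For reference, here is a minimal sketch of one way to find out how much
Metaspace an application typically fills, so that -XX:MetaspaceSize can be
set above it. It reads the "Metaspace" memory pool through the standard
java.lang.management API. The class name, the helper
suggestedMetaspaceSizeMiB and the 25% headroom are assumptions made for
illustration, not something taken from this thread; the value should be
sampled once the application has reached steady state.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MetaspaceProbe {

    /** Bytes currently used by the "Metaspace" pool, or -1 if the JVM exposes no such pool. */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    /** Hypothetical helper: pads the observed usage by a headroom fraction, rounded up to whole MiB. */
    static long suggestedMetaspaceSizeMiB(long usedBytes, double headroom) {
        final long mib = 1024L * 1024L;
        long padded = (long) Math.ceil(usedBytes * (1.0 + headroom));
        return (padded + mib - 1) / mib;
    }

    public static void main(String[] args) {
        long used = metaspaceUsedBytes();
        if (used < 0) {
            System.out.println("No Metaspace pool found on this JVM");
            return;
        }
        // 0.25 is an assumed safety margin, not a recommendation from this thread.
        System.out.println("-XX:MetaspaceSize=" + suggestedMetaspaceSizeMiB(used, 0.25) + "m");
    }
}
```

The printed flag is only a starting point; the right value depends on how
much class loading happens after the sample is taken.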

> 
>> The sampling window for the allocation rate is 1 second (with 10
>> samples). So it can take up to 1 second before a phase shift is clearly
>> reflected in the average. This is to avoid making the GC too nervous in
>> case a spike in allocation rate is not persistent.
>>
>> It depends a bit on what the situation looks like when you get the
>> allocation stalls. If the GC is running back-to-back and you still get
>> allocation stalls, then your only option is to increase the heap size
>> and/or concurrent GC threads.
> 
> 
> Alright thanks for the info. For now we are having GC cycles running back
> to back. I'm still increasing concurrent threads to see how it goes, but
> the goal was not to increase heap size when compared with our current G1
> configuration. What happens if ConcGcThreads is equal to the vCPUs count?
> Does it run as if it was parallel, or do the threads still share the CPUs
> with the application threads?

The GC threads still share the CPUs with the application threads, and 
concurrent GC threads run with the same priority as application threads.

> Or put in another way, if I set ConcGcThreads to 20 on a 20 vCPUs machine,
> will I get something similar to a stop the world? 

It will not be similar to STW. The concurrent GC work will be 
interleaved with the application work at the OS thread scheduling level.

> Or if I have 200
> application threads, will the GC only get 10% of the CPU time?

You're at the mercy of the OS scheduler. Assuming fair scheduling, and 
assuming all application threads want to run all the time (i.e. they 
never block for I/O, or locks, etc), all threads will each get their 
share of the CPU.
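
As a rough worked example of the fair-scheduling case above, the sketch
below computes the fraction of CPU time the GC threads would get if every
thread were runnable all the time and the scheduler were perfectly fair.
The class and method names are made up for illustration, and real
schedulers and real workloads (threads blocking on I/O or locks) will
deviate from this idealized number.

```java
import java.util.Locale;

public class GcCpuShare {

    /**
     * Idealized share of total CPU time going to GC threads when all threads
     * are always runnable and scheduled fairly. Holds when there are at least
     * as many runnable threads as CPUs.
     */
    static double fairShare(int gcThreads, int appThreads) {
        return (double) gcThreads / (gcThreads + appThreads);
    }

    public static void main(String[] args) {
        // Numbers from the question: ConcGCThreads=20 and 200 application threads.
        double share = fairShare(20, 200);
        System.out.println(String.format(Locale.ROOT, "GC share: %.1f%%", share * 100.0));
        // prints "GC share: 9.1%"
    }
}
```

So under these assumptions 20 GC threads competing with 200 application
threads would get 20/220 of the CPU time, slightly less than 10%.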

Of course, when low latency is a priority, you typically want to have a 
system that is sized such that the "max application load" doesn't 
utilize more than ~70% of the CPU. That helps avoid OS scheduling 
latency artifacts, etc.

> 
>> However, if the system is not doing GCs back-to-back and you still get
>> allocation stalls, then it sounds more like a heuristics issue. If
>> you're on JDK 13, a new option you could play with is
>> -XX:SoftMaxHeapSize. Setting this to e.g. 75% of -Xmx will have the
>> effect of GC starting to collect garbage earlier, increasing the safety
>> margin to allocation stalls and making it more resilient to the
>> heuristics issue. This option can be particularly useful in situations
>> where the allocation rate fluctuates a lot, which can sometimes fool the
>> heuristics.
>>
> 
> I did read about this option and it seems like it would be a much better
> way not to be tricked by allocation rate fluctuations, unfortunately we are
> running with JDK 11 for now. I will try it on the side though.
> 
> Thanks a lot for your quick answer. I shall keep you updated.

Ok, thanks!
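
For anyone trying the -XX:SoftMaxHeapSize suggestion quoted above on
JDK 13, here is a small sketch that derives a value at 75% of the maximum
heap. It uses Runtime.maxMemory() as an approximation of -Xmx, which is an
assumption (the reported value can differ slightly from the configured
one), and the class and helper names are hypothetical.

```java
public class SoftMaxHeap {

    /** Returns the given fraction of the maximum heap size, in bytes (truncated). */
    static long softMaxBytes(long maxHeapBytes, double fraction) {
        return (long) (maxHeapBytes * fraction);
    }

    public static void main(String[] args) {
        // Approximates -Xmx; run this with the same -Xmx as the target application.
        long maxHeap = Runtime.getRuntime().maxMemory();
        // 0.75 follows the "e.g. 75% of -Xmx" suggestion in this thread.
        System.out.println("-XX:SoftMaxHeapSize=" + softMaxBytes(maxHeap, 0.75));
    }
}
```

For example, with an 8 GiB maximum heap this yields 6442450944 bytes
(6 GiB). The fraction is a tuning knob: a lower value starts collection
earlier at the cost of more GC work.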

cheers,
Per

> 
> Cheers,
> 
> Pierre Mével
> pierre.mevel at activeviam.com
> ActiveViam - Intern
> 46 rue de l'arbre sec, 75001 Paris
>