Metaspace Threshold nullifying warmup cycles and configuration questions

Pierre Mevel pme at activeviam.com
Mon Sep 30 07:32:27 UTC 2019


Thanks for the feedback. It indeed sounds like the early Metaspace GC
> cycles fool the heuristics here. Am I interpreting your last paragraph
> above correctly in that adjusting MetaspaceSize solves this part of the
> problem for you?
> If so, there are a few ways in which we could improve this on our side.
> For example, we might not want to do Metaspace GCs before warmup has
> completed, or only do Metaspace GCs when we fail to expand Metaspace.


Indeed, I checked tonight's runs and they get the three warm-up cycles as I
expected. So the only prerequisite to fixing this issue was knowing how
much Metaspace we typically fill, and setting the initial size higher than
that amount.
The Metaspace GCs were extremely quick, as they happened at the beginning
of the application's run, when there was not much to collect; so in my use
case I think skipping these cycles until warm-up has completed would be fine.
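
For anyone hitting the same issue, the fix amounts to a single flag change.
A sketch of the command line (the 512m value and the heap sizes are
illustrative assumptions, not our actual footprint):

```shell
# Set the initial/threshold Metaspace size above the amount of Metaspace
# the application typically fills, so the early Metaspace-induced GC
# cycles (which were nullifying the warm-up cycles) never trigger.
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -XX:MetaspaceSize=512m \
     -Xms16g -Xmx16g \
     -jar app.jar
```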

The sampling window for the allocation rate is 1 second (with 10
> samples), so it can take up to 1 second before a phase shift is clearly
> reflected in the average. This is to avoid making the GC too nervous in
> case a spike in allocation rate is not persistent.
>
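
If I understand the windowed average correctly, it behaves roughly like the
following sketch (my reading of the description above, not ZGC's actual
implementation):

```java
import java.util.ArrayDeque;

// Sketch of a sliding-window allocation-rate average: 10 samples taken
// over a 1-second window, averaged to smooth out transient spikes so a
// short burst does not immediately shift the reported rate.
public class RateWindow {
    private final ArrayDeque<Double> samples = new ArrayDeque<>();
    private final int capacity;

    public RateWindow(int capacity) { this.capacity = capacity; }

    // Called every 100 ms with the allocation rate observed in that slice.
    public void addSample(double bytesPerSecond) {
        if (samples.size() == capacity) samples.removeFirst();
        samples.addLast(bytesPerSecond);
    }

    // Average over the window; a persistent phase shift takes up to a
    // full window (1 second) to dominate the average.
    public double average() {
        return samples.stream().mapToDouble(Double::doubleValue)
                      .average().orElse(0.0);
    }
}
```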
It depends a bit on what the situation looks like when you get the
> allocation stalls. If the GC is running back-to-back and you still get
> allocation stalls, then your only option is to increase the heap size
> and/or concurrent GC threads.


Alright, thanks for the info. For now we do have GC cycles running back to
back. I'm still increasing the number of concurrent threads to see how it
goes, but the goal was not to increase the heap size compared with our
current G1 configuration. What happens if ConcGCThreads is equal to the
vCPU count? Does it run as if it were parallel, or do the threads still
share the CPUs with the application threads?
Or to put it another way, if I set ConcGCThreads to 20 on a 20-vCPU machine,
will I get something similar to a stop-the-world pause? Or if I have 200
application threads, will the GC only get 10% of the CPU time?
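
As a back-of-envelope check on that last figure (my own fair-share
assumption, not a statement about how the scheduler actually behaves): if
the 20 GC threads and 200 application threads are all runnable and
scheduled fairly, the GC's share comes out to 20/220, about 9%:

```java
// Naive fair-share estimate of GC CPU time, assuming all threads are
// runnable and the OS scheduler divides CPU time evenly among them.
// This illustrates the arithmetic only, not ZGC's scheduling behavior.
public class GcCpuShare {
    static double naiveGcShare(int gcThreads, int appThreads) {
        return (double) gcThreads / (gcThreads + appThreads);
    }

    public static void main(String[] args) {
        // 20 ConcGCThreads competing with 200 application threads
        System.out.printf("GC share: %.1f%%%n",
                          100 * naiveGcShare(20, 200));
    }
}
```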

However, if the system is not doing GCs back-to-back and you still get
> allocation stalls, then it sounds more like a heuristics issue. If
> you're on JDK 13, a new option you could play with is
> -XX:SoftMaxHeapSize. Setting this to e.g. 75% of -Xmx will have the
> effect of GC starting to collect garbage earlier, increasing the safety
> margin to allocation stalls and making it more resilient to the
> heuristics issue. This option can be particularly useful in situations
> where the allocation rate fluctuates a lot, which can sometimes fool the
> heuristics.
>

I did read about this option, and it seems like it would be a much better
way to avoid being tricked by allocation-rate fluctuations. Unfortunately,
we are running on JDK 11 for now, but I will try it on the side.
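
For the record, what I plan to try on a JDK 13 side setup (the sizes are
illustrative assumptions, not our real configuration):

```shell
# With a 16g max heap, setting SoftMaxHeapSize to 75% of -Xmx (12g)
# makes ZGC start collecting earlier, keeping a ~4g safety margin
# against allocation stalls when the allocation rate fluctuates.
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
     -Xmx16g \
     -XX:SoftMaxHeapSize=12g \
     -jar app.jar
```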

Thanks a lot for your quick answer. I shall keep you updated.

Cheers,

Pierre Mével
pierre.mevel at activeviam.com
ActiveViam - Intern
46 rue de l'arbre sec, 75001 Paris


More information about the zgc-dev mailing list