Very long (nearly 10s, abnormal) "Choose CSet" time in a young collection

Thomas Schatzl thomas.schatzl at oracle.com
Thu May 9 17:47:29 UTC 2019


Hi,

On Thu, 2019-05-09 at 12:43 -0400, chao chu wrote:
> Hi,
> We got a weird issue in our prod environment that we observed a
> nearly 10s "Choose CSet" time in a young GC (it's GCLocker initiated
> though, not sure if this is relevant), while the related "start
> choosing cset..." and "finish choosing cset..." log lines happened in
> the same milli-sec, which just doesn't make much sense.
> 
> Looking at the related G1 code, all related happened in this
> `finalize_cset` function: 
> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot
> /file/c16cc0cdcce5/src/share/vm/gc_implementation/g1/g1CollectorPolic
> y.cpp#l1949
> 
> the reported value in the "Choose CSet" is simply the sum of these
> two:
>
[...]
>
> From your experience, what could be the causes for the stuck? any
> particular os/host level metrics we should check? (only saw this once
> and couldn't re-produce it). Many thanks in advance, any insights are
> welcome :)
> 
> 
> The sample gc log sections below (due to our firm's policy, I
> couldn't post
> the gc log directly to public, sorry about that):
> 
> 
>    [Other:   9698.8 ms]
>       [Choose CSet:   *9677.5 ms*]
>       [Ref Proc:   0.3 ms]
>       [Ref Enq:   0.0 ms]
>       [Free CSet:   0.3 ms]
>    [Eden: 18G(18G)->0B(352M) Survivors: 1232M->1280M Heap:
> 26.5G(32G)->8175.2M(32G)]
>  [Times: user=5 sys=0.09, real=10.55 secs]

             ^^--- there seems to be something cut here for user time
statistic

Since you mentioned that this has been a one-off event, and given your
heap size it is unlikely that this phase would take that long on its
own. The cause may be one of the known causes for such unexpected
pauses:

- swapping/memory issues on the machine, stalling the java process and
basically everything else at that time.

- the kernel trying to coalesce small pages into huge pages
"transparently".

- the VM trying to write out the log to disk during very high I/O
activity of another process on the same machine, blocking for a long
time.

On the other hand sys time seems very low in this case, so I may be
wrong.

I would start with something like vmstat (on Linux) to show i/o and
memory activity.

(Blatantly taken from 
https://docs.oracle.com/en/java/javase/11/gctuning/garbage-first-garbage-collector-tuning.html#GUID-8D9B2530-E370-4B8B-8ADD-A43674FC6658
 :) )

Thanks,
  Thomas




More information about the hotspot-dev mailing list