Very long (nearly 10s, abnormal) "Choose CSet" time in a young collection
chao chu
chuchao333 at gmail.com
Thu May 9 18:04:49 UTC 2019
Hi Thomas,
Thanks for your reply. The 'user' part is 5.00 for completeness :)
Yes, weeks ago we identified that we also got impacted by linux THP which
caused like 90+ seconds in sys time in a full GC. But this one looks
different as you mentioned, the sys time is very low.
I saw two things we want to understand here:
1. the 5s 'user' time was spent where?
2. the other 5s (real - user) might be possibly due to the host was heavy
loaded and the jvm got no CPU time during that?
Regards,
On Thu, May 9, 2019 at 1:48 PM Thomas Schatzl <thomas.schatzl at oracle.com>
wrote:
> Hi,
>
> On Thu, 2019-05-09 at 12:43 -0400, chao chu wrote:
> > Hi,
> > We got a weird issue in our prod environment that we observed a
> > nearly 10s "Choose CSet" time in a young GC (it's GCLocker initiated
> > though, not sure if this is relevant), while the related "start
> > choosing cset..." and "finish choosing cset..." log lines happened in
> > the same milli-sec, which just doesn't make much sense.
> >
> > Looking at the related G1 code, all related happened in this
> > `finalize_cset` function:
> > http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot
> > /file/c16cc0cdcce5/src/share/vm/gc_implementation/g1/g1CollectorPolic
> > y.cpp#l1949
> >
> > the reported value in the "Choose CSet" is simply the sum of these
> > two:
> >
> [...]
> >
> > From your experience, what could be the causes for the stuck? any
> > particular os/host level metrics we should check? (only saw this once
> > and couldn't re-produce it). Many thanks in advance, any insights are
> > welcome :)
> >
> >
> > The sample gc log sections below (due to our firm's policy, I
> > couldn't post
> > the gc log directly to public, sorry about that):
> >
> >
> > [Other: 9698.8 ms]
> > [Choose CSet: *9677.5 ms*]
> > [Ref Proc: 0.3 ms]
> > [Ref Enq: 0.0 ms]
> > [Free CSet: 0.3 ms]
> > [Eden: 18G(18G)->0B(352M) Survivors: 1232M->1280M Heap:
> > 26.5G(32G)->8175.2M(32G)]
> > [Times: user=5 sys=0.09, real=10.55 secs]
>
> ^^--- there seems to be something cut here for user time
> statistic
>
> Since you mentioned that this has been a one-off event, and given your
> heap size it is unlikely that this phase would take that long on its
> own. The cause may be one of the known causes for such unexpected
> pauses:
>
> - swapping/memory issues on the machine, stalling the java process and
> basically everything else at that time.
>
> - the kernel trying to coalesce small pages into huge pages
> "transparently".
>
> - the VM trying to write out the log to disk during very high I/O
> activity of another process on the same machine, blocking for a long
> time.
>
> On the other hand sys time seems very low in this case, so I may be
> wrong.
>
> I would start with something like vmstat (on Linux) to show i/o and
> memory activity.
>
> (Blatantly taken from
>
> https://docs.oracle.com/en/java/javase/11/gctuning/garbage-first-garbage-collector-tuning.html#GUID-8D9B2530-E370-4B8B-8ADD-A43674FC6658
> :) )
>
> Thanks,
> Thomas
>
>
>
--
ChuChao
More information about the hotspot-dev
mailing list