GcInfo: longest pause duration?

Roman Kennke rkennke at redhat.com
Tue Oct 24 15:08:52 UTC 2017


Hi Paul,

> + hotspot-gc-dev.
>
> I’d say that hsperfdata is the right place to put detailed 
> collector-specific data, and JMX the place to put anything that might 
> be generalized.
>
> How far does anyone want to go with this? I took a look at Azul’s 
> GC-related MXBean docs
>
> http://docs.azul.com/zing/Zing_UserGuide/Zing_UserGuide/ZingMXBeans.htm
>
> http://docs.azul.com/zing/ZingMXBeans_javadoc/
>
> which has good ideas in it, e.g., PauseDetails and a data collection 
> interval concept. My proposed getMaxPauseDuration() seems to be the 
> same as the max of Azul’s 
> GarbageCollectorMXBean.getLastGCDetails.getCollectorMaxPauseDetails().
>
> A JEP might be in order. Opinions?
>
I like the idea of publishing GC pause info via JMX. Yes I think it 
makes sense to draft up a JEP about it.

Roman



> Thanks,
>
> Paul
>
> *From: *Jeremy Manson <jeremymanson at google.com>
> *Date: *Monday, October 23, 2017 at 11:50 AM
> *To: *"Hohensee, Paul" <hohensee at amazon.com>, Derek Thomson 
> <dthomson at google.com>, Wessam Hassanein <wessam at google.com>, Man Cao 
> <manc at google.com>
> *Cc: *"serviceability-dev at openjdk.java.net" 
> <serviceability-dev at openjdk.java.net>
> *Subject: *Re: GcInfo: longest pause duration?
>
> Yes, this information is useful.
>
> We've done a couple of things to get at it, which folks might find 
> interesting.
>
> 1) We extended the hsperfdata to include a large number of stats for 
> CMS.  Parsing hsperfdata is ridiculously easy, and hasn't changed in 
> years, so we just parse it directly and report the values to 
> monitoring tools.  I'm too lazy to retype what we export, so are the 
> relevant CMS variables from our code; what they do should be 
> relatively obvious:
>
>     _concurrent_mode_failure_count =
>         PerfDataManager::create_counter(NULL_NS, 
> "concurrent-mode-failure-count",
>                                         PerfData::U_Events, CHECK);
>     _concurrent_mode_failure_due_to_fragmentation_count =
>         PerfDataManager::create_counter(NULL_NS, 
> "concurrent-mode-failure-due-to-fragmentation-count",
>                                         PerfData::U_Events, CHECK);
>     _concurrent_mode_failure_time =
>         PerfDataManager::create_counter(NULL_NS, 
> "concurrent-mode-failure-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_initial_mark_count =
>         PerfDataManager::create_counter(NULL_NS, 
> "*cms-initial-mark-count*",
>                                         PerfData::U_Events, CHECK);
>     _cms_initial_mark_time =
>         PerfDataManager::create_counter(NULL_NS, "cms-initial-mark-time",
>                                         PerfData::U_Ticks, CHECK);
>    _cms_remark_count =
>         PerfDataManager::create_counter(NULL_NS, "cms-remark-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_remark_time =
>         PerfDataManager::create_counter(NULL_NS, "cms-remark-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_full_gc_count =
>         PerfDataManager::create_counter(NULL_NS, "cms-full-gc-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_full_gc_time =
>         PerfDataManager::create_counter(NULL_NS, "cms-full-gc-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_compacting_full_gc_count =
>         PerfDataManager::create_counter(NULL_NS, 
> "cms-compacting-full-gc-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_compacting_full_gc_time =
>         PerfDataManager::create_counter(NULL_NS, 
> "cms-compacting-full-gc-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_noncompacting_full_gc_count =
>         PerfDataManager::create_counter(NULL_NS, 
> "cms-noncompacting-full-gc-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_noncompacting_full_gc_time =
>         PerfDataManager::create_counter(NULL_NS, 
> "cms-noncompacting-full-gc-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_conc_mark_count =
>         PerfDataManager::create_counter(NULL_NS, "cms-conc-mark-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_conc_mark_time =
>         PerfDataManager::create_counter(NULL_NS, "cms-conc-mark-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_conc_preclean_count =
>         PerfDataManager::create_counter(NULL_NS, 
> "cms-conc-preclean-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_conc_preclean_time =
>         PerfDataManager::create_counter(NULL_NS, "cms-conc-preclean-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_conc_abortable_preclean_count =
>         PerfDataManager::create_counter(NULL_NS, 
> "cms-conc-abortable-preclean-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_conc_abortable_preclean_time =
>         PerfDataManager::create_counter(NULL_NS, 
> "cms-conc-abortable-preclean-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_conc_sweep_count =
>         PerfDataManager::create_counter(NULL_NS, "cms-conc-sweep-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_conc_sweep_time =
>         PerfDataManager::create_counter(NULL_NS, "cms-conc-sweep-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_conc_reset_count =
>         PerfDataManager::create_counter(NULL_NS, "cms-conc-reset-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_conc_reset_time =
>         PerfDataManager::create_counter(NULL_NS, "cms-conc-reset-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_collection_count =
>         PerfDataManager::create_counter(NULL_NS, "cms-collection-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_collection_time =
>         PerfDataManager::create_counter(NULL_NS, "cms-collection-time",
>                                         PerfData::U_Ticks, CHECK);
>     _cms_ref_proc_count =
>         PerfDataManager::create_counter(NULL_NS, "cms-ref-proc-count",
>                                         PerfData::U_Events, CHECK);
>     _cms_ref_proc_time =
>         PerfDataManager::create_counter(NULL_NS, "cms-ref-proc-time",
>                                         PerfData::U_Ticks, CHECK);
>     _yg_alloc_bytes =
>         PerfDataManager::create_variable(NULL_NS, 
> "jvm-gc-cms-young-gen-alloc-bytes",
>                                          PerfData::U_None, CHECK);
>     _yg_promo_bytes =
>         PerfDataManager::create_variable(NULL_NS, 
> "jvm-gc-cms-young-gen-promo-bytes",
>                                          PerfData::U_None, CHECK);
>     _og_direct_alloc_bytes =
>         PerfDataManager::create_variable(NULL_NS, 
> "jvm-gc-cms-old-gen-directalloc-bytes",
>                                          PerfData::U_None, CHECK);
>     _og_alloc_bytes =
>         PerfDataManager::create_variable(NULL_NS, 
> "jvm-gc-cms-old-gen-alloc-bytes",
>                                          PerfData::U_None, CHECK);
>     _total_alloc_bytes =
>         PerfDataManager::create_variable(NULL_NS, 
> "jvm-gc-cms-total-alloc-bytes",
>                                          PerfData::U_None, CHECK);
>
> 2) We also implemented an API built on JFR's handling of GC events.  
> We use it to export the GC history to monitoring tools in a format 
> that doesn't involve parsing logs that can change.
>
> (Prior to JFR, we had a bespoke implementation where we instrumented 
> everywhere a log happened, but we fortunately don't have to do that 
> anymore.)
>
> We'd love both of these things to be something that was more generally 
> usable.
>
> Jeremy
>
> On Mon, Oct 23, 2017 at 9:31 AM, Hohensee, Paul <hohensee at amazon.com 
> <mailto:hohensee at amazon.com>> wrote:
>
>     I’d like to solicit opinions on reporting GC pause duration
>     (stopped-world pause time) via JMX. This info would be useful in
>     figuring out whether or not GC pause times are factors in failing
>     to meet response time SLAs. The info is of course available
>     directly from GC logs, but parsing logs is fraught and JMX doesn’t
>     seem to report the equivalent info.
>
>     GcInfo
>
>     https://docs.oracle.com/javase/9/docs/api/com/sun/management/GcInfo.html
>
>     has a getDuration() method which works fine for the non-concurrent
>     collectors (since they’re STW), but for CMS and G1 it appears to
>     report the duration of an entire concurrent cycle, which isn’t
>     what I want. The number of STW pauses during a concurrent cycle
>     varies by collector, so ideally there would be a method that
>     reports cause (as a string) and duration for each STW pause. If
>     that’s too much, perhaps the minimum might be a
>     getMaxPauseDuration() method that reports the maximum pause
>     duration of all the STW pauses that happen during a concurrent cycle.
>
>     Relatedly, the full compacting GCs that happen as a result of CMS
>     and G1 concurrent mode failure aren’t reported separately from
>     concurrent cycles. It would be useful to differentiate these from
>     “ConcurrentMarkSweep” and “G1 Old Generation”. Perhaps add
>     collector types to CMS and G1, vis. “MarkSweepCompact” (which
>     already exists and is literally what’s executed by CMS) and a new
>     “G1 MarkSweepCompact” collector for G1.
>
>     If there’s a consensus that something should be done about either
>     of these issues, I’d be happy to file RFE(s) and do the work.
>
>     Thanks,
>
>     Paul
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20171024/35e66ed0/attachment.htm>


More information about the hotspot-gc-dev mailing list