GcInfo: longest pause duration?
Roman Kennke
rkennke at redhat.com
Tue Oct 24 15:08:52 UTC 2017
Hi Paul,
> + hotspot-gc-dev.
>
> I’d say that hsperfdata is the right place to put detailed
> collector-specific data, and JMX the place to put anything that might
> be generalized.
>
> How far does anyone want to go with this? I took a look at Azul’s
> GC-related MXBean docs
>
> http://docs.azul.com/zing/Zing_UserGuide/Zing_UserGuide/ZingMXBeans.htm
>
> http://docs.azul.com/zing/ZingMXBeans_javadoc/
>
> which has good ideas in it, e.g., PauseDetails and a data collection
> interval concept. My proposed getMaxPauseDuration() seems to be the
> same as the max of Azul’s
> GarbageCollectorMXBean.getLastGCDetails.getCollectorMaxPauseDetails().
>
> A JEP might be in order. Opinions?
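For illustration, a rough sketch of the shape such an addition could take (all of the names below are hypothetical, not existing API; GcInfo today exposes only getId(), getStartTime(), getEndTime(), getDuration(), and the memory usage before/after the collection):

    // Hypothetical sketch of the proposal discussed above; none of this
    // exists in the current com.sun.management API.
    public interface PauseInfo {
        String getCause();           // e.g. "Final Remark", "Evacuation Pause"
        long   getDurationMillis();  // wall-clock length of this STW pause
    }

    // Possible additions to com.sun.management.GcInfo:
    //   long getMaxPauseDuration();            // longest STW pause in the cycle
    //   java.util.List<PauseInfo> getPauses(); // cause and duration of each pause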
>
I like the idea of publishing GC pause info via JMX. Yes, I think it
makes sense to draft up a JEP about it.
Roman
> Thanks,
>
> Paul
>
> From: Jeremy Manson <jeremymanson at google.com>
> Date: Monday, October 23, 2017 at 11:50 AM
> To: "Hohensee, Paul" <hohensee at amazon.com>, Derek Thomson
> <dthomson at google.com>, Wessam Hassanein <wessam at google.com>, Man Cao
> <manc at google.com>
> Cc: "serviceability-dev at openjdk.java.net"
> <serviceability-dev at openjdk.java.net>
> Subject: Re: GcInfo: longest pause duration?
>
> Yes, this information is useful.
>
> We've done a couple of things to get at it, which folks might find
> interesting.
>
> 1) We extended hsperfdata to include a large number of stats for
> CMS. Parsing hsperfdata is ridiculously easy, and the format hasn't
> changed in years, so we just parse it directly and report the values
> to monitoring tools. I'm too lazy to retype what we export, so here
> are the relevant CMS variables from our code; what they do should be
> relatively obvious:
>
> _concurrent_mode_failure_count =
>     PerfDataManager::create_counter(NULL_NS, "concurrent-mode-failure-count",
>                                     PerfData::U_Events, CHECK);
> _concurrent_mode_failure_due_to_fragmentation_count =
>     PerfDataManager::create_counter(NULL_NS, "concurrent-mode-failure-due-to-fragmentation-count",
>                                     PerfData::U_Events, CHECK);
> _concurrent_mode_failure_time =
>     PerfDataManager::create_counter(NULL_NS, "concurrent-mode-failure-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_initial_mark_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-initial-mark-count",
>                                     PerfData::U_Events, CHECK);
> _cms_initial_mark_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-initial-mark-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_remark_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-remark-count",
>                                     PerfData::U_Events, CHECK);
> _cms_remark_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-remark-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_full_gc_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-full-gc-count",
>                                     PerfData::U_Events, CHECK);
> _cms_full_gc_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-full-gc-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_compacting_full_gc_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-compacting-full-gc-count",
>                                     PerfData::U_Events, CHECK);
> _cms_compacting_full_gc_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-compacting-full-gc-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_noncompacting_full_gc_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-noncompacting-full-gc-count",
>                                     PerfData::U_Events, CHECK);
> _cms_noncompacting_full_gc_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-noncompacting-full-gc-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_conc_mark_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-conc-mark-count",
>                                     PerfData::U_Events, CHECK);
> _cms_conc_mark_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-conc-mark-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_conc_preclean_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-conc-preclean-count",
>                                     PerfData::U_Events, CHECK);
> _cms_conc_preclean_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-conc-preclean-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_conc_abortable_preclean_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-conc-abortable-preclean-count",
>                                     PerfData::U_Events, CHECK);
> _cms_conc_abortable_preclean_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-conc-abortable-preclean-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_conc_sweep_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-conc-sweep-count",
>                                     PerfData::U_Events, CHECK);
> _cms_conc_sweep_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-conc-sweep-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_conc_reset_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-conc-reset-count",
>                                     PerfData::U_Events, CHECK);
> _cms_conc_reset_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-conc-reset-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_collection_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-collection-count",
>                                     PerfData::U_Events, CHECK);
> _cms_collection_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-collection-time",
>                                     PerfData::U_Ticks, CHECK);
> _cms_ref_proc_count =
>     PerfDataManager::create_counter(NULL_NS, "cms-ref-proc-count",
>                                     PerfData::U_Events, CHECK);
> _cms_ref_proc_time =
>     PerfDataManager::create_counter(NULL_NS, "cms-ref-proc-time",
>                                     PerfData::U_Ticks, CHECK);
> _yg_alloc_bytes =
>     PerfDataManager::create_variable(NULL_NS, "jvm-gc-cms-young-gen-alloc-bytes",
>                                      PerfData::U_None, CHECK);
> _yg_promo_bytes =
>     PerfDataManager::create_variable(NULL_NS, "jvm-gc-cms-young-gen-promo-bytes",
>                                      PerfData::U_None, CHECK);
> _og_direct_alloc_bytes =
>     PerfDataManager::create_variable(NULL_NS, "jvm-gc-cms-old-gen-directalloc-bytes",
>                                      PerfData::U_None, CHECK);
> _og_alloc_bytes =
>     PerfDataManager::create_variable(NULL_NS, "jvm-gc-cms-old-gen-alloc-bytes",
>                                      PerfData::U_None, CHECK);
> _total_alloc_bytes =
>     PerfDataManager::create_variable(NULL_NS, "jvm-gc-cms-total-alloc-bytes",
>                                      PerfData::U_None, CHECK);
>
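For anyone who wants to consume counters like the ones above without hand-parsing the hsperfdata file, the JDK's jvmstat API (the mechanism jstat is built on) can attach to a running VM and look counters up by name. A minimal sketch in Java, assuming a local target JVM whose pid is passed on the command line and querying a counter that exists in stock HotSpot (the CMS counters listed above exist only in a patched VM); on JDK 9+ this needs --add-exports jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED:

    import sun.jvmstat.monitor.Monitor;
    import sun.jvmstat.monitor.MonitoredHost;
    import sun.jvmstat.monitor.MonitoredVm;
    import sun.jvmstat.monitor.VmIdentifier;

    public class PerfCounterReader {
        public static void main(String[] args) throws Exception {
            // args[0] is the pid of the JVM whose hsperfdata we want to read.
            MonitoredHost host = MonitoredHost.getMonitoredHost("localhost");
            MonitoredVm vm = host.getMonitoredVm(new VmIdentifier(args[0]));
            try {
                // A standard counter: cumulative ticks spent in collector 0
                // (the young-generation collector).
                Monitor m = vm.findByName("sun.gc.collector.0.time");
                if (m != null) {
                    System.out.println(m.getName() + " = " + m.getValue());
                }
            } finally {
                host.detach(vm);
            }
        }
    }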
> 2) We also implemented an API built on JFR's handling of GC events.
> We use it to export the GC history to monitoring tools in a format
> that doesn't involve parsing log output whose format can change.
>
> (Prior to JFR, we had a bespoke implementation in which we instrumented
> every place a log message was emitted, but fortunately we don't have to
> do that anymore.)
>
> We'd love for both of these things to become more generally usable.
>
> Jeremy
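A sketch of the kind of consumer the JFR-based approach above enables, using the jdk.jfr.consumer API that ships with OpenJDK from JDK 11 on; the event and field names below are the standard ones emitted by the JDK, and the stock jdk.GarbageCollection event already carries a longestPause field, which is close to what this thread is asking to expose over JMX:

    import java.nio.file.Path;
    import java.time.Duration;
    import jdk.jfr.consumer.RecordedEvent;
    import jdk.jfr.consumer.RecordingFile;

    public class LongestGcPause {
        public static void main(String[] args) throws Exception {
            Duration longest = Duration.ZERO;
            // args[0] is a .jfr recording captured from the target JVM.
            for (RecordedEvent e : RecordingFile.readAllEvents(Path.of(args[0]))) {
                if (e.getEventType().getName().equals("jdk.GarbageCollection")) {
                    // One event per collection; longestPause is the longest
                    // stop-the-world pause within that collection.
                    Duration pause = e.getDuration("longestPause");
                    if (pause.compareTo(longest) > 0) {
                        longest = pause;
                    }
                }
            }
            System.out.println("Longest GC pause: " + longest.toMillis() + " ms");
        }
    }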
>
> On Mon, Oct 23, 2017 at 9:31 AM, Hohensee, Paul <hohensee at amazon.com> wrote:
>
> I’d like to solicit opinions on reporting GC pause duration
> (stopped-world pause time) via JMX. This info would be useful in
> figuring out whether or not GC pause times are factors in failing
> to meet response time SLAs. The info is of course available
> directly from GC logs, but parsing logs is fraught and JMX doesn’t
> seem to report the equivalent info.
>
> GcInfo
>
> https://docs.oracle.com/javase/9/docs/api/com/sun/management/GcInfo.html
>
> has a getDuration() method which works fine for the non-concurrent
> collectors (since they’re STW), but for CMS and G1 it appears to
> report the duration of an entire concurrent cycle, which isn’t
> what I want. The number of STW pauses during a concurrent cycle
> varies by collector, so ideally there would be a method that
> reports cause (as a string) and duration for each STW pause. If
> that’s too much, perhaps the minimum might be a
> getMaxPauseDuration() method that reports the maximum pause
> duration of all the STW pauses that happen during a concurrent cycle.
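For reference, a minimal sketch of what can be read today through the com.sun.management extension on HotSpot; for the concurrent beans ("ConcurrentMarkSweep", "G1 Old Generation") the reported duration covers the whole cycle, which is exactly the limitation described above:

    import java.lang.management.ManagementFactory;
    import com.sun.management.GarbageCollectorMXBean;
    import com.sun.management.GcInfo;

    public class LastGcDurations {
        public static void main(String[] args) {
            for (java.lang.management.GarbageCollectorMXBean bean :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                // On HotSpot the platform beans implement the com.sun.management
                // extension, which adds getLastGcInfo().
                GcInfo info = ((GarbageCollectorMXBean) bean).getLastGcInfo();
                long durationMs = (info == null) ? -1 : info.getDuration();
                System.out.printf("%-25s collections=%d last duration=%d ms%n",
                        bean.getName(), bean.getCollectionCount(), durationMs);
            }
        }
    }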
>
> Relatedly, the full compacting GCs that happen as a result of CMS
> and G1 concurrent mode failure aren’t reported separately from
> concurrent cycles. It would be useful to differentiate these from
> “ConcurrentMarkSweep” and “G1 Old Generation”. Perhaps add
> collector types to CMS and G1, viz. “MarkSweepCompact” (which
> already exists and is literally what’s executed by CMS) and a new
> “G1 MarkSweepCompact” collector for G1.
>
> If there’s a consensus that something should be done about either
> of these issues, I’d be happy to file RFE(s) and do the work.
>
> Thanks,
>
> Paul
>