Monitoring Java Safepoint Time in JDK16+
Carter Kozak
ckozak at ckozak.net
Wed Jun 16 22:26:46 UTC 2021
Thanks, Erik.
Is there any chance you could point me toward the documentation for the jdk.Safepoint* events? It's difficult to tell at a glance whether the sum of the differences between the safepoint begin and end times is equivalent to hotspotRuntimeManagementBean.getTotalSafepointTime(). Does that include the sync time (which appears to have its own SafepointStateSynchronization event)? I also don't understand what beginSafepointEvent.getDuration() means in this context: is sync time included only in the safepoint sync event, in the duration of the SafepointBegin event, or in the time between the begin and end events' end times?
In logging framework code we've found that converting between time objects and numeric values doesn't always (on some JREs, ever?) optimize away Instant allocations. The default JFR configuration appears to set time thresholds for safepoint events, which I'd expect to reduce overhead but would prevent us from accumulating data when there are frequent, quick safepoints. Without those thresholds, the cost is likely to be much greater than with the old approach.
In a similar vein, is there any way for us to accumulate this data without dedicating an entire OS thread to the RecordingStream?
Sorry for the barrage of questions, I appreciate your help!
Carter Kozak
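For the questions above, the following sketch pairs jdk.SafepointBegin and jdk.SafepointEnd events by their safepointId field and sums the time from the begin event's start to the matching end event's end, while tracking jdk.SafepointStateSynchronization durations separately. It assumes that safepoints never overlap (so a single pending slot is enough), that events arrive in commit order, and that this interval is a reasonable proxy for total safepoint time; whether it actually matches getTotalSafepointTime() is exactly the open question. Thresholds are explicitly disabled with EventSettings.withoutThreshold() so short safepoints are kept, at the extra cost discussed above. The class and method names are hypothetical, and startAsync() still runs the callbacks on a dedicated thread.

```java
import java.time.Instant;
import java.util.concurrent.atomic.AtomicLong;
import jdk.jfr.consumer.RecordingStream;

// Hypothetical helper; names are illustrative, not an existing API.
public class SafepointTimeMonitor {

    /** Pairs begin/end events by safepointId and sums begin-start to end-end. */
    public static final class Accumulator {
        // Pending begin state: only touched by the stream's callback thread.
        private boolean hasPending;
        private long pendingId;
        private long pendingStartNanos;
        // Running totals: safe to read from any thread (e.g. a metrics scraper).
        private final AtomicLong totalNanos = new AtomicLong();
        private final AtomicLong syncNanos = new AtomicLong();

        public void onBegin(long safepointId, long startNanos) {
            hasPending = true;
            pendingId = safepointId;
            pendingStartNanos = startNanos;
        }

        public void onEnd(long safepointId, long endNanos) {
            // Ignore an end whose begin was never seen (e.g. dropped by a threshold).
            if (hasPending && safepointId == pendingId) {
                totalNanos.addAndGet(endNanos - pendingStartNanos);
                hasPending = false;
            }
        }

        public void onSync(long durationNanos) {
            syncNanos.addAndGet(durationNanos);
        }

        public long totalNanos() {
            return totalNanos.get();
        }

        public long syncNanos() {
            return syncNanos.get();
        }
    }

    /** Converts an Instant to nanoseconds since the epoch without creating a Duration. */
    public static long epochNanos(Instant t) {
        return t.getEpochSecond() * 1_000_000_000L + t.getNano();
    }

    /** Starts an asynchronous stream feeding the accumulator; close it to stop. */
    public static RecordingStream start(Accumulator acc) {
        RecordingStream rs = new RecordingStream();
        // Explicitly disable any duration threshold so short safepoints are kept.
        rs.enable("jdk.SafepointBegin").withoutThreshold();
        rs.enable("jdk.SafepointStateSynchronization").withoutThreshold();
        rs.enable("jdk.SafepointEnd").withoutThreshold();
        rs.onEvent("jdk.SafepointBegin", e ->
                acc.onBegin(e.getLong("safepointId"), epochNanos(e.getStartTime())));
        rs.onEvent("jdk.SafepointStateSynchronization", e ->
                acc.onSync(e.getDuration().toNanos()));
        rs.onEvent("jdk.SafepointEnd", e ->
                acc.onEnd(e.getLong("safepointId"), epochNanos(e.getEndTime())));
        // startAsync() still runs the callbacks on a dedicated thread.
        rs.startAsync();
        return rs;
    }
}
```

Keeping the accumulator separate from the stream wiring lets the pairing logic be tested without JFR. Note that getStartTime() and getEndTime() still allocate an Instant per event, so this does not address the allocation concern by itself.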
On Wed, Jun 16, 2021, at 15:44, Erik Gahlin wrote:
> It's possible to access safepoint information using JFR, for example:
>
> import jdk.jfr.consumer.RecordingStream;
>
> try (RecordingStream r = new RecordingStream()) {
>     r.enable("jdk.SafepointBegin");
>     r.enable("jdk.SafepointEnd");
>     r.onEvent("jdk.SafepointBegin", e -> System.out.println("begin: " + e.getEndTime()));
>     r.onEvent("jdk.SafepointEnd", e -> System.out.println("end: " + e.getEndTime()));
>     r.start(); // blocks the calling thread, processing events until the stream is closed
> }
>
> Erik
>
> On 2021-06-16, 21:12, "serviceability-dev on behalf of Carter Kozak" <serviceability-dev-retn at openjdk.java.net on behalf of ckozak at ckozak.net> wrote:
>
> As Java 16 and beyond lock down access to internal components by default, it can be difficult to produce Prometheus-style metrics describing application safepoints. I’ve been monitoring these metrics so that I can be alerted when an application spends more than ~10% of its time in safepoints for some duration, because that means something has gone wrong: most often GC spirals, although excessive thread dumps, deadlock checks, etc. can also contribute. This has been one of the most meaningful tools in my arsenal for detecting general JVM badness; however, there doesn’t seem to be a way to access the data in newer JREs without allowing access to internal components.
>
> Previously I was able to use something along these lines, which required legacy sun.management component access.
>
> sun.management.HotspotRuntimeMBean hotspotRuntimeManagementBean = sun.management.ManagementFactoryHelper.getHotspotRuntimeMBean();
> long totalSafepointTimeMillis = hotspotRuntimeManagementBean.getTotalSafepointTime();
>
> Before I get ahead of myself, I’d like to confirm that I haven’t missed a supported pathway to access safepoint time. If my read is correct and there’s no way to access this information from inside the running JVM, would it be a reasonable addition to the public API?
>
> Thanks,
> Carter Kozak
>
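The ~10% alert described above reduces to a ratio over a sliding window: the growth of a cumulative safepoint-time counter divided by the elapsed wall-clock time. A minimal sketch, assuming the caller samples some cumulative counter periodically (more often than once per window); the SafepointRatio class and its window policy (use the oldest sample no more than one window old as the baseline) are hypothetical, not an existing API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: fraction of wall-clock time spent in safepoints over a window.
public final class SafepointRatio {
    private static final class Sample {
        final long wallNanos;
        final long safepointNanos;

        Sample(long wallNanos, long safepointNanos) {
            this.wallNanos = wallNanos;
            this.safepointNanos = safepointNanos;
        }
    }

    private final long windowNanos;
    private final Deque<Sample> samples = new ArrayDeque<>();

    public SafepointRatio(long windowNanos) {
        this.windowNanos = windowNanos;
    }

    /** Records a sample and returns the safepoint fraction (0.0 to 1.0) over the window. */
    public double record(long wallNanos, long cumulativeSafepointNanos) {
        samples.addLast(new Sample(wallNanos, cumulativeSafepointNanos));
        // Drop baselines that are more than one window old, always keeping at least one sample.
        while (samples.size() > 1 && wallNanos - samples.peekFirst().wallNanos > windowNanos) {
            samples.removeFirst();
        }
        Sample oldest = samples.peekFirst();
        long elapsed = wallNanos - oldest.wallNanos;
        if (elapsed <= 0) {
            return 0.0; // not enough history yet
        }
        return (double) (cumulativeSafepointNanos - oldest.safepointNanos) / elapsed;
    }
}
```

For example, a scraper might call `record(System.nanoTime(), cumulativeNanos)` every 10 seconds and alert when the result stays above 0.10; a millisecond counter such as the legacy getTotalSafepointTime() value would need to be multiplied by 1_000_000 first.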
More information about the serviceability-dev mailing list