RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events
Stefan Karlsson
stefan.karlsson at oracle.com
Thu Mar 19 09:44:39 UTC 2020
Hi all,
Please review this patch to rewrite the GCTimer, and associated classes,
to not allow nested phases of different types (pause or concurrent).
https://cr.openjdk.java.net/~stefank/8241160/webrev.01/
https://bugs.openjdk.java.net/browse/JDK-8241160
A bug was found when I was looking at JFR events from ZGC. A
GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The only
valid parent is a GCPhasePause event. The reason why this happened was
that the we use a GCTraceTime class inside the class unloading code.
Previously, we only used GCTraceTimes inside pauses, but ever since
class unloading was moved out to a concurrent phase, this isn't true
anymore. GCTraceTime used GCTimer::register_gc_phase_start(name, Ticks,
phase = <pause>), and therefore always reported pauses and pause
sub-phases.
With this patch, I suggest that we become stricter in our usages of the
GCTimer. The effects of the patch are:
1) When a top-level pause (or concurrent) phase is created, the code
must be explicit about what type of phase is created. The code will now
assert if this is abused. Most places were already explicit, but I had
to change two places:
a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have
access to register_gc_pause_start. I made that function public, instead
of protected, so that we didn't have to deal with that problem.
b) G1 used GCTraceTime to note the Remark/Cleanup pauses (in
VM_G1Concurrent). This is the only place that uses GCTraceTime to start
a pause. All other places use GCTraceTime to create sub-phases. I could
have copy-n-pasted the entire
GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and
create a version that calls register_gc_pause_start instead of
register_gc_phase_start. Instead of doing that I opted for creating a
system where the code code register a set of callbacks to be called when
the start and end time is registered. This is used in the backend of
GCTraceTime, but then also used by G1 to allow us to not have to
copy-n-paste a lot of the code.
I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic
to the default callbacks (unfied logging and GCTimer) but couldn't find
a nice way to express that, because of the way we macro-expand the UL
tags. Maybe something we can consider for a future investigation.
2) sub-phases now inherit the type from the parent phase, and there's no
possibility to incorrectly nest phases anymore. This also removed the
need for ConcurrentGCTimer::_is_concurrent_phase_active.
3) This allows (and encourages concurrent sub-phases). When the JFR
events were ported to HotSpot, only pauses got sub-phases, because there
wasn't a big need for concurrent sub-phases. In this patch I added level
of sub-phases to JFR. Maybe it would be better to add more right away?
(I'm not a fan of having the explicit sub-phase level events, instead of
a counter in *the* phase event, but the JMC team at that time needed it
to be logged as separate events. Maybe something that could be
reconsidered some time)
4) The different consumers of the timestamps are separated into their
own classes.
5) Shenandoah devs need to consider what to do about this change:
- unloading_occurred = SystemDictionary::do_unloading(heap->gc_timer());
+ // FIXME: This turns off the previously broken JFR events. If we want
to keep reporting them,
+ // but with the correct type (Concurrent) then a top-level concurrent
phase is required.
+ unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer */);
Where this code caused GCPhasePauseLevel1 events for ZGC, this used to
create GCPhasePause events for Shenandoah. It uses GCTraceTime to log
sub-phases, but the current Shenandoah code hasn't registered a
top-level phase at this point. Either we keep this code with the removal
of the gc_timer argument, or we add a top-level phase somewhere. If we
want the latter, then I need suggestions on where to add them. Or maybe
push the current code, and fix it as a follow-up patch?
What do you think? An alternative is to (continue?) completely forbid
concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes
during concurrent phases. Even if we decide to do that, I think there's
some merit to the stricter GCTimer code, and the slight separation of
concern in GCTraceTime.
Tested tier1-3
Thanks,
StefanK
More information about the hotspot-gc-dev
mailing list