RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events
Erik Österlund
erik.osterlund at oracle.com
Thu Mar 19 11:37:00 UTC 2020
Hi Stefan,
Nice! I like how you can now catch incorrect use of the API, without
making it hard to use.
Looks good. Thanks for sorting this out.
/Erik
On 2020-03-19 10:44, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to rewrite the GCTimer, and associated
> classes, to not allow nested phases of different types (pause or
> concurrent).
>
> https://cr.openjdk.java.net/~stefank/8241160/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8241160
>
> A bug was found when I was looking at JFR events from ZGC. A
> GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The
> only valid parent is a GCPhasePause event. The reason why this
> happened was that the we use a GCTraceTime class inside the class
> unloading code. Previously, we only used GCTraceTimes inside pauses,
> but ever since class unloading was moved out to a concurrent phase,
> this isn't true anymore. GCTraceTime used
> GCTimer::register_gc_phase_start(name, Ticks, phase = <pause>), and
> therefore always reported pauses and pause sub-phases.
>
> With this patch, I suggest that we become stricter in our usages of
> the GCTimer. The effects of the patch are:
>
> 1) When a top-level pause (or concurrent) phase is created, the code
> must be explicit about what type of phase is created. The code will
> now assert if this is abused. Most places were already explicit, but I
> had to change two places:
>
> a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have
> access to register_gc_pause_start. I made that function public,
> instead of protected, so that we didn't have to deal with that problem.
>
> b) G1 used GCTraceTime to note the Remark/Cleanup pauses (in
> VM_G1Concurrent). This is the only place that uses GCTraceTime to
> start a pause. All other places use GCTraceTime to create sub-phases.
> I could have copy-n-pasted the entire
> GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and
> create a version that calls register_gc_pause_start instead of
> register_gc_phase_start. Instead of doing that I opted for creating a
> system where the code code register a set of callbacks to be called
> when the start and end time is registered. This is used in the backend
> of GCTraceTime, but then also used by G1 to allow us to not have to
> copy-n-paste a lot of the code.
>
> I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic
> to the default callbacks (unfied logging and GCTimer) but couldn't
> find a nice way to express that, because of the way we macro-expand
> the UL tags. Maybe something we can consider for a future investigation.
>
> 2) sub-phases now inherit the type from the parent phase, and there's
> no possibility to incorrectly nest phases anymore. This also removed
> the need for ConcurrentGCTimer::_is_concurrent_phase_active.
>
> 3) This allows (and encourages concurrent sub-phases). When the JFR
> events were ported to HotSpot, only pauses got sub-phases, because
> there wasn't a big need for concurrent sub-phases. In this patch I
> added level of sub-phases to JFR. Maybe it would be better to add more
> right away? (I'm not a fan of having the explicit sub-phase level
> events, instead of a counter in *the* phase event, but the JMC team at
> that time needed it to be logged as separate events. Maybe something
> that could be reconsidered some time)
>
> 4) The different consumers of the timestamps are separated into their
> own classes.
>
> 5) Shenandoah devs need to consider what to do about this change:
>
> - unloading_occurred = SystemDictionary::do_unloading(heap->gc_timer());
> + // FIXME: This turns off the previously broken JFR events. If we
> want to keep reporting them,
> + // but with the correct type (Concurrent) then a top-level
> concurrent phase is required.
> + unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer
> */);
>
> Where this code caused GCPhasePauseLevel1 events for ZGC, this used to
> create GCPhasePause events for Shenandoah. It uses GCTraceTime to log
> sub-phases, but the current Shenandoah code hasn't registered a
> top-level phase at this point. Either we keep this code with the
> removal of the gc_timer argument, or we add a top-level phase
> somewhere. If we want the latter, then I need suggestions on where to
> add them. Or maybe push the current code, and fix it as a follow-up
> patch?
>
> What do you think? An alternative is to (continue?) completely forbid
> concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes
> during concurrent phases. Even if we decide to do that, I think
> there's some merit to the stricter GCTimer code, and the slight
> separation of concern in GCTraceTime.
>
> Tested tier1-3
>
> Thanks,
> StefanK
More information about the hotspot-gc-dev
mailing list