RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events

Erik Österlund erik.osterlund at oracle.com
Thu Mar 19 11:37:00 UTC 2020


Hi Stefan,

Nice! I like how you can now catch incorrect use of the API, without 
making it hard to use.
Looks good. Thanks for sorting this out.

/Erik

On 2020-03-19 10:44, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to rewrite the GCTimer, and associated 
> classes, to not allow nested phases of different types (pause or 
> concurrent).
>
> https://cr.openjdk.java.net/~stefank/8241160/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8241160
>
> A bug was found when I was looking at JFR events from ZGC. A 
> GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The 
> only valid parent is a GCPhasePause event. The reason why this 
> happened was that the we use a GCTraceTime class inside the class 
> unloading code. Previously, we only used GCTraceTimes inside pauses, 
> but ever since class unloading was moved out to a concurrent phase, 
> this isn't true anymore. GCTraceTime used 
> GCTimer::register_gc_phase_start(name, Ticks, phase  = <pause>), and 
> therefore always reported pauses and pause sub-phases.
>
> With this patch, I suggest that we become stricter in our usages of 
> the GCTimer. The effects of the patch are:
>
> 1) When a top-level pause (or concurrent) phase is created, the code 
> must be explicit about what type of phase is created. The code will 
> now assert if this is abused. Most places were already explicit, but I 
> had to change two places:
>
> a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have 
> access to register_gc_pause_start. I made that function public, 
> instead of protected, so that we didn't have to deal with that problem.
>
> b) G1 used GCTraceTime to note the Remark/Cleanup  pauses (in 
> VM_G1Concurrent). This is the only place that uses GCTraceTime to 
> start a pause. All other places use GCTraceTime to create sub-phases. 
> I could have copy-n-pasted the entire 
> GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and 
> create a version that calls register_gc_pause_start instead of 
> register_gc_phase_start. Instead of doing that I opted for creating a 
> system where the code code register a set of callbacks to be called 
> when the start and end time is registered. This is used in the backend 
> of GCTraceTime, but then also used by G1 to allow us to not have to 
> copy-n-paste a lot of the code.
>
> I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic 
> to the default callbacks (unfied logging and GCTimer) but couldn't 
> find a nice way to express that, because of the way we macro-expand 
> the UL tags. Maybe something we can consider for a future investigation.
>
> 2) sub-phases now inherit the type from the parent phase, and there's 
> no possibility to incorrectly nest phases anymore. This also removed 
> the need for ConcurrentGCTimer::_is_concurrent_phase_active.
>
> 3) This allows (and encourages concurrent sub-phases). When the JFR 
> events were ported to HotSpot, only pauses got sub-phases, because 
> there wasn't a big need for concurrent sub-phases. In this patch I 
> added level of sub-phases to JFR. Maybe it would be better to add more 
> right away? (I'm not a fan of having the explicit sub-phase level 
> events, instead of a counter in *the* phase event, but the JMC team at 
> that time needed it to be logged as separate events. Maybe something 
> that could be reconsidered some time)
>
> 4) The different consumers of the timestamps are separated into their 
> own classes.
>
> 5) Shenandoah devs need to consider what to do about this change:
>
> - unloading_occurred = SystemDictionary::do_unloading(heap->gc_timer());
> + // FIXME: This turns off the previously broken JFR events. If we 
> want to keep reporting them,
> + // but with the correct type (Concurrent) then a top-level 
> concurrent phase is required.
> + unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer 
> */);
>
> Where this code caused GCPhasePauseLevel1 events for ZGC, this used to 
> create GCPhasePause events for Shenandoah. It uses GCTraceTime to log 
> sub-phases, but the current Shenandoah code hasn't registered a 
> top-level phase at this point. Either we keep this code with the 
> removal of the gc_timer argument, or we add a top-level phase 
> somewhere. If we want the latter, then I need suggestions on where to 
> add them. Or maybe push the current code, and fix it as a follow-up 
> patch?
>
> What do you think? An alternative is to (continue?) completely forbid 
> concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes 
> during concurrent phases. Even if we decide to do that, I think 
> there's some merit to the stricter GCTimer code, and the slight 
> separation of concern in GCTraceTime.
>
> Tested tier1-3
>
> Thanks,
> StefanK




More information about the hotspot-gc-dev mailing list