RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events
Roman Kennke
rkennke at redhat.com
Thu Mar 26 09:54:34 UTC 2020
Hey Stefan,
Sorry, this went under my radar. Give us half a day or so, yes?
Thanks,
Roman
> Shenandoah devs, any comments w.r.t. to the Shenandoah section below?
>
> Thanks,
> StefanK
>
> On 2020-03-19 10:44, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to rewrite the GCTimer, and associated
>> classes, to not allow nested phases of different types (pause or
>> concurrent).
>>
>> https://cr.openjdk.java.net/~stefank/8241160/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8241160
>>
>> A bug was found when I was looking at JFR events from ZGC. A
>> GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The
>> only valid parent is a GCPhasePause event. The reason why this
>> happened was that the we use a GCTraceTime class inside the class
>> unloading code. Previously, we only used GCTraceTimes inside pauses,
>> but ever since class unloading was moved out to a concurrent phase,
>> this isn't true anymore. GCTraceTime used
>> GCTimer::register_gc_phase_start(name, Ticks, phase = <pause>), and
>> therefore always reported pauses and pause sub-phases.
>>
>> With this patch, I suggest that we become stricter in our usages of
>> the GCTimer. The effects of the patch are:
>>
>> 1) When a top-level pause (or concurrent) phase is created, the code
>> must be explicit about what type of phase is created. The code will
>> now assert if this is abused. Most places were already explicit, but I
>> had to change two places:
>>
>> a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have
>> access to register_gc_pause_start. I made that function public,
>> instead of protected, so that we didn't have to deal with that problem.
>>
>> b) G1 used GCTraceTime to note the Remark/Cleanup pauses (in
>> VM_G1Concurrent). This is the only place that uses GCTraceTime to
>> start a pause. All other places use GCTraceTime to create sub-phases.
>> I could have copy-n-pasted the entire
>> GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and
>> create a version that calls register_gc_pause_start instead of
>> register_gc_phase_start. Instead of doing that I opted for creating a
>> system where the code code register a set of callbacks to be called
>> when the start and end time is registered. This is used in the backend
>> of GCTraceTime, but then also used by G1 to allow us to not have to
>> copy-n-paste a lot of the code.
>>
>> I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic
>> to the default callbacks (unfied logging and GCTimer) but couldn't
>> find a nice way to express that, because of the way we macro-expand
>> the UL tags. Maybe something we can consider for a future investigation.
>>
>> 2) sub-phases now inherit the type from the parent phase, and there's
>> no possibility to incorrectly nest phases anymore. This also removed
>> the need for ConcurrentGCTimer::_is_concurrent_phase_active.
>>
>> 3) This allows (and encourages concurrent sub-phases). When the JFR
>> events were ported to HotSpot, only pauses got sub-phases, because
>> there wasn't a big need for concurrent sub-phases. In this patch I
>> added level of sub-phases to JFR. Maybe it would be better to add more
>> right away? (I'm not a fan of having the explicit sub-phase level
>> events, instead of a counter in *the* phase event, but the JMC team at
>> that time needed it to be logged as separate events. Maybe something
>> that could be reconsidered some time)
>>
>> 4) The different consumers of the timestamps are separated into their
>> own classes.
>>
>> 5) Shenandoah devs need to consider what to do about this change:
>>
>> - unloading_occurred = SystemDictionary::do_unloading(heap->gc_timer());
>> + // FIXME: This turns off the previously broken JFR events. If we
>> want to keep reporting them,
>> + // but with the correct type (Concurrent) then a top-level
>> concurrent phase is required.
>> + unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer
>> */);
>>
>> Where this code caused GCPhasePauseLevel1 events for ZGC, this used to
>> create GCPhasePause events for Shenandoah. It uses GCTraceTime to log
>> sub-phases, but the current Shenandoah code hasn't registered a
>> top-level phase at this point. Either we keep this code with the
>> removal of the gc_timer argument, or we add a top-level phase
>> somewhere. If we want the latter, then I need suggestions on where to
>> add them. Or maybe push the current code, and fix it as a follow-up
>> patch?
>>
>> What do you think? An alternative is to (continue?) completely forbid
>> concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes
>> during concurrent phases. Even if we decide to do that, I think
>> there's some merit to the stricter GCTimer code, and the slight
>> separation of concern in GCTraceTime.
>>
>> Tested tier1-3
>>
>> Thanks,
>> StefanK
>
More information about the hotspot-gc-dev
mailing list