RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events

Thu Mar 19 09:44:39 UTC 2020

Hi all,

Please review this patch to rewrite the GCTimer, and associated classes, 
to not allow nested phases of different types (pause or concurrent).

https://cr.openjdk.java.net/~stefank/8241160/webrev.01/
https://bugs.openjdk.java.net/browse/JDK-8241160

A bug was found when I was looking at JFR events from ZGC. A 
GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The only 
valid parent is a GCPhasePause event. The reason why this happened was 
that the we use a GCTraceTime class inside the class unloading code. 
Previously, we only used GCTraceTimes inside pauses, but ever since 
class unloading was moved out to a concurrent phase, this isn't true 
anymore. GCTraceTime used GCTimer::register_gc_phase_start(name, Ticks, 
phase  = <pause>), and therefore always reported pauses and pause 
sub-phases.

With this patch, I suggest that we become stricter in our usages of the 
GCTimer. The effects of the patch are:

1) When a top-level pause (or concurrent) phase is created, the code 
must be explicit about what type of phase is created. The code will now 
assert if this is abused. Most places were already explicit, but I had 
to change two places:

a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have 
access to register_gc_pause_start. I made that function public, instead 
of protected, so that we didn't have to deal with that problem.

b) G1 used GCTraceTime to note the Remark/Cleanup  pauses (in 
VM_G1Concurrent). This is the only place that uses GCTraceTime to start 
a pause. All other places use GCTraceTime to create sub-phases. I could 
have copy-n-pasted the entire 
GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and 
create a version that calls register_gc_pause_start instead of 
register_gc_phase_start. Instead of doing that I opted for creating a 
system where the code code register a set of callbacks to be called when 
the start and end time is registered. This is used in the backend of 
GCTraceTime, but then also used by G1 to allow us to not have to 
copy-n-paste a lot of the code.

I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic 
to the default callbacks (unfied logging and GCTimer) but couldn't find 
a nice way to express that, because of the way we macro-expand the UL 
tags. Maybe something we can consider for a future investigation.

2) sub-phases now inherit the type from the parent phase, and there's no 
possibility to incorrectly nest phases anymore. This also removed the 
need for ConcurrentGCTimer::_is_concurrent_phase_active.

3) This allows (and encourages concurrent sub-phases). When the JFR 
events were ported to HotSpot, only pauses got sub-phases, because there 
wasn't a big need for concurrent sub-phases. In this patch I added level 
of sub-phases to JFR. Maybe it would be better to add more right away? 
(I'm not a fan of having the explicit sub-phase level events, instead of 
a counter in *the* phase event, but the JMC team at that time needed it 
to be logged as separate events. Maybe something that could be 
reconsidered some time)

4) The different consumers of the timestamps are separated into their 
own classes.

5) Shenandoah devs need to consider what to do about this change:

- unloading_occurred = SystemDictionary::do_unloading(heap->gc_timer());
+ // FIXME: This turns off the previously broken JFR events. If we want 
to keep reporting them,
+ // but with the correct type (Concurrent) then a top-level concurrent 
phase is required.
+ unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer */);

Where this code caused GCPhasePauseLevel1 events for ZGC, this used to 
create GCPhasePause events for Shenandoah. It uses GCTraceTime to log 
sub-phases, but the current Shenandoah code hasn't registered a 
top-level phase at this point. Either we keep this code with the removal 
of the gc_timer argument, or we add a top-level phase somewhere. If we 
want the latter, then I need suggestions on where to add them. Or maybe 
push the current code, and fix it as a follow-up patch?

What do you think? An alternative is to (continue?) completely forbid 
concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes 
during concurrent phases. Even if we decide to do that, I think there's 
some merit to the stricter GCTimer code, and the slight separation of 
concern in GCTraceTime.

Tested tier1-3

Thanks,
StefanK