RFR: 8241160: Concurrent class unloading reports GCTraceTime events as JFR pause sub-phase events
Stefan Karlsson
stefan.karlsson at oracle.com
Mon Mar 30 11:12:03 UTC 2020
Hi again,
Did you find the time to take a look at this?
I'd like to propose that we go with the current solution to disable the
incorrect reporting of the events for now, until you find the time to
look at this. The effect of this will be that you won't get this
sub-phase reported, but at the same time it do remove the bug that a
pause was reported.
Thanks,
StefanK
On 2020-03-26 10:55, Stefan Karlsson wrote:
> On 2020-03-26 10:54, Roman Kennke wrote:
>> Hey Stefan,
>>
>> Sorry, this went under my radar. Give us half a day or so, yes?
> Sure.
>
> StefanK
>
>>
>> Thanks,
>> Roman
>>
>>> Shenandoah devs, any comments w.r.t. to the Shenandoah section below?
>>>
>>> Thanks,
>>> StefanK
>>>
>>> On 2020-03-19 10:44, Stefan Karlsson wrote:
>>>> Hi all,
>>>>
>>>> Please review this patch to rewrite the GCTimer, and associated
>>>> classes, to not allow nested phases of different types (pause or
>>>> concurrent).
>>>>
>>>> https://cr.openjdk.java.net/~stefank/8241160/webrev.01/
>>>> https://bugs.openjdk.java.net/browse/JDK-8241160
>>>>
>>>> A bug was found when I was looking at JFR events from ZGC. A
>>>> GCPhasePauseLevel1 event was nested within a GCPhaseConcurrent. The
>>>> only valid parent is a GCPhasePause event. The reason why this
>>>> happened was that the we use a GCTraceTime class inside the class
>>>> unloading code. Previously, we only used GCTraceTimes inside pauses,
>>>> but ever since class unloading was moved out to a concurrent phase,
>>>> this isn't true anymore. GCTraceTime used
>>>> GCTimer::register_gc_phase_start(name, Ticks, phase = <pause>), and
>>>> therefore always reported pauses and pause sub-phases.
>>>>
>>>> With this patch, I suggest that we become stricter in our usages of
>>>> the GCTimer. The effects of the patch are:
>>>>
>>>> 1) When a top-level pause (or concurrent) phase is created, the code
>>>> must be explicit about what type of phase is created. The code will
>>>> now assert if this is abused. Most places were already explicit, but I
>>>> had to change two places:
>>>>
>>>> a) Shenandoah type-erased ConcurrentGCTimer and therefore didn't have
>>>> access to register_gc_pause_start. I made that function public,
>>>> instead of protected, so that we didn't have to deal with that
>>>> problem.
>>>>
>>>> b) G1 used GCTraceTime to note the Remark/Cleanup pauses (in
>>>> VM_G1Concurrent). This is the only place that uses GCTraceTime to
>>>> start a pause. All other places use GCTraceTime to create sub-phases.
>>>> I could have copy-n-pasted the entire
>>>> GCTraceTime/GCTraceTimeWrapper/GCTraceTimeWrapper implementation and
>>>> create a version that calls register_gc_pause_start instead of
>>>> register_gc_phase_start. Instead of doing that I opted for creating a
>>>> system where the code code register a set of callbacks to be called
>>>> when the start and end time is registered. This is used in the backend
>>>> of GCTraceTime, but then also used by G1 to allow us to not have to
>>>> copy-n-paste a lot of the code.
>>>>
>>>> I would have liked to make GCTraceTimeImpl/GCTraceTimeWrapper agnostic
>>>> to the default callbacks (unfied logging and GCTimer) but couldn't
>>>> find a nice way to express that, because of the way we macro-expand
>>>> the UL tags. Maybe something we can consider for a future
>>>> investigation.
>>>>
>>>> 2) sub-phases now inherit the type from the parent phase, and there's
>>>> no possibility to incorrectly nest phases anymore. This also removed
>>>> the need for ConcurrentGCTimer::_is_concurrent_phase_active.
>>>>
>>>> 3) This allows (and encourages concurrent sub-phases). When the JFR
>>>> events were ported to HotSpot, only pauses got sub-phases, because
>>>> there wasn't a big need for concurrent sub-phases. In this patch I
>>>> added level of sub-phases to JFR. Maybe it would be better to add more
>>>> right away? (I'm not a fan of having the explicit sub-phase level
>>>> events, instead of a counter in *the* phase event, but the JMC team at
>>>> that time needed it to be logged as separate events. Maybe something
>>>> that could be reconsidered some time)
>>>>
>>>> 4) The different consumers of the timestamps are separated into their
>>>> own classes.
>>>>
>>>> 5) Shenandoah devs need to consider what to do about this change:
>>>>
>>>> - unloading_occurred =
>>>> SystemDictionary::do_unloading(heap->gc_timer());
>>>> + // FIXME: This turns off the previously broken JFR events. If we
>>>> want to keep reporting them,
>>>> + // but with the correct type (Concurrent) then a top-level
>>>> concurrent phase is required.
>>>> + unloading_occurred = SystemDictionary::do_unloading(NULL /* gc_timer
>>>> */);
>>>>
>>>> Where this code caused GCPhasePauseLevel1 events for ZGC, this used to
>>>> create GCPhasePause events for Shenandoah. It uses GCTraceTime to log
>>>> sub-phases, but the current Shenandoah code hasn't registered a
>>>> top-level phase at this point. Either we keep this code with the
>>>> removal of the gc_timer argument, or we add a top-level phase
>>>> somewhere. If we want the latter, then I need suggestions on where to
>>>> add them. Or maybe push the current code, and fix it as a follow-up
>>>> patch?
>>>>
>>>> What do you think? An alternative is to (continue?) completely forbid
>>>> concurrent sub-phases, and remove the gc_timers passed to GCTraceTimes
>>>> during concurrent phases. Even if we decide to do that, I think
>>>> there's some merit to the stricter GCTimer code, and the slight
>>>> separation of concern in GCTraceTime.
>>>>
>>>> Tested tier1-3
>>>>
>>>> Thanks,
>>>> StefanK
>
More information about the hotspot-gc-dev
mailing list