RFR (M): JDK-8068394: Trace event for concurrent GC phases
sangheon.kim
sangheon.kim at oracle.com
Wed Dec 2 18:08:40 UTC 2015
Hi Bengt,
On 12/02/2015 01:00 AM, Bengt Rutisson wrote:
>
> Hi Sangheon,
>
> Sorry again for the late reply. Comments inline.
No not at all.
Thank you for taking time for this review!
>
> On 2015-11-24 19:48, sangheon.kim wrote:
>> Hi Bengt,
>>
>> Thank you so much for reviewing this patch!
>>
>> On 11/24/2015 05:12 AM, Bengt Rutisson wrote:
>>>
>>> Hi Sangheon,
>>>
>>> Sorry for the very late reply to this review request.
>>>
>>> On 2015-10-22 01:40, sangheon.kim wrote:
>>>> Hi all,
>>>>
>>>> Can I get some reviews for this change of adding a trace event for
>>>> concurrent phases on CMS and G1?
>>>>
>>>> Currently we only measure pause times for major phases.
>>>> But I want to add 'concurrent mark' and 'concurrent sweep' phases
>>>> for CMS and 'concurrent mark' phase for G1.
>>>> To achieve this, I had to change ConcurrentGCTimer and related
>>>> classes.
>>>>
>>>> This patch includes:
>>>> 1) A concurrent phase can be measured only from ConcurrentGCTimer
>>>> and measuring an overlap between concurrent and pause is not
>>>> allowed as currently we don't have that use case. And
>>>> TimePartitions class(or related classes) will be simpler than an
>>>> overlap allowed case.
>>>> 2) I removed PausePhase and ConcurrentPhase which are derived from
>>>> GCPhase because I wanted to avoid heap allocation when adding to
>>>> GrowableArray. Instead introduced 'type' member variable at GCPhase.
>>>>
>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8068394
>>>> Webrev: http://cr.openjdk.java.net/~sangheki/8068394/webrev.00/
>>>
>>> Nice work! It is great to get some timing information for the
>>> concurrent phases.
>> Thanks!
>>
>>>
>>> A few questions/comments:
>>>
>>> CMS.
>>>
>>> You have added timing events for the concurrent phases mark and
>>> sweep, but not for other concurrent phases (preclean,
>>> abortable_preclean and reset_concurrent). I think that if you moved
>>> your calls to _gc_timer_cm->register_gc_concurrent_start() and
>>> _gc_timer_cm->register_gc_concurrent_end() into the constructor and
>>> destructor of CMSPhaseAccounting you would automatically get timing
>>> events for all concurrent phases.
>> I also considered about this idea but I was not clear whether it is
>> good measurement especially for 'sweep'.
>> We are using CMSPhaseAccounting for 5 cases.
>> I think for 'mark' and 'preclean' they are okay. (there are
>> non-product functions call before CMSPhaseAccounting. But they are
>> okay).
>> For 'abortable-preclean' and 'reset', they are good to have that.
>> But when it comes to 'sweep', there are timer and
>> CMSTokenSyncWithLocks related codes and I was not sure about these.
>> If you think they are also okay, I will change as you suggested.
>
> I think it is fine for the sweep part as well. If the
> CMSPhaseAccounting instance is not measuring the correct part of the
> work I think we should change it to measure what we want. That way we
> improve the existing measurements too.
>
> But as you stated in your follow up email, I think the sweep code is
> actually pretty much ok from this perspective.
>
> So, I would prefer to go with the CMSPhaseAccounting approach.
Okay, I already changed as you suggested. :)
>
>
>>
>>>
>>>
>>> G1.
>>>
>>> I think the use of _concurrent_marking_from_roots is unfortunate. It
>>> would be much cleaner if
>>> ConcurrentMark::register_mark_from_roots_phase_end() called
>>> register_gc_concurrent_end() directly.
>> I wanted to avoid introducing the new flag but the reason of it was,
>> as you know, to handle 'abort'.
>> When 'abort' happens we do need to end all timers before calling
>> 'GCTracer::report_gc_end()'.
>> And at this point we need a way to know whether concurrent timer is
>> started or not.
>
> As I tried to describe I think you can extend the scope of the timing
> to cover the part until the marking thread discovers that the marking
> was aborted. Basically moving the timing in to
> ConcurrentMarkThread::run_service().
Oh, I mis-understood your point. Sorry.
I will try your recommendation.
>
>>
>>> I realize that this would change the timing for when a concurrent
>>> marking is aborted. The whole time for a full GC (or even several
>>> full GCs) would be included in the concurrent marking phase. But
>>> from a code perspective that is what happens, so maybe that is the
>>> correct time to report? Also, I think the logging is reported that
>>> way so if we want to make it easy to match the timing events with
>>> the logging we might want to use about the same scope for timing.
>>>
>>> Why do we only measure the concurrent mark phase for G1?
>> This is good question. :)
>> I was comparing CMS and G1 for major concurrent phases.
>> And the candidates were 'concurrent mark/sweep (CMS)' and 'concurrent
>> mark(G1)'.
>>
>>> Wouldn't it be good to measure all concurrent phases?
>> Okay, Stefan Karlsson also pointed to have them as well so I filed a
>> separate CR for this.
>> https://bugs.openjdk.java.net/browse/JDK-8143951
>
> Ok. Is there a reason why we want to split this up into two changes?
> Why not add all the concurrent timing events at once?
>
> I'm ok with splitting it up into two changes, but then maybe the title
> for JDK-8068394 should be changed to indicate that it only adds events
> for the marking phase. Also, if we go with the CMSPhaseAccounting
> approach for CMS, you get events for all phases there. So, it will be
> a bit odd to have all concurrent phases handled for CMS but only
> concurrent mark for G1.
You are correct and I agree with you.
The only reason of split was due to limited time for FC.
But as we have enough time, let me close JDK-8143951 and include
concurrent mark for G1 as well.
>
>>
>>>
>>> gcTraceSend.cpp
>>>
>>> 389 void visit(GCPhase* phase) {
>>> 390 switch (phase->type()) {
>>> 391 case GCPhase::PausePhaseType:
>>> 392 assert(phase->level() < PhasesStack::PHASE_LEVELS,
>>> "Need more event types for PausePhase");
>>> 393
>>> 394 switch (phase->level()) {
>>> 395 case 0: send_phase<EventGCPhasePause>(phase); break;
>>> 396 case 1: send_phase<EventGCPhasePauseLevel1>(phase);
>>> break;
>>> 397 case 2: send_phase<EventGCPhasePauseLevel2>(phase);
>>> break;
>>> 398 case 3: send_phase<EventGCPhasePauseLevel3>(phase);
>>> break;
>>> 399 default: /* Ignore sending this phase */ break;
>>> 400 }
>>> 401 break;
>>> 402
>>> 403 case GCPhase::ConcurrentPhaseType:
>>> 404 assert(phase->level() < 1, "There's only one level for
>>> ConcurrentPhase");
>>> 405
>>> 406 switch (phase->level()) {
>>> 407 case 0: send_phase<EventGCPhaseConcurrent>(phase);
>>> break;
>>> 408 default: /* Ignore sending this phase */ break;
>>> 409 }
>>> 410 break;
>>> 411 }
>>> 412 }
>>> 413 };
>>>
>>> Since there are only two values for GCPhase::PhaseType it seems a
>>> bit odd to use a switch statement. I think I would prefer to factor
>>> out the code for the different types a bit too. So, maybe something
>>> like:
>>>
>>> void visit(GCPhase* phase) {
>>> if (phase->type() == GCPhase::PausePhaseType) {
>>> visit_pause(phase);
>>> } else {
>>> assert(phase->type() == GCPhase::ConcurrentPhaseType, "");
>>> visit_concurrent(phase);
>>> }
>>> }
>> This seems better. I will fix.
>
> Good. Thanks!
:)
I will post next webrev when I'm ready, including JDK-8143951 part as well.
Thanks,
Sangheon
>
> Bengt
>
>>
>> Thanks,
>> Sangheon
>>
>>
>>>
>>> Thanks,
>>> Bengt
>>>
>>>> Testing: JPRT
>>>>
>>>> Thanks,
>>>> Sangheon
>>>
>>
>
More information about the hotspot-gc-dev
mailing list