RFR (M): JDK-8068394: Trace event for concurrent GC phases

Thu Dec 3 08:03:45 UTC 2015

Hi Sangheon,

This all sounds good. Looking forward to the next webrev.

Thanks,
Bengt

On 2015-12-02 19:08, sangheon.kim wrote:
> Hi Bengt,
>
>
> On 12/02/2015 01:00 AM, Bengt Rutisson wrote:
>>
>> Hi Sangheon,
>>
>> Sorry again for the late reply. Comments inline.
> No not at all.
> Thank you for taking time for this review!
>
>>
>> On 2015-11-24 19:48, sangheon.kim wrote:
>>> Hi Bengt,
>>>
>>> Thank you so much for reviewing this patch!
>>>
>>> On 11/24/2015 05:12 AM, Bengt Rutisson wrote:
>>>>
>>>> Hi Sangheon,
>>>>
>>>> Sorry for the very late reply to this review request.
>>>>
>>>> On 2015-10-22 01:40, sangheon.kim wrote:
>>>>> Hi all,
>>>>>
>>>>> Can I get some reviews for this change of adding a trace event for 
>>>>> concurrent phases on CMS and G1?
>>>>>
>>>>> Currently we only measure pause times for major phases.
>>>>> But I want to add 'concurrent mark' and 'concurrent sweep' phases 
>>>>> for CMS and 'concurrent mark' phase for G1.
>>>>> To achieve this, I had to change ConcurrentGCTimer and related 
>>>>> classes.
>>>>>
>>>>> This patch includes:
>>>>> 1) A concurrent phase can be measured only from ConcurrentGCTimer 
>>>>> and measuring an overlap between concurrent and pause is not 
>>>>> allowed as currently we don't have that use case. And 
>>>>> TimePartitions class(or related classes) will be simpler than an 
>>>>> overlap allowed case.
>>>>> 2) I removed PausePhase and ConcurrentPhase which are derived from 
>>>>> GCPhase because I wanted to avoid heap allocation when adding to 
>>>>> GrowableArray. Instead introduced 'type' member variable at GCPhase.
>>>>>
>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8068394
>>>>> Webrev: http://cr.openjdk.java.net/~sangheki/8068394/webrev.00/
>>>>
>>>> Nice work! It is great to get some timing information for the 
>>>> concurrent phases.
>>> Thanks!
>>>
>>>>
>>>> A few questions/comments:
>>>>
>>>> CMS.
>>>>
>>>> You have added timing events for the concurrent phases mark and 
>>>> sweep, but not for other concurrent phases (preclean, 
>>>> abortable_preclean and reset_concurrent). I think that if you moved 
>>>> your calls to _gc_timer_cm->register_gc_concurrent_start() and 
>>>> _gc_timer_cm->register_gc_concurrent_end() into the constructor and 
>>>> destructor of CMSPhaseAccounting you would automatically get timing 
>>>> events for all concurrent phases.
>>> I also considered about this idea but I was not clear whether it is 
>>> good measurement especially for 'sweep'.
>>> We are using CMSPhaseAccounting for 5 cases.
>>> I think for 'mark' and 'preclean' they are okay. (there are 
>>> non-product functions call before CMSPhaseAccounting. But they are 
>>> okay).
>>> For 'abortable-preclean' and 'reset', they are good to have that.
>>> But when it comes to 'sweep', there are timer and 
>>> CMSTokenSyncWithLocks related codes and I was not sure about these.
>>> If you think they are also okay, I will change as you suggested.
>>
>> I think it is fine for the sweep part as well. If the 
>> CMSPhaseAccounting instance is not measuring the correct part of the 
>> work I think we should change it to measure what we want. That way we 
>> improve the existing measurements too.
>>
>> But as you stated in your follow up email, I think the sweep code is 
>> actually pretty much ok from this perspective.
>>
>> So, I would prefer to go with the CMSPhaseAccounting approach.
> Okay, I already changed as you suggested. :)
>
>>
>>
>>>
>>>>
>>>>
>>>> G1.
>>>>
>>>> I think the use of _concurrent_marking_from_roots is unfortunate. 
>>>> It would be much cleaner if 
>>>> ConcurrentMark::register_mark_from_roots_phase_end() called 
>>>> register_gc_concurrent_end() directly. 
>>> I wanted to avoid introducing the new flag but the reason of it was, 
>>> as you know, to handle 'abort'.
>>> When 'abort' happens we do need to end all timers before calling 
>>> 'GCTracer::report_gc_end()'.
>>> And at this point we need a way to know whether concurrent timer is 
>>> started or not.
>>
>> As I tried to describe I think you can extend the scope of the timing 
>> to cover the part until the marking thread discovers that the marking 
>> was aborted. Basically moving the timing in to 
>> ConcurrentMarkThread::run_service().
> Oh, I mis-understood your point. Sorry.
> I will try your recommendation.
>
>>
>>>
>>>> I realize that this would change the timing for when a concurrent 
>>>> marking is aborted. The whole time for a full GC (or even several 
>>>> full GCs) would be included in the concurrent marking phase. But 
>>>> from a code perspective that is what happens, so maybe that is the 
>>>> correct time to report? Also, I think the logging is reported that 
>>>> way so if we want to make it easy to match the timing events with 
>>>> the logging we might want to use about the same scope for timing.
>>>>
>>>> Why do we only measure the concurrent mark phase for G1? 
>>> This is good question. :)
>>> I was comparing CMS and G1 for major concurrent phases.
>>> And the candidates were 'concurrent mark/sweep (CMS)' and 
>>> 'concurrent mark(G1)'.
>>>
>>>> Wouldn't it be good to measure all concurrent phases?
>>> Okay, Stefan Karlsson also pointed to have them as well so I filed a 
>>> separate CR for this.
>>> https://bugs.openjdk.java.net/browse/JDK-8143951
>>
>> Ok. Is there a reason why we want to split this up into two changes? 
>> Why not add all the concurrent timing events at once?
>>
>> I'm ok with splitting it up into two changes, but then maybe the 
>> title for JDK-8068394 should be changed to indicate that it only adds 
>> events for the marking phase. Also, if we go with the 
>> CMSPhaseAccounting approach for CMS, you get events for all phases 
>> there. So, it will be a bit odd to have all concurrent phases handled 
>> for CMS but only concurrent mark for G1.
> You are correct and I agree with you.
> The only reason of split was due to limited time for FC.
> But as we have enough time, let me close JDK-8143951 and include 
> concurrent mark for G1 as well.
>
>>
>>>
>>>>
>>>> gcTraceSend.cpp
>>>>
>>>>  389   void visit(GCPhase* phase) {
>>>>  390     switch (phase->type()) {
>>>>  391       case GCPhase::PausePhaseType:
>>>>  392         assert(phase->level() < PhasesStack::PHASE_LEVELS, 
>>>> "Need more event types for PausePhase");
>>>>  393
>>>>  394         switch (phase->level()) {
>>>>  395           case 0: send_phase<EventGCPhasePause>(phase); break;
>>>>  396           case 1: send_phase<EventGCPhasePauseLevel1>(phase); 
>>>> break;
>>>>  397           case 2: send_phase<EventGCPhasePauseLevel2>(phase); 
>>>> break;
>>>>  398           case 3: send_phase<EventGCPhasePauseLevel3>(phase); 
>>>> break;
>>>>  399           default: /* Ignore sending this phase */ break;
>>>>  400         }
>>>>  401         break;
>>>>  402
>>>>  403       case GCPhase::ConcurrentPhaseType:
>>>>  404         assert(phase->level() < 1, "There's only one level for 
>>>> ConcurrentPhase");
>>>>  405
>>>>  406         switch (phase->level()) {
>>>>  407           case 0: send_phase<EventGCPhaseConcurrent>(phase); 
>>>> break;
>>>>  408           default: /* Ignore sending this phase */ break;
>>>>  409         }
>>>>  410         break;
>>>>  411     }
>>>>  412   }
>>>>  413 };
>>>>
>>>> Since there are only two values for GCPhase::PhaseType it seems a 
>>>> bit odd to use a switch statement. I think I would prefer to factor 
>>>> out the code for the different types a bit too. So, maybe something 
>>>> like:
>>>>
>>>> void visit(GCPhase* phase) {
>>>>   if (phase->type() == GCPhase::PausePhaseType) {
>>>>     visit_pause(phase);
>>>>   } else {
>>>>     assert(phase->type() == GCPhase::ConcurrentPhaseType, "");
>>>>     visit_concurrent(phase);
>>>>   }
>>>> }
>>> This seems better. I will fix.
>>
>> Good. Thanks!
> :)
>
> I will post next webrev when I'm ready, including JDK-8143951 part as 
> well.
>
> Thanks,
> Sangheon
>
>
>>
>> Bengt
>>
>>>
>>> Thanks,
>>> Sangheon
>>>
>>>
>>>>
>>>> Thanks,
>>>> Bengt
>>>>
>>>>> Testing: JPRT
>>>>>
>>>>> Thanks,
>>>>> Sangheon
>>>>
>>>
>>
>