RFR (M): JDK-8068394: Trace event for concurrent GC phases

Wed Dec 2 18:08:40 UTC 2015

Hi Bengt,

On 12/02/2015 01:00 AM, Bengt Rutisson wrote:
>
> Hi Sangheon,
>
> Sorry again for the late reply. Comments inline.
No not at all.
Thank you for taking time for this review!

>
> On 2015-11-24 19:48, sangheon.kim wrote:
>> Hi Bengt,
>>
>> Thank you so much for reviewing this patch!
>>
>> On 11/24/2015 05:12 AM, Bengt Rutisson wrote:
>>>
>>> Hi Sangheon,
>>>
>>> Sorry for the very late reply to this review request.
>>>
>>> On 2015-10-22 01:40, sangheon.kim wrote:
>>>> Hi all,
>>>>
>>>> Can I get some reviews for this change of adding a trace event for 
>>>> concurrent phases on CMS and G1?
>>>>
>>>> Currently we only measure pause times for major phases.
>>>> But I want to add 'concurrent mark' and 'concurrent sweep' phases 
>>>> for CMS and 'concurrent mark' phase for G1.
>>>> To achieve this, I had to change ConcurrentGCTimer and related 
>>>> classes.
>>>>
>>>> This patch includes:
>>>> 1) A concurrent phase can be measured only from ConcurrentGCTimer 
>>>> and measuring an overlap between concurrent and pause is not 
>>>> allowed as currently we don't have that use case. And 
>>>> TimePartitions class(or related classes) will be simpler than an 
>>>> overlap allowed case.
>>>> 2) I removed PausePhase and ConcurrentPhase which are derived from 
>>>> GCPhase because I wanted to avoid heap allocation when adding to 
>>>> GrowableArray. Instead introduced 'type' member variable at GCPhase.
>>>>
>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8068394
>>>> Webrev: http://cr.openjdk.java.net/~sangheki/8068394/webrev.00/
>>>
>>> Nice work! It is great to get some timing information for the 
>>> concurrent phases.
>> Thanks!
>>
>>>
>>> A few questions/comments:
>>>
>>> CMS.
>>>
>>> You have added timing events for the concurrent phases mark and 
>>> sweep, but not for other concurrent phases (preclean, 
>>> abortable_preclean and reset_concurrent). I think that if you moved 
>>> your calls to _gc_timer_cm->register_gc_concurrent_start() and 
>>> _gc_timer_cm->register_gc_concurrent_end() into the constructor and 
>>> destructor of CMSPhaseAccounting you would automatically get timing 
>>> events for all concurrent phases.
>> I also considered about this idea but I was not clear whether it is 
>> good measurement especially for 'sweep'.
>> We are using CMSPhaseAccounting for 5 cases.
>> I think for 'mark' and 'preclean' they are okay. (there are 
>> non-product functions call before CMSPhaseAccounting. But they are 
>> okay).
>> For 'abortable-preclean' and 'reset', they are good to have that.
>> But when it comes to 'sweep', there are timer and 
>> CMSTokenSyncWithLocks related codes and I was not sure about these.
>> If you think they are also okay, I will change as you suggested.
>
> I think it is fine for the sweep part as well. If the 
> CMSPhaseAccounting instance is not measuring the correct part of the 
> work I think we should change it to measure what we want. That way we 
> improve the existing measurements too.
>
> But as you stated in your follow up email, I think the sweep code is 
> actually pretty much ok from this perspective.
>
> So, I would prefer to go with the CMSPhaseAccounting approach.
Okay, I already changed as you suggested. :)

>
>
>>
>>>
>>>
>>> G1.
>>>
>>> I think the use of _concurrent_marking_from_roots is unfortunate. It 
>>> would be much cleaner if 
>>> ConcurrentMark::register_mark_from_roots_phase_end() called 
>>> register_gc_concurrent_end() directly. 
>> I wanted to avoid introducing the new flag but the reason of it was, 
>> as you know, to handle 'abort'.
>> When 'abort' happens we do need to end all timers before calling 
>> 'GCTracer::report_gc_end()'.
>> And at this point we need a way to know whether concurrent timer is 
>> started or not.
>
> As I tried to describe I think you can extend the scope of the timing 
> to cover the part until the marking thread discovers that the marking 
> was aborted. Basically moving the timing in to 
> ConcurrentMarkThread::run_service().
Oh, I mis-understood your point. Sorry.
I will try your recommendation.

>
>>
>>> I realize that this would change the timing for when a concurrent 
>>> marking is aborted. The whole time for a full GC (or even several 
>>> full GCs) would be included in the concurrent marking phase. But 
>>> from a code perspective that is what happens, so maybe that is the 
>>> correct time to report? Also, I think the logging is reported that 
>>> way so if we want to make it easy to match the timing events with 
>>> the logging we might want to use about the same scope for timing.
>>>
>>> Why do we only measure the concurrent mark phase for G1? 
>> This is good question. :)
>> I was comparing CMS and G1 for major concurrent phases.
>> And the candidates were 'concurrent mark/sweep (CMS)' and 'concurrent 
>> mark(G1)'.
>>
>>> Wouldn't it be good to measure all concurrent phases?
>> Okay, Stefan Karlsson also pointed to have them as well so I filed a 
>> separate CR for this.
>> https://bugs.openjdk.java.net/browse/JDK-8143951
>
> Ok. Is there a reason why we want to split this up into two changes? 
> Why not add all the concurrent timing events at once?
>
> I'm ok with splitting it up into two changes, but then maybe the title 
> for JDK-8068394 should be changed to indicate that it only adds events 
> for the marking phase. Also, if we go with the CMSPhaseAccounting 
> approach for CMS, you get events for all phases there. So, it will be 
> a bit odd to have all concurrent phases handled for CMS but only 
> concurrent mark for G1.
You are correct and I agree with you.
The only reason of split was due to limited time for FC.
But as we have enough time, let me close JDK-8143951 and include 
concurrent mark for G1 as well.

>
>>
>>>
>>> gcTraceSend.cpp
>>>
>>>  389   void visit(GCPhase* phase) {
>>>  390     switch (phase->type()) {
>>>  391       case GCPhase::PausePhaseType:
>>>  392         assert(phase->level() < PhasesStack::PHASE_LEVELS, 
>>> "Need more event types for PausePhase");
>>>  393
>>>  394         switch (phase->level()) {
>>>  395           case 0: send_phase<EventGCPhasePause>(phase); break;
>>>  396           case 1: send_phase<EventGCPhasePauseLevel1>(phase); 
>>> break;
>>>  397           case 2: send_phase<EventGCPhasePauseLevel2>(phase); 
>>> break;
>>>  398           case 3: send_phase<EventGCPhasePauseLevel3>(phase); 
>>> break;
>>>  399           default: /* Ignore sending this phase */ break;
>>>  400         }
>>>  401         break;
>>>  402
>>>  403       case GCPhase::ConcurrentPhaseType:
>>>  404         assert(phase->level() < 1, "There's only one level for 
>>> ConcurrentPhase");
>>>  405
>>>  406         switch (phase->level()) {
>>>  407           case 0: send_phase<EventGCPhaseConcurrent>(phase); 
>>> break;
>>>  408           default: /* Ignore sending this phase */ break;
>>>  409         }
>>>  410         break;
>>>  411     }
>>>  412   }
>>>  413 };
>>>
>>> Since there are only two values for GCPhase::PhaseType it seems a 
>>> bit odd to use a switch statement. I think I would prefer to factor 
>>> out the code for the different types a bit too. So, maybe something 
>>> like:
>>>
>>> void visit(GCPhase* phase) {
>>>   if (phase->type() == GCPhase::PausePhaseType) {
>>>     visit_pause(phase);
>>>   } else {
>>>     assert(phase->type() == GCPhase::ConcurrentPhaseType, "");
>>>     visit_concurrent(phase);
>>>   }
>>> }
>> This seems better. I will fix.
>
> Good. Thanks!
:)

I will post next webrev when I'm ready, including JDK-8143951 part as well.

Thanks,
Sangheon

>
> Bengt
>
>>
>> Thanks,
>> Sangheon
>>
>>
>>>
>>> Thanks,
>>> Bengt
>>>
>>>> Testing: JPRT
>>>>
>>>> Thanks,
>>>> Sangheon
>>>
>>
>