RFR (M): JDK-8068394: Trace event for concurrent GC phases

Tue Dec 15 19:29:38 UTC 2015

Hi Stefan,

Thank you for looking at official RFR as well.

On 12/15/2015 12:58 AM, Stefan Karlsson wrote:
> Hi Sangheon,
>
> On 2015-12-15 07:27, sangheon wrote:
>> Hi Bengt and all,
>>
>> Here's a next webrev.
>> Bengt and I had a discussion on other thread and he and Stefan 
>> Karlsson suggested simpler design.
>>
>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.01
>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.01_to_00
>
> Thanks for doing the simplifications of the patch.
>
> I have a couple of comments:
> Could you replace NOT_PRODUCT with DEBUG_ONLY, and #ifndef PRODUCT 
> with #ifdef ASSERT. Otherwise, you'll get inconsistencies when 
> building with optimized builds where neither PRODUCT nor ASSERT is 
> defined.
Okay, I will do.
But you also pointed that whether this flag is really needed below.

>
> 81 void ConcurrentGCTimer::register_gc_concurrent_start(const char* 
> name, const Ticks& time) {
> 82 assert(!_is_concurrent_phase_active, "A concurrent phase is already 
> active.");
> 83 _time_partitions.report_gc_phase_start(name, time, 
> GCPhase::ConcurrentPhaseType);
> 84 NOT_PRODUCT(_is_concurrent_phase_active = true;)
> 85 }
>
>  162 class ConcurrentGCTimer : public GCTimer {
> 163 // ConcurrentGCTimer can't be used if there is an overlap between 
> a pause phase and a concurrent phase.
> 164 // _is_concurrent_phase_active is used to find above case.
> 165 NOT_PRODUCT(volatile bool _is_concurrent_phase_active;)
> 166
>  167  public:
>  168 #ifndef PRODUCT
> 169 ConcurrentGCTimer(): GCTimer(), _is_concurrent_phase_active(false) 
> {};
> 170 #endif Is it really worth optimizing away the boolean? The code 
> would be cleaner without these macros.
I just wanted to prevent mis-use of concurrent timer for overlapping phase.
A pause phase nor another concurrent phase can't be started if a 
concurrent phase is already started.
I think this would be nice to have. But I don't have a strong opinion on 
this as currently we don't have these situations.

>
>
> Why are you using volatile here? I don't see why that should be needed.
>
>  162 class ConcurrentGCTimer : public GCTimer {
>  163 // ConcurrentGCTimer can't be used if there is an overlap between 
> a pause phase and a concurrent phase.
> 164 // _is_concurrent_phase_active is used to find above case.
> 165 NOT_PRODUCT(volatile bool _is_concurrent_phase_active;)
You are right, volatile is not needed here.
I was thinking register_gc_concurrent_end() would be called from 
different threads(ConcurrentMarkThread and VMThread), but as it only 
happens from safepoint, I think volatile should be removed.

I will post a new webrev with your opinion about the flag.

Thanks,
Sangheon

>
>
> Thanks,
> StefanK
>
>>
>> This webrev includes:
>> 1. Moving concurrent phase measurement into CMSPhaseAccounting class. 
>> So we can check all concurrent phases from CMS.
>>
>> 2. Renamed a new flag introduced from webrev.00 for general use, 
>> _concurrent_phase_started.
>>
>> 3. Added helper class for concurrent phase measurement, 
>> GCConcPhaseTimer. And used this class for all concurrent phases.
>>
>> 4. Flag checking is moved from 
>> G1CollectedHeap::register_concurrent_cycle_end() to 
>> ConcurrentMark::abort().
>>
>> 5. Changed switch() to if else statement at gcTraceSend.
>>
>> Many thanks to Bengt and Stefan Karlsson for the discussion.
>>
>> Thanks,
>> Sangheon
>>
>>
>> On 12/03/2015 12:03 AM, Bengt Rutisson wrote:
>>>
>>> Hi Sangheon,
>>>
>>> This all sounds good. Looking forward to the next webrev.
>>>
>>> Thanks,
>>> Bengt
>>>
>>> On 2015-12-02 19:08, sangheon.kim wrote:
>>>> Hi Bengt,
>>>>
>>>>
>>>> On 12/02/2015 01:00 AM, Bengt Rutisson wrote:
>>>>>
>>>>> Hi Sangheon,
>>>>>
>>>>> Sorry again for the late reply. Comments inline.
>>>> No not at all.
>>>> Thank you for taking time for this review!
>>>>
>>>>>
>>>>> On 2015-11-24 19:48, sangheon.kim wrote:
>>>>>> Hi Bengt,
>>>>>>
>>>>>> Thank you so much for reviewing this patch!
>>>>>>
>>>>>> On 11/24/2015 05:12 AM, Bengt Rutisson wrote:
>>>>>>>
>>>>>>> Hi Sangheon,
>>>>>>>
>>>>>>> Sorry for the very late reply to this review request.
>>>>>>>
>>>>>>> On 2015-10-22 01:40, sangheon.kim wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Can I get some reviews for this change of adding a trace event 
>>>>>>>> for concurrent phases on CMS and G1?
>>>>>>>>
>>>>>>>> Currently we only measure pause times for major phases.
>>>>>>>> But I want to add 'concurrent mark' and 'concurrent sweep' 
>>>>>>>> phases for CMS and 'concurrent mark' phase for G1.
>>>>>>>> To achieve this, I had to change ConcurrentGCTimer and related 
>>>>>>>> classes.
>>>>>>>>
>>>>>>>> This patch includes:
>>>>>>>> 1) A concurrent phase can be measured only from 
>>>>>>>> ConcurrentGCTimer and measuring an overlap between concurrent 
>>>>>>>> and pause is not allowed as currently we don't have that use 
>>>>>>>> case. And TimePartitions class(or related classes) will be 
>>>>>>>> simpler than an overlap allowed case.
>>>>>>>> 2) I removed PausePhase and ConcurrentPhase which are derived 
>>>>>>>> from GCPhase because I wanted to avoid heap allocation when 
>>>>>>>> adding to GrowableArray. Instead introduced 'type' member 
>>>>>>>> variable at GCPhase.
>>>>>>>>
>>>>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8068394
>>>>>>>> Webrev: http://cr.openjdk.java.net/~sangheki/8068394/webrev.00/
>>>>>>>
>>>>>>> Nice work! It is great to get some timing information for the 
>>>>>>> concurrent phases.
>>>>>> Thanks!
>>>>>>
>>>>>>>
>>>>>>> A few questions/comments:
>>>>>>>
>>>>>>> CMS.
>>>>>>>
>>>>>>> You have added timing events for the concurrent phases mark and 
>>>>>>> sweep, but not for other concurrent phases (preclean, 
>>>>>>> abortable_preclean and reset_concurrent). I think that if you 
>>>>>>> moved your calls to _gc_timer_cm->register_gc_concurrent_start() 
>>>>>>> and _gc_timer_cm->register_gc_concurrent_end() into the 
>>>>>>> constructor and destructor of CMSPhaseAccounting you would 
>>>>>>> automatically get timing events for all concurrent phases.
>>>>>> I also considered about this idea but I was not clear whether it 
>>>>>> is good measurement especially for 'sweep'.
>>>>>> We are using CMSPhaseAccounting for 5 cases.
>>>>>> I think for 'mark' and 'preclean' they are okay. (there are 
>>>>>> non-product functions call before CMSPhaseAccounting. But they 
>>>>>> are okay).
>>>>>> For 'abortable-preclean' and 'reset', they are good to have that.
>>>>>> But when it comes to 'sweep', there are timer and 
>>>>>> CMSTokenSyncWithLocks related codes and I was not sure about these.
>>>>>> If you think they are also okay, I will change as you suggested.
>>>>>
>>>>> I think it is fine for the sweep part as well. If the 
>>>>> CMSPhaseAccounting instance is not measuring the correct part of 
>>>>> the work I think we should change it to measure what we want. That 
>>>>> way we improve the existing measurements too.
>>>>>
>>>>> But as you stated in your follow up email, I think the sweep code 
>>>>> is actually pretty much ok from this perspective.
>>>>>
>>>>> So, I would prefer to go with the CMSPhaseAccounting approach.
>>>> Okay, I already changed as you suggested. :)
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> G1.
>>>>>>>
>>>>>>> I think the use of _concurrent_marking_from_roots is 
>>>>>>> unfortunate. It would be much cleaner if 
>>>>>>> ConcurrentMark::register_mark_from_roots_phase_end() called 
>>>>>>> register_gc_concurrent_end() directly. 
>>>>>> I wanted to avoid introducing the new flag but the reason of it 
>>>>>> was, as you know, to handle 'abort'.
>>>>>> When 'abort' happens we do need to end all timers before calling 
>>>>>> 'GCTracer::report_gc_end()'.
>>>>>> And at this point we need a way to know whether concurrent timer 
>>>>>> is started or not.
>>>>>
>>>>> As I tried to describe I think you can extend the scope of the 
>>>>> timing to cover the part until the marking thread discovers that 
>>>>> the marking was aborted. Basically moving the timing in to 
>>>>> ConcurrentMarkThread::run_service().
>>>> Oh, I mis-understood your point. Sorry.
>>>> I will try your recommendation.
>>>>
>>>>>
>>>>>>
>>>>>>> I realize that this would change the timing for when a 
>>>>>>> concurrent marking is aborted. The whole time for a full GC (or 
>>>>>>> even several full GCs) would be included in the concurrent 
>>>>>>> marking phase. But from a code perspective that is what happens, 
>>>>>>> so maybe that is the correct time to report? Also, I think the 
>>>>>>> logging is reported that way so if we want to make it easy to 
>>>>>>> match the timing events with the logging we might want to use 
>>>>>>> about the same scope for timing.
>>>>>>>
>>>>>>> Why do we only measure the concurrent mark phase for G1? 
>>>>>> This is good question. :)
>>>>>> I was comparing CMS and G1 for major concurrent phases.
>>>>>> And the candidates were 'concurrent mark/sweep (CMS)' and 
>>>>>> 'concurrent mark(G1)'.
>>>>>>
>>>>>>> Wouldn't it be good to measure all concurrent phases?
>>>>>> Okay, Stefan Karlsson also pointed to have them as well so I 
>>>>>> filed a separate CR for this.
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143951
>>>>>
>>>>> Ok. Is there a reason why we want to split this up into two 
>>>>> changes? Why not add all the concurrent timing events at once?
>>>>>
>>>>> I'm ok with splitting it up into two changes, but then maybe the 
>>>>> title for JDK-8068394 should be changed to indicate that it only 
>>>>> adds events for the marking phase. Also, if we go with the 
>>>>> CMSPhaseAccounting approach for CMS, you get events for all phases 
>>>>> there. So, it will be a bit odd to have all concurrent phases 
>>>>> handled for CMS but only concurrent mark for G1.
>>>> You are correct and I agree with you.
>>>> The only reason of split was due to limited time for FC.
>>>> But as we have enough time, let me close JDK-8143951 and include 
>>>> concurrent mark for G1 as well.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> gcTraceSend.cpp
>>>>>>>
>>>>>>>  389   void visit(GCPhase* phase) {
>>>>>>>  390     switch (phase->type()) {
>>>>>>>  391       case GCPhase::PausePhaseType:
>>>>>>>  392         assert(phase->level() < PhasesStack::PHASE_LEVELS, 
>>>>>>> "Need more event types for PausePhase");
>>>>>>>  393
>>>>>>>  394         switch (phase->level()) {
>>>>>>>  395           case 0: send_phase<EventGCPhasePause>(phase); break;
>>>>>>>  396           case 1: 
>>>>>>> send_phase<EventGCPhasePauseLevel1>(phase); break;
>>>>>>>  397           case 2: 
>>>>>>> send_phase<EventGCPhasePauseLevel2>(phase); break;
>>>>>>>  398           case 3: 
>>>>>>> send_phase<EventGCPhasePauseLevel3>(phase); break;
>>>>>>>  399           default: /* Ignore sending this phase */ break;
>>>>>>>  400         }
>>>>>>>  401         break;
>>>>>>>  402
>>>>>>>  403       case GCPhase::ConcurrentPhaseType:
>>>>>>>  404         assert(phase->level() < 1, "There's only one level 
>>>>>>> for ConcurrentPhase");
>>>>>>>  405
>>>>>>>  406         switch (phase->level()) {
>>>>>>>  407           case 0: 
>>>>>>> send_phase<EventGCPhaseConcurrent>(phase); break;
>>>>>>>  408           default: /* Ignore sending this phase */ break;
>>>>>>>  409         }
>>>>>>>  410         break;
>>>>>>>  411     }
>>>>>>>  412   }
>>>>>>>  413 };
>>>>>>>
>>>>>>> Since there are only two values for GCPhase::PhaseType it seems 
>>>>>>> a bit odd to use a switch statement. I think I would prefer to 
>>>>>>> factor out the code for the different types a bit too. So, maybe 
>>>>>>> something like:
>>>>>>>
>>>>>>> void visit(GCPhase* phase) {
>>>>>>>   if (phase->type() == GCPhase::PausePhaseType) {
>>>>>>>     visit_pause(phase);
>>>>>>>   } else {
>>>>>>>     assert(phase->type() == GCPhase::ConcurrentPhaseType, "");
>>>>>>>     visit_concurrent(phase);
>>>>>>>   }
>>>>>>> }
>>>>>> This seems better. I will fix.
>>>>>
>>>>> Good. Thanks!
>>>> :)
>>>>
>>>> I will post next webrev when I'm ready, including JDK-8143951 part 
>>>> as well.
>>>>
>>>> Thanks,
>>>> Sangheon
>>>>
>>>>
>>>>>
>>>>> Bengt
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Sangheon
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bengt
>>>>>>>
>>>>>>>> Testing: JPRT
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sangheon
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>