RFR (M): JDK-8068394: Trace event for concurrent GC phases

Stefan Karlsson stefan.karlsson at oracle.com
Tue Dec 15 19:32:23 UTC 2015


Hi Sangheon,

On 2015-12-15 20:29, sangheon wrote:
> Hi Stefan,
>
> Thank you for looking at official RFR as well.
>
> On 12/15/2015 12:58 AM, Stefan Karlsson wrote:
>> Hi Sangheon,
>>
>> On 2015-12-15 07:27, sangheon wrote:
>>> Hi Bengt and all,
>>>
>>> Here's a next webrev.
>>> Bengt and I had a discussion on other thread and he and Stefan 
>>> Karlsson suggested simpler design.
>>>
>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.01
>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.01_to_00
>>
>> Thanks for doing the simplifications of the patch.
>>
>> I have a couple of comments:
>> Could you replace NOT_PRODUCT with DEBUG_ONLY, and #ifndef PRODUCT 
>> with #ifdef ASSERT. Otherwise, you'll get inconsistencies when 
>> building with optimized builds where neither PRODUCT nor ASSERT is 
>> defined.
> Okay, I will do.
> But you also pointed that whether this flag is really needed below.
>
>>
>> 81 void ConcurrentGCTimer::register_gc_concurrent_start(const char* 
>> name, const Ticks& time) {
>> 82 assert(!_is_concurrent_phase_active, "A concurrent phase is 
>> already active.");
>> 83 _time_partitions.report_gc_phase_start(name, time, 
>> GCPhase::ConcurrentPhaseType);
>> 84 NOT_PRODUCT(_is_concurrent_phase_active = true;)
>> 85 }
>>
>>  162 class ConcurrentGCTimer : public GCTimer {
>> 163 // ConcurrentGCTimer can't be used if there is an overlap between 
>> a pause phase and a concurrent phase.
>> 164 // _is_concurrent_phase_active is used to find above case.
>> 165 NOT_PRODUCT(volatile bool _is_concurrent_phase_active;)
>> 166
>>  167  public:
>>  168 #ifndef PRODUCT
>> 169 ConcurrentGCTimer(): GCTimer(), 
>> _is_concurrent_phase_active(false) {};
>> 170 #endif Is it really worth optimizing away the boolean? The code 
>> would be cleaner without these macros.
> I just wanted to prevent mis-use of concurrent timer for overlapping 
> phase.
> A pause phase nor another concurrent phase can't be started if a 
> concurrent phase is already started.
> I think this would be nice to have. But I don't have a strong opinion 
> on this as currently we don't have these situations.

I meant that we could keep the the variable in the product builds as 
well. Just to get rid of the noisy PRODUCT / DEBUG_ONLY macros. It could 
even be useful for future debugging. ;)
>
>>
>>
>> Why are you using volatile here? I don't see why that should be needed.
>>
>>  162 class ConcurrentGCTimer : public GCTimer {
>>  163 // ConcurrentGCTimer can't be used if there is an overlap 
>> between a pause phase and a concurrent phase.
>> 164 // _is_concurrent_phase_active is used to find above case.
>> 165 NOT_PRODUCT(volatile bool _is_concurrent_phase_active;)
> You are right, volatile is not needed here.
> I was thinking register_gc_concurrent_end() would be called from 
> different threads(ConcurrentMarkThread and VMThread), but as it only 
> happens from safepoint, I think volatile should be removed.
>
> I will post a new webrev with your opinion about the flag.

Great.

Thanks,
StefanK

>
> Thanks,
> Sangheon
>
>
>>
>>
>> Thanks,
>> StefanK
>>
>>>
>>> This webrev includes:
>>> 1. Moving concurrent phase measurement into CMSPhaseAccounting 
>>> class. So we can check all concurrent phases from CMS.
>>>
>>> 2. Renamed a new flag introduced from webrev.00 for general use, 
>>> _concurrent_phase_started.
>>>
>>> 3. Added helper class for concurrent phase measurement, 
>>> GCConcPhaseTimer. And used this class for all concurrent phases.
>>>
>>> 4. Flag checking is moved from 
>>> G1CollectedHeap::register_concurrent_cycle_end() to 
>>> ConcurrentMark::abort().
>>>
>>> 5. Changed switch() to if else statement at gcTraceSend.
>>>
>>> Many thanks to Bengt and Stefan Karlsson for the discussion.
>>>
>>> Thanks,
>>> Sangheon
>>>
>>>
>>> On 12/03/2015 12:03 AM, Bengt Rutisson wrote:
>>>>
>>>> Hi Sangheon,
>>>>
>>>> This all sounds good. Looking forward to the next webrev.
>>>>
>>>> Thanks,
>>>> Bengt
>>>>
>>>> On 2015-12-02 19:08, sangheon.kim wrote:
>>>>> Hi Bengt,
>>>>>
>>>>>
>>>>> On 12/02/2015 01:00 AM, Bengt Rutisson wrote:
>>>>>>
>>>>>> Hi Sangheon,
>>>>>>
>>>>>> Sorry again for the late reply. Comments inline.
>>>>> No not at all.
>>>>> Thank you for taking time for this review!
>>>>>
>>>>>>
>>>>>> On 2015-11-24 19:48, sangheon.kim wrote:
>>>>>>> Hi Bengt,
>>>>>>>
>>>>>>> Thank you so much for reviewing this patch!
>>>>>>>
>>>>>>> On 11/24/2015 05:12 AM, Bengt Rutisson wrote:
>>>>>>>>
>>>>>>>> Hi Sangheon,
>>>>>>>>
>>>>>>>> Sorry for the very late reply to this review request.
>>>>>>>>
>>>>>>>> On 2015-10-22 01:40, sangheon.kim wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Can I get some reviews for this change of adding a trace event 
>>>>>>>>> for concurrent phases on CMS and G1?
>>>>>>>>>
>>>>>>>>> Currently we only measure pause times for major phases.
>>>>>>>>> But I want to add 'concurrent mark' and 'concurrent sweep' 
>>>>>>>>> phases for CMS and 'concurrent mark' phase for G1.
>>>>>>>>> To achieve this, I had to change ConcurrentGCTimer and related 
>>>>>>>>> classes.
>>>>>>>>>
>>>>>>>>> This patch includes:
>>>>>>>>> 1) A concurrent phase can be measured only from 
>>>>>>>>> ConcurrentGCTimer and measuring an overlap between concurrent 
>>>>>>>>> and pause is not allowed as currently we don't have that use 
>>>>>>>>> case. And TimePartitions class(or related classes) will be 
>>>>>>>>> simpler than an overlap allowed case.
>>>>>>>>> 2) I removed PausePhase and ConcurrentPhase which are derived 
>>>>>>>>> from GCPhase because I wanted to avoid heap allocation when 
>>>>>>>>> adding to GrowableArray. Instead introduced 'type' member 
>>>>>>>>> variable at GCPhase.
>>>>>>>>>
>>>>>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8068394
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~sangheki/8068394/webrev.00/
>>>>>>>>
>>>>>>>> Nice work! It is great to get some timing information for the 
>>>>>>>> concurrent phases.
>>>>>>> Thanks!
>>>>>>>
>>>>>>>>
>>>>>>>> A few questions/comments:
>>>>>>>>
>>>>>>>> CMS.
>>>>>>>>
>>>>>>>> You have added timing events for the concurrent phases mark and 
>>>>>>>> sweep, but not for other concurrent phases (preclean, 
>>>>>>>> abortable_preclean and reset_concurrent). I think that if you 
>>>>>>>> moved your calls to 
>>>>>>>> _gc_timer_cm->register_gc_concurrent_start() and 
>>>>>>>> _gc_timer_cm->register_gc_concurrent_end() into the constructor 
>>>>>>>> and destructor of CMSPhaseAccounting you would automatically 
>>>>>>>> get timing events for all concurrent phases.
>>>>>>> I also considered about this idea but I was not clear whether it 
>>>>>>> is good measurement especially for 'sweep'.
>>>>>>> We are using CMSPhaseAccounting for 5 cases.
>>>>>>> I think for 'mark' and 'preclean' they are okay. (there are 
>>>>>>> non-product functions call before CMSPhaseAccounting. But they 
>>>>>>> are okay).
>>>>>>> For 'abortable-preclean' and 'reset', they are good to have that.
>>>>>>> But when it comes to 'sweep', there are timer and 
>>>>>>> CMSTokenSyncWithLocks related codes and I was not sure about these.
>>>>>>> If you think they are also okay, I will change as you suggested.
>>>>>>
>>>>>> I think it is fine for the sweep part as well. If the 
>>>>>> CMSPhaseAccounting instance is not measuring the correct part of 
>>>>>> the work I think we should change it to measure what we want. 
>>>>>> That way we improve the existing measurements too.
>>>>>>
>>>>>> But as you stated in your follow up email, I think the sweep code 
>>>>>> is actually pretty much ok from this perspective.
>>>>>>
>>>>>> So, I would prefer to go with the CMSPhaseAccounting approach.
>>>>> Okay, I already changed as you suggested. :)
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> G1.
>>>>>>>>
>>>>>>>> I think the use of _concurrent_marking_from_roots is 
>>>>>>>> unfortunate. It would be much cleaner if 
>>>>>>>> ConcurrentMark::register_mark_from_roots_phase_end() called 
>>>>>>>> register_gc_concurrent_end() directly. 
>>>>>>> I wanted to avoid introducing the new flag but the reason of it 
>>>>>>> was, as you know, to handle 'abort'.
>>>>>>> When 'abort' happens we do need to end all timers before calling 
>>>>>>> 'GCTracer::report_gc_end()'.
>>>>>>> And at this point we need a way to know whether concurrent timer 
>>>>>>> is started or not.
>>>>>>
>>>>>> As I tried to describe I think you can extend the scope of the 
>>>>>> timing to cover the part until the marking thread discovers that 
>>>>>> the marking was aborted. Basically moving the timing in to 
>>>>>> ConcurrentMarkThread::run_service().
>>>>> Oh, I mis-understood your point. Sorry.
>>>>> I will try your recommendation.
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> I realize that this would change the timing for when a 
>>>>>>>> concurrent marking is aborted. The whole time for a full GC (or 
>>>>>>>> even several full GCs) would be included in the concurrent 
>>>>>>>> marking phase. But from a code perspective that is what 
>>>>>>>> happens, so maybe that is the correct time to report? Also, I 
>>>>>>>> think the logging is reported that way so if we want to make it 
>>>>>>>> easy to match the timing events with the logging we might want 
>>>>>>>> to use about the same scope for timing.
>>>>>>>>
>>>>>>>> Why do we only measure the concurrent mark phase for G1? 
>>>>>>> This is good question. :)
>>>>>>> I was comparing CMS and G1 for major concurrent phases.
>>>>>>> And the candidates were 'concurrent mark/sweep (CMS)' and 
>>>>>>> 'concurrent mark(G1)'.
>>>>>>>
>>>>>>>> Wouldn't it be good to measure all concurrent phases?
>>>>>>> Okay, Stefan Karlsson also pointed to have them as well so I 
>>>>>>> filed a separate CR for this.
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143951
>>>>>>
>>>>>> Ok. Is there a reason why we want to split this up into two 
>>>>>> changes? Why not add all the concurrent timing events at once?
>>>>>>
>>>>>> I'm ok with splitting it up into two changes, but then maybe the 
>>>>>> title for JDK-8068394 should be changed to indicate that it only 
>>>>>> adds events for the marking phase. Also, if we go with the 
>>>>>> CMSPhaseAccounting approach for CMS, you get events for all 
>>>>>> phases there. So, it will be a bit odd to have all concurrent 
>>>>>> phases handled for CMS but only concurrent mark for G1.
>>>>> You are correct and I agree with you.
>>>>> The only reason of split was due to limited time for FC.
>>>>> But as we have enough time, let me close JDK-8143951 and include 
>>>>> concurrent mark for G1 as well.
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> gcTraceSend.cpp
>>>>>>>>
>>>>>>>>  389   void visit(GCPhase* phase) {
>>>>>>>>  390     switch (phase->type()) {
>>>>>>>>  391       case GCPhase::PausePhaseType:
>>>>>>>>  392         assert(phase->level() < PhasesStack::PHASE_LEVELS, 
>>>>>>>> "Need more event types for PausePhase");
>>>>>>>>  393
>>>>>>>>  394         switch (phase->level()) {
>>>>>>>>  395           case 0: send_phase<EventGCPhasePause>(phase); 
>>>>>>>> break;
>>>>>>>>  396           case 1: 
>>>>>>>> send_phase<EventGCPhasePauseLevel1>(phase); break;
>>>>>>>>  397           case 2: 
>>>>>>>> send_phase<EventGCPhasePauseLevel2>(phase); break;
>>>>>>>>  398           case 3: 
>>>>>>>> send_phase<EventGCPhasePauseLevel3>(phase); break;
>>>>>>>>  399           default: /* Ignore sending this phase */ break;
>>>>>>>>  400         }
>>>>>>>>  401         break;
>>>>>>>>  402
>>>>>>>>  403       case GCPhase::ConcurrentPhaseType:
>>>>>>>>  404         assert(phase->level() < 1, "There's only one level 
>>>>>>>> for ConcurrentPhase");
>>>>>>>>  405
>>>>>>>>  406         switch (phase->level()) {
>>>>>>>>  407           case 0: 
>>>>>>>> send_phase<EventGCPhaseConcurrent>(phase); break;
>>>>>>>>  408           default: /* Ignore sending this phase */ break;
>>>>>>>>  409         }
>>>>>>>>  410         break;
>>>>>>>>  411     }
>>>>>>>>  412   }
>>>>>>>>  413 };
>>>>>>>>
>>>>>>>> Since there are only two values for GCPhase::PhaseType it seems 
>>>>>>>> a bit odd to use a switch statement. I think I would prefer to 
>>>>>>>> factor out the code for the different types a bit too. So, 
>>>>>>>> maybe something like:
>>>>>>>>
>>>>>>>> void visit(GCPhase* phase) {
>>>>>>>>   if (phase->type() == GCPhase::PausePhaseType) {
>>>>>>>>     visit_pause(phase);
>>>>>>>>   } else {
>>>>>>>>     assert(phase->type() == GCPhase::ConcurrentPhaseType, "");
>>>>>>>>     visit_concurrent(phase);
>>>>>>>>   }
>>>>>>>> }
>>>>>>> This seems better. I will fix.
>>>>>>
>>>>>> Good. Thanks!
>>>>> :)
>>>>>
>>>>> I will post next webrev when I'm ready, including JDK-8143951 part 
>>>>> as well.
>>>>>
>>>>> Thanks,
>>>>> Sangheon
>>>>>
>>>>>
>>>>>>
>>>>>> Bengt
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sangheon
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Bengt
>>>>>>>>
>>>>>>>>> Testing: JPRT
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sangheon
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>




More information about the hotspot-gc-dev mailing list