RFR (M): JDK-8068394: Trace event for concurrent GC phases

sangheon sangheon.kim at oracle.com
Thu Dec 17 18:27:12 UTC 2015


Hi Bengt and Stefan,

Thank you for reviewing this.
And all the suggestions and advices.

I will a sponsor this patch, anyone could help me?

Thanks,
Sangheon


On 12/17/2015 12:34 AM, Bengt Rutisson wrote:
>
> Hi Sangheon,
>
> On 2015-12-15 23:18, sangheon wrote:
>> Hi Stefan,
>>
>> Here's the next webrev which includes your suggestion.
>> - Removed 'NOT_PRODUCT' and '#ifndef PRODUCT'. (Stefan)
>> - Removed volatile keyword at 
>> ConcurrentGCTimer::_is_concurrent_phase_active. (Stefan)
>> - Removed volatile keyword at 
>> ConcurrentMark::_concurrent_phase_started. This should be removed as 
>> well.
>>
>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.02
>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.02_to_01/
>
> This looks good to me.
>
> Thanks,
> Bengt
>
>>
>> Thanks,
>> Sangheon
>>
>>
>> On 12/15/2015 11:37 AM, sangheon wrote:
>>> Hi Stefan,
>>>
>>> On 12/15/2015 11:32 AM, Stefan Karlsson wrote:
>>>> Hi Sangheon,
>>>>
>>>> On 2015-12-15 20:29, sangheon wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> Thank you for looking at official RFR as well.
>>>>>
>>>>> On 12/15/2015 12:58 AM, Stefan Karlsson wrote:
>>>>>> Hi Sangheon,
>>>>>>
>>>>>> On 2015-12-15 07:27, sangheon wrote:
>>>>>>> Hi Bengt and all,
>>>>>>>
>>>>>>> Here's a next webrev.
>>>>>>> Bengt and I had a discussion on other thread and he and Stefan 
>>>>>>> Karlsson suggested simpler design.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.01
>>>>>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.01_to_00
>>>>>>
>>>>>> Thanks for doing the simplifications of the patch.
>>>>>>
>>>>>> I have a couple of comments:
>>>>>> Could you replace NOT_PRODUCT with DEBUG_ONLY, and #ifndef 
>>>>>> PRODUCT with #ifdef ASSERT. Otherwise, you'll get inconsistencies 
>>>>>> when building with optimized builds where neither PRODUCT nor 
>>>>>> ASSERT is defined.
>>>>> Okay, I will do.
>>>>> But you also pointed that whether this flag is really needed below.
>>>>>
>>>>>>
>>>>>> 81 void ConcurrentGCTimer::register_gc_concurrent_start(const 
>>>>>> char* name, const Ticks& time) {
>>>>>> 82 assert(!_is_concurrent_phase_active, "A concurrent phase is 
>>>>>> already active.");
>>>>>> 83 _time_partitions.report_gc_phase_start(name, time, 
>>>>>> GCPhase::ConcurrentPhaseType);
>>>>>> 84 NOT_PRODUCT(_is_concurrent_phase_active = true;)
>>>>>> 85 }
>>>>>>
>>>>>>  162 class ConcurrentGCTimer : public GCTimer {
>>>>>> 163 // ConcurrentGCTimer can't be used if there is an overlap 
>>>>>> between a pause phase and a concurrent phase.
>>>>>> 164 // _is_concurrent_phase_active is used to find above case.
>>>>>> 165 NOT_PRODUCT(volatile bool _is_concurrent_phase_active;)
>>>>>> 166
>>>>>>  167  public:
>>>>>>  168 #ifndef PRODUCT
>>>>>> 169 ConcurrentGCTimer(): GCTimer(), 
>>>>>> _is_concurrent_phase_active(false) {};
>>>>>> 170 #endif Is it really worth optimizing away the boolean? The 
>>>>>> code would be cleaner without these macros.
>>>>> I just wanted to prevent mis-use of concurrent timer for 
>>>>> overlapping phase.
>>>>> A pause phase nor another concurrent phase can't be started if a 
>>>>> concurrent phase is already started.
>>>>> I think this would be nice to have. But I don't have a strong 
>>>>> opinion on this as currently we don't have these situations.
>>>>
>>>> I meant that we could keep the the variable in the product builds 
>>>> as well. Just to get rid of the noisy PRODUCT / DEBUG_ONLY macros. 
>>>> It could even be useful for future debugging. ;)
>>> Oops!
>>> I am okay with removing this macro.
>>>
>>> Let me post a new webrev soon.
>>> Thank you for the real time answer!
>>>
>>> Sangheon
>>>
>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> Why are you using volatile here? I don't see why that should be 
>>>>>> needed.
>>>>>>
>>>>>>  162 class ConcurrentGCTimer : public GCTimer {
>>>>>>  163 // ConcurrentGCTimer can't be used if there is an overlap 
>>>>>> between a pause phase and a concurrent phase.
>>>>>> 164 // _is_concurrent_phase_active is used to find above case.
>>>>>> 165 NOT_PRODUCT(volatile bool _is_concurrent_phase_active;)
>>>>> You are right, volatile is not needed here.
>>>>> I was thinking register_gc_concurrent_end() would be called from 
>>>>> different threads(ConcurrentMarkThread and VMThread), but as it 
>>>>> only happens from safepoint, I think volatile should be removed.
>>>>>
>>>>> I will post a new webrev with your opinion about the flag.
>>>>
>>>> Great.
>>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>>>
>>>>> Thanks,
>>>>> Sangheon
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> StefanK
>>>>>>
>>>>>>>
>>>>>>> This webrev includes:
>>>>>>> 1. Moving concurrent phase measurement into CMSPhaseAccounting 
>>>>>>> class. So we can check all concurrent phases from CMS.
>>>>>>>
>>>>>>> 2. Renamed a new flag introduced from webrev.00 for general use, 
>>>>>>> _concurrent_phase_started.
>>>>>>>
>>>>>>> 3. Added helper class for concurrent phase measurement, 
>>>>>>> GCConcPhaseTimer. And used this class for all concurrent phases.
>>>>>>>
>>>>>>> 4. Flag checking is moved from 
>>>>>>> G1CollectedHeap::register_concurrent_cycle_end() to 
>>>>>>> ConcurrentMark::abort().
>>>>>>>
>>>>>>> 5. Changed switch() to if else statement at gcTraceSend.
>>>>>>>
>>>>>>> Many thanks to Bengt and Stefan Karlsson for the discussion.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sangheon
>>>>>>>
>>>>>>>
>>>>>>> On 12/03/2015 12:03 AM, Bengt Rutisson wrote:
>>>>>>>>
>>>>>>>> Hi Sangheon,
>>>>>>>>
>>>>>>>> This all sounds good. Looking forward to the next webrev.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Bengt
>>>>>>>>
>>>>>>>> On 2015-12-02 19:08, sangheon.kim wrote:
>>>>>>>>> Hi Bengt,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/02/2015 01:00 AM, Bengt Rutisson wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Sangheon,
>>>>>>>>>>
>>>>>>>>>> Sorry again for the late reply. Comments inline.
>>>>>>>>> No not at all.
>>>>>>>>> Thank you for taking time for this review!
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2015-11-24 19:48, sangheon.kim wrote:
>>>>>>>>>>> Hi Bengt,
>>>>>>>>>>>
>>>>>>>>>>> Thank you so much for reviewing this patch!
>>>>>>>>>>>
>>>>>>>>>>> On 11/24/2015 05:12 AM, Bengt Rutisson wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Sangheon,
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for the very late reply to this review request.
>>>>>>>>>>>>
>>>>>>>>>>>> On 2015-10-22 01:40, sangheon.kim wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can I get some reviews for this change of adding a trace 
>>>>>>>>>>>>> event for concurrent phases on CMS and G1?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently we only measure pause times for major phases.
>>>>>>>>>>>>> But I want to add 'concurrent mark' and 'concurrent sweep' 
>>>>>>>>>>>>> phases for CMS and 'concurrent mark' phase for G1.
>>>>>>>>>>>>> To achieve this, I had to change ConcurrentGCTimer and 
>>>>>>>>>>>>> related classes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This patch includes:
>>>>>>>>>>>>> 1) A concurrent phase can be measured only from 
>>>>>>>>>>>>> ConcurrentGCTimer and measuring an overlap between 
>>>>>>>>>>>>> concurrent and pause is not allowed as currently we don't 
>>>>>>>>>>>>> have that use case. And TimePartitions class(or related 
>>>>>>>>>>>>> classes) will be simpler than an overlap allowed case.
>>>>>>>>>>>>> 2) I removed PausePhase and ConcurrentPhase which are 
>>>>>>>>>>>>> derived from GCPhase because I wanted to avoid heap 
>>>>>>>>>>>>> allocation when adding to GrowableArray. Instead 
>>>>>>>>>>>>> introduced 'type' member variable at GCPhase.
>>>>>>>>>>>>>
>>>>>>>>>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8068394
>>>>>>>>>>>>> Webrev: 
>>>>>>>>>>>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.00/
>>>>>>>>>>>>
>>>>>>>>>>>> Nice work! It is great to get some timing information for 
>>>>>>>>>>>> the concurrent phases.
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> A few questions/comments:
>>>>>>>>>>>>
>>>>>>>>>>>> CMS.
>>>>>>>>>>>>
>>>>>>>>>>>> You have added timing events for the concurrent phases mark 
>>>>>>>>>>>> and sweep, but not for other concurrent phases (preclean, 
>>>>>>>>>>>> abortable_preclean and reset_concurrent). I think that if 
>>>>>>>>>>>> you moved your calls to 
>>>>>>>>>>>> _gc_timer_cm->register_gc_concurrent_start() and 
>>>>>>>>>>>> _gc_timer_cm->register_gc_concurrent_end() into the 
>>>>>>>>>>>> constructor and destructor of CMSPhaseAccounting you would 
>>>>>>>>>>>> automatically get timing events for all concurrent phases.
>>>>>>>>>>> I also considered about this idea but I was not clear 
>>>>>>>>>>> whether it is good measurement especially for 'sweep'.
>>>>>>>>>>> We are using CMSPhaseAccounting for 5 cases.
>>>>>>>>>>> I think for 'mark' and 'preclean' they are okay. (there are 
>>>>>>>>>>> non-product functions call before CMSPhaseAccounting. But 
>>>>>>>>>>> they are okay).
>>>>>>>>>>> For 'abortable-preclean' and 'reset', they are good to have 
>>>>>>>>>>> that.
>>>>>>>>>>> But when it comes to 'sweep', there are timer and 
>>>>>>>>>>> CMSTokenSyncWithLocks related codes and I was not sure about 
>>>>>>>>>>> these.
>>>>>>>>>>> If you think they are also okay, I will change as you 
>>>>>>>>>>> suggested.
>>>>>>>>>>
>>>>>>>>>> I think it is fine for the sweep part as well. If the 
>>>>>>>>>> CMSPhaseAccounting instance is not measuring the correct part 
>>>>>>>>>> of the work I think we should change it to measure what we 
>>>>>>>>>> want. That way we improve the existing measurements too.
>>>>>>>>>>
>>>>>>>>>> But as you stated in your follow up email, I think the sweep 
>>>>>>>>>> code is actually pretty much ok from this perspective.
>>>>>>>>>>
>>>>>>>>>> So, I would prefer to go with the CMSPhaseAccounting approach.
>>>>>>>>> Okay, I already changed as you suggested. :)
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> G1.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the use of _concurrent_marking_from_roots is 
>>>>>>>>>>>> unfortunate. It would be much cleaner if 
>>>>>>>>>>>> ConcurrentMark::register_mark_from_roots_phase_end() called 
>>>>>>>>>>>> register_gc_concurrent_end() directly. 
>>>>>>>>>>> I wanted to avoid introducing the new flag but the reason of 
>>>>>>>>>>> it was, as you know, to handle 'abort'.
>>>>>>>>>>> When 'abort' happens we do need to end all timers before 
>>>>>>>>>>> calling 'GCTracer::report_gc_end()'.
>>>>>>>>>>> And at this point we need a way to know whether concurrent 
>>>>>>>>>>> timer is started or not.
>>>>>>>>>>
>>>>>>>>>> As I tried to describe I think you can extend the scope of 
>>>>>>>>>> the timing to cover the part until the marking thread 
>>>>>>>>>> discovers that the marking was aborted. Basically moving the 
>>>>>>>>>> timing in to ConcurrentMarkThread::run_service().
>>>>>>>>> Oh, I mis-understood your point. Sorry.
>>>>>>>>> I will try your recommendation.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> I realize that this would change the timing for when a 
>>>>>>>>>>>> concurrent marking is aborted. The whole time for a full GC 
>>>>>>>>>>>> (or even several full GCs) would be included in the 
>>>>>>>>>>>> concurrent marking phase. But from a code perspective that 
>>>>>>>>>>>> is what happens, so maybe that is the correct time to 
>>>>>>>>>>>> report? Also, I think the logging is reported that way so 
>>>>>>>>>>>> if we want to make it easy to match the timing events with 
>>>>>>>>>>>> the logging we might want to use about the same scope for 
>>>>>>>>>>>> timing.
>>>>>>>>>>>>
>>>>>>>>>>>> Why do we only measure the concurrent mark phase for G1? 
>>>>>>>>>>> This is good question. :)
>>>>>>>>>>> I was comparing CMS and G1 for major concurrent phases.
>>>>>>>>>>> And the candidates were 'concurrent mark/sweep (CMS)' and 
>>>>>>>>>>> 'concurrent mark(G1)'.
>>>>>>>>>>>
>>>>>>>>>>>> Wouldn't it be good to measure all concurrent phases?
>>>>>>>>>>> Okay, Stefan Karlsson also pointed to have them as well so I 
>>>>>>>>>>> filed a separate CR for this.
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143951
>>>>>>>>>>
>>>>>>>>>> Ok. Is there a reason why we want to split this up into two 
>>>>>>>>>> changes? Why not add all the concurrent timing events at once?
>>>>>>>>>>
>>>>>>>>>> I'm ok with splitting it up into two changes, but then maybe 
>>>>>>>>>> the title for JDK-8068394 should be changed to indicate that 
>>>>>>>>>> it only adds events for the marking phase. Also, if we go 
>>>>>>>>>> with the CMSPhaseAccounting approach for CMS, you get events 
>>>>>>>>>> for all phases there. So, it will be a bit odd to have all 
>>>>>>>>>> concurrent phases handled for CMS but only concurrent mark 
>>>>>>>>>> for G1.
>>>>>>>>> You are correct and I agree with you.
>>>>>>>>> The only reason of split was due to limited time for FC.
>>>>>>>>> But as we have enough time, let me close JDK-8143951 and 
>>>>>>>>> include concurrent mark for G1 as well.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> gcTraceSend.cpp
>>>>>>>>>>>>
>>>>>>>>>>>>  389   void visit(GCPhase* phase) {
>>>>>>>>>>>>  390     switch (phase->type()) {
>>>>>>>>>>>>  391       case GCPhase::PausePhaseType:
>>>>>>>>>>>>  392         assert(phase->level() < 
>>>>>>>>>>>> PhasesStack::PHASE_LEVELS, "Need more event types for 
>>>>>>>>>>>> PausePhase");
>>>>>>>>>>>>  393
>>>>>>>>>>>>  394         switch (phase->level()) {
>>>>>>>>>>>>  395           case 0: 
>>>>>>>>>>>> send_phase<EventGCPhasePause>(phase); break;
>>>>>>>>>>>>  396           case 1: 
>>>>>>>>>>>> send_phase<EventGCPhasePauseLevel1>(phase); break;
>>>>>>>>>>>>  397           case 2: 
>>>>>>>>>>>> send_phase<EventGCPhasePauseLevel2>(phase); break;
>>>>>>>>>>>>  398           case 3: 
>>>>>>>>>>>> send_phase<EventGCPhasePauseLevel3>(phase); break;
>>>>>>>>>>>>  399           default: /* Ignore sending this phase */ break;
>>>>>>>>>>>>  400         }
>>>>>>>>>>>>  401         break;
>>>>>>>>>>>>  402
>>>>>>>>>>>>  403       case GCPhase::ConcurrentPhaseType:
>>>>>>>>>>>>  404         assert(phase->level() < 1, "There's only one 
>>>>>>>>>>>> level for ConcurrentPhase");
>>>>>>>>>>>>  405
>>>>>>>>>>>>  406         switch (phase->level()) {
>>>>>>>>>>>>  407           case 0: 
>>>>>>>>>>>> send_phase<EventGCPhaseConcurrent>(phase); break;
>>>>>>>>>>>>  408           default: /* Ignore sending this phase */ break;
>>>>>>>>>>>>  409         }
>>>>>>>>>>>>  410         break;
>>>>>>>>>>>>  411     }
>>>>>>>>>>>>  412   }
>>>>>>>>>>>>  413 };
>>>>>>>>>>>>
>>>>>>>>>>>> Since there are only two values for GCPhase::PhaseType it 
>>>>>>>>>>>> seems a bit odd to use a switch statement. I think I would 
>>>>>>>>>>>> prefer to factor out the code for the different types a bit 
>>>>>>>>>>>> too. So, maybe something like:
>>>>>>>>>>>>
>>>>>>>>>>>> void visit(GCPhase* phase) {
>>>>>>>>>>>>   if (phase->type() == GCPhase::PausePhaseType) {
>>>>>>>>>>>>     visit_pause(phase);
>>>>>>>>>>>>   } else {
>>>>>>>>>>>>     assert(phase->type() == GCPhase::ConcurrentPhaseType, "");
>>>>>>>>>>>>     visit_concurrent(phase);
>>>>>>>>>>>>   }
>>>>>>>>>>>> }
>>>>>>>>>>> This seems better. I will fix.
>>>>>>>>>>
>>>>>>>>>> Good. Thanks!
>>>>>>>>> :)
>>>>>>>>>
>>>>>>>>> I will post next webrev when I'm ready, including JDK-8143951 
>>>>>>>>> part as well.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sangheon
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Bengt
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sangheon
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Bengt
>>>>>>>>>>>>
>>>>>>>>>>>>> Testing: JPRT
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Sangheon
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>




More information about the hotspot-gc-dev mailing list