RFR (M): JDK-8068394: Trace event for concurrent GC phases

Tue Dec 15 22:18:09 UTC 2015

Hi Stefan,

Here's the next webrev which includes your suggestion.
- Removed 'NOT_PRODUCT' and '#ifndef PRODUCT'. (Stefan)
- Removed volatile keyword at 
ConcurrentGCTimer::_is_concurrent_phase_active. (Stefan)
- Removed volatile keyword at ConcurrentMark::_concurrent_phase_started. 
This should be removed as well.

http://cr.openjdk.java.net/~sangheki/8068394/webrev.02
http://cr.openjdk.java.net/~sangheki/8068394/webrev.02_to_01/

Thanks,
Sangheon

On 12/15/2015 11:37 AM, sangheon wrote:
> Hi Stefan,
>
> On 12/15/2015 11:32 AM, Stefan Karlsson wrote:
>> Hi Sangheon,
>>
>> On 2015-12-15 20:29, sangheon wrote:
>>> Hi Stefan,
>>>
>>> Thank you for looking at official RFR as well.
>>>
>>> On 12/15/2015 12:58 AM, Stefan Karlsson wrote:
>>>> Hi Sangheon,
>>>>
>>>> On 2015-12-15 07:27, sangheon wrote:
>>>>> Hi Bengt and all,
>>>>>
>>>>> Here's a next webrev.
>>>>> Bengt and I had a discussion on other thread and he and Stefan 
>>>>> Karlsson suggested simpler design.
>>>>>
>>>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.01
>>>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.01_to_00
>>>>
>>>> Thanks for doing the simplifications of the patch.
>>>>
>>>> I have a couple of comments:
>>>> Could you replace NOT_PRODUCT with DEBUG_ONLY, and #ifndef PRODUCT 
>>>> with #ifdef ASSERT. Otherwise, you'll get inconsistencies when 
>>>> building with optimized builds where neither PRODUCT nor ASSERT is 
>>>> defined.
>>> Okay, I will do.
>>> But you also pointed that whether this flag is really needed below.
>>>
>>>>
>>>> 81 void ConcurrentGCTimer::register_gc_concurrent_start(const char* 
>>>> name, const Ticks& time) {
>>>> 82 assert(!_is_concurrent_phase_active, "A concurrent phase is 
>>>> already active.");
>>>> 83 _time_partitions.report_gc_phase_start(name, time, 
>>>> GCPhase::ConcurrentPhaseType);
>>>> 84 NOT_PRODUCT(_is_concurrent_phase_active = true;)
>>>> 85 }
>>>>
>>>>  162 class ConcurrentGCTimer : public GCTimer {
>>>> 163 // ConcurrentGCTimer can't be used if there is an overlap 
>>>> between a pause phase and a concurrent phase.
>>>> 164 // _is_concurrent_phase_active is used to find above case.
>>>> 165 NOT_PRODUCT(volatile bool _is_concurrent_phase_active;)
>>>> 166
>>>>  167  public:
>>>>  168 #ifndef PRODUCT
>>>> 169 ConcurrentGCTimer(): GCTimer(), 
>>>> _is_concurrent_phase_active(false) {};
>>>> 170 #endif Is it really worth optimizing away the boolean? The code 
>>>> would be cleaner without these macros.
>>> I just wanted to prevent mis-use of concurrent timer for overlapping 
>>> phase.
>>> A pause phase nor another concurrent phase can't be started if a 
>>> concurrent phase is already started.
>>> I think this would be nice to have. But I don't have a strong 
>>> opinion on this as currently we don't have these situations.
>>
>> I meant that we could keep the the variable in the product builds as 
>> well. Just to get rid of the noisy PRODUCT / DEBUG_ONLY macros. It 
>> could even be useful for future debugging. ;)
> Oops!
> I am okay with removing this macro.
>
> Let me post a new webrev soon.
> Thank you for the real time answer!
>
> Sangheon
>
>
>>>
>>>>
>>>>
>>>> Why are you using volatile here? I don't see why that should be 
>>>> needed.
>>>>
>>>>  162 class ConcurrentGCTimer : public GCTimer {
>>>>  163 // ConcurrentGCTimer can't be used if there is an overlap 
>>>> between a pause phase and a concurrent phase.
>>>> 164 // _is_concurrent_phase_active is used to find above case.
>>>> 165 NOT_PRODUCT(volatile bool _is_concurrent_phase_active;)
>>> You are right, volatile is not needed here.
>>> I was thinking register_gc_concurrent_end() would be called from 
>>> different threads(ConcurrentMarkThread and VMThread), but as it only 
>>> happens from safepoint, I think volatile should be removed.
>>>
>>> I will post a new webrev with your opinion about the flag.
>>
>> Great.
>>
>> Thanks,
>> StefanK
>>
>>>
>>> Thanks,
>>> Sangheon
>>>
>>>
>>>>
>>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>>>
>>>>> This webrev includes:
>>>>> 1. Moving concurrent phase measurement into CMSPhaseAccounting 
>>>>> class. So we can check all concurrent phases from CMS.
>>>>>
>>>>> 2. Renamed a new flag introduced from webrev.00 for general use, 
>>>>> _concurrent_phase_started.
>>>>>
>>>>> 3. Added helper class for concurrent phase measurement, 
>>>>> GCConcPhaseTimer. And used this class for all concurrent phases.
>>>>>
>>>>> 4. Flag checking is moved from 
>>>>> G1CollectedHeap::register_concurrent_cycle_end() to 
>>>>> ConcurrentMark::abort().
>>>>>
>>>>> 5. Changed switch() to if else statement at gcTraceSend.
>>>>>
>>>>> Many thanks to Bengt and Stefan Karlsson for the discussion.
>>>>>
>>>>> Thanks,
>>>>> Sangheon
>>>>>
>>>>>
>>>>> On 12/03/2015 12:03 AM, Bengt Rutisson wrote:
>>>>>>
>>>>>> Hi Sangheon,
>>>>>>
>>>>>> This all sounds good. Looking forward to the next webrev.
>>>>>>
>>>>>> Thanks,
>>>>>> Bengt
>>>>>>
>>>>>> On 2015-12-02 19:08, sangheon.kim wrote:
>>>>>>> Hi Bengt,
>>>>>>>
>>>>>>>
>>>>>>> On 12/02/2015 01:00 AM, Bengt Rutisson wrote:
>>>>>>>>
>>>>>>>> Hi Sangheon,
>>>>>>>>
>>>>>>>> Sorry again for the late reply. Comments inline.
>>>>>>> No not at all.
>>>>>>> Thank you for taking time for this review!
>>>>>>>
>>>>>>>>
>>>>>>>> On 2015-11-24 19:48, sangheon.kim wrote:
>>>>>>>>> Hi Bengt,
>>>>>>>>>
>>>>>>>>> Thank you so much for reviewing this patch!
>>>>>>>>>
>>>>>>>>> On 11/24/2015 05:12 AM, Bengt Rutisson wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Sangheon,
>>>>>>>>>>
>>>>>>>>>> Sorry for the very late reply to this review request.
>>>>>>>>>>
>>>>>>>>>> On 2015-10-22 01:40, sangheon.kim wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Can I get some reviews for this change of adding a trace 
>>>>>>>>>>> event for concurrent phases on CMS and G1?
>>>>>>>>>>>
>>>>>>>>>>> Currently we only measure pause times for major phases.
>>>>>>>>>>> But I want to add 'concurrent mark' and 'concurrent sweep' 
>>>>>>>>>>> phases for CMS and 'concurrent mark' phase for G1.
>>>>>>>>>>> To achieve this, I had to change ConcurrentGCTimer and 
>>>>>>>>>>> related classes.
>>>>>>>>>>>
>>>>>>>>>>> This patch includes:
>>>>>>>>>>> 1) A concurrent phase can be measured only from 
>>>>>>>>>>> ConcurrentGCTimer and measuring an overlap between 
>>>>>>>>>>> concurrent and pause is not allowed as currently we don't 
>>>>>>>>>>> have that use case. And TimePartitions class(or related 
>>>>>>>>>>> classes) will be simpler than an overlap allowed case.
>>>>>>>>>>> 2) I removed PausePhase and ConcurrentPhase which are 
>>>>>>>>>>> derived from GCPhase because I wanted to avoid heap 
>>>>>>>>>>> allocation when adding to GrowableArray. Instead introduced 
>>>>>>>>>>> 'type' member variable at GCPhase.
>>>>>>>>>>>
>>>>>>>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8068394
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~sangheki/8068394/webrev.00/
>>>>>>>>>>
>>>>>>>>>> Nice work! It is great to get some timing information for the 
>>>>>>>>>> concurrent phases.
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> A few questions/comments:
>>>>>>>>>>
>>>>>>>>>> CMS.
>>>>>>>>>>
>>>>>>>>>> You have added timing events for the concurrent phases mark 
>>>>>>>>>> and sweep, but not for other concurrent phases (preclean, 
>>>>>>>>>> abortable_preclean and reset_concurrent). I think that if you 
>>>>>>>>>> moved your calls to 
>>>>>>>>>> _gc_timer_cm->register_gc_concurrent_start() and 
>>>>>>>>>> _gc_timer_cm->register_gc_concurrent_end() into the 
>>>>>>>>>> constructor and destructor of CMSPhaseAccounting you would 
>>>>>>>>>> automatically get timing events for all concurrent phases.
>>>>>>>>> I also considered about this idea but I was not clear whether 
>>>>>>>>> it is good measurement especially for 'sweep'.
>>>>>>>>> We are using CMSPhaseAccounting for 5 cases.
>>>>>>>>> I think for 'mark' and 'preclean' they are okay. (there are 
>>>>>>>>> non-product functions call before CMSPhaseAccounting. But they 
>>>>>>>>> are okay).
>>>>>>>>> For 'abortable-preclean' and 'reset', they are good to have that.
>>>>>>>>> But when it comes to 'sweep', there are timer and 
>>>>>>>>> CMSTokenSyncWithLocks related codes and I was not sure about 
>>>>>>>>> these.
>>>>>>>>> If you think they are also okay, I will change as you suggested.
>>>>>>>>
>>>>>>>> I think it is fine for the sweep part as well. If the 
>>>>>>>> CMSPhaseAccounting instance is not measuring the correct part 
>>>>>>>> of the work I think we should change it to measure what we 
>>>>>>>> want. That way we improve the existing measurements too.
>>>>>>>>
>>>>>>>> But as you stated in your follow up email, I think the sweep 
>>>>>>>> code is actually pretty much ok from this perspective.
>>>>>>>>
>>>>>>>> So, I would prefer to go with the CMSPhaseAccounting approach.
>>>>>>> Okay, I already changed as you suggested. :)
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> G1.
>>>>>>>>>>
>>>>>>>>>> I think the use of _concurrent_marking_from_roots is 
>>>>>>>>>> unfortunate. It would be much cleaner if 
>>>>>>>>>> ConcurrentMark::register_mark_from_roots_phase_end() called 
>>>>>>>>>> register_gc_concurrent_end() directly. 
>>>>>>>>> I wanted to avoid introducing the new flag but the reason of 
>>>>>>>>> it was, as you know, to handle 'abort'.
>>>>>>>>> When 'abort' happens we do need to end all timers before 
>>>>>>>>> calling 'GCTracer::report_gc_end()'.
>>>>>>>>> And at this point we need a way to know whether concurrent 
>>>>>>>>> timer is started or not.
>>>>>>>>
>>>>>>>> As I tried to describe I think you can extend the scope of the 
>>>>>>>> timing to cover the part until the marking thread discovers 
>>>>>>>> that the marking was aborted. Basically moving the timing in to 
>>>>>>>> ConcurrentMarkThread::run_service().
>>>>>>> Oh, I mis-understood your point. Sorry.
>>>>>>> I will try your recommendation.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I realize that this would change the timing for when a 
>>>>>>>>>> concurrent marking is aborted. The whole time for a full GC 
>>>>>>>>>> (or even several full GCs) would be included in the 
>>>>>>>>>> concurrent marking phase. But from a code perspective that is 
>>>>>>>>>> what happens, so maybe that is the correct time to report? 
>>>>>>>>>> Also, I think the logging is reported that way so if we want 
>>>>>>>>>> to make it easy to match the timing events with the logging 
>>>>>>>>>> we might want to use about the same scope for timing.
>>>>>>>>>>
>>>>>>>>>> Why do we only measure the concurrent mark phase for G1? 
>>>>>>>>> This is good question. :)
>>>>>>>>> I was comparing CMS and G1 for major concurrent phases.
>>>>>>>>> And the candidates were 'concurrent mark/sweep (CMS)' and 
>>>>>>>>> 'concurrent mark(G1)'.
>>>>>>>>>
>>>>>>>>>> Wouldn't it be good to measure all concurrent phases?
>>>>>>>>> Okay, Stefan Karlsson also pointed to have them as well so I 
>>>>>>>>> filed a separate CR for this.
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143951
>>>>>>>>
>>>>>>>> Ok. Is there a reason why we want to split this up into two 
>>>>>>>> changes? Why not add all the concurrent timing events at once?
>>>>>>>>
>>>>>>>> I'm ok with splitting it up into two changes, but then maybe 
>>>>>>>> the title for JDK-8068394 should be changed to indicate that it 
>>>>>>>> only adds events for the marking phase. Also, if we go with the 
>>>>>>>> CMSPhaseAccounting approach for CMS, you get events for all 
>>>>>>>> phases there. So, it will be a bit odd to have all concurrent 
>>>>>>>> phases handled for CMS but only concurrent mark for G1.
>>>>>>> You are correct and I agree with you.
>>>>>>> The only reason of split was due to limited time for FC.
>>>>>>> But as we have enough time, let me close JDK-8143951 and include 
>>>>>>> concurrent mark for G1 as well.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> gcTraceSend.cpp
>>>>>>>>>>
>>>>>>>>>>  389   void visit(GCPhase* phase) {
>>>>>>>>>>  390     switch (phase->type()) {
>>>>>>>>>>  391       case GCPhase::PausePhaseType:
>>>>>>>>>>  392         assert(phase->level() < 
>>>>>>>>>> PhasesStack::PHASE_LEVELS, "Need more event types for 
>>>>>>>>>> PausePhase");
>>>>>>>>>>  393
>>>>>>>>>>  394         switch (phase->level()) {
>>>>>>>>>>  395           case 0: send_phase<EventGCPhasePause>(phase); 
>>>>>>>>>> break;
>>>>>>>>>>  396           case 1: 
>>>>>>>>>> send_phase<EventGCPhasePauseLevel1>(phase); break;
>>>>>>>>>>  397           case 2: 
>>>>>>>>>> send_phase<EventGCPhasePauseLevel2>(phase); break;
>>>>>>>>>>  398           case 3: 
>>>>>>>>>> send_phase<EventGCPhasePauseLevel3>(phase); break;
>>>>>>>>>>  399           default: /* Ignore sending this phase */ break;
>>>>>>>>>>  400         }
>>>>>>>>>>  401         break;
>>>>>>>>>>  402
>>>>>>>>>>  403       case GCPhase::ConcurrentPhaseType:
>>>>>>>>>>  404         assert(phase->level() < 1, "There's only one 
>>>>>>>>>> level for ConcurrentPhase");
>>>>>>>>>>  405
>>>>>>>>>>  406         switch (phase->level()) {
>>>>>>>>>>  407           case 0: 
>>>>>>>>>> send_phase<EventGCPhaseConcurrent>(phase); break;
>>>>>>>>>>  408           default: /* Ignore sending this phase */ break;
>>>>>>>>>>  409         }
>>>>>>>>>>  410         break;
>>>>>>>>>>  411     }
>>>>>>>>>>  412   }
>>>>>>>>>>  413 };
>>>>>>>>>>
>>>>>>>>>> Since there are only two values for GCPhase::PhaseType it 
>>>>>>>>>> seems a bit odd to use a switch statement. I think I would 
>>>>>>>>>> prefer to factor out the code for the different types a bit 
>>>>>>>>>> too. So, maybe something like:
>>>>>>>>>>
>>>>>>>>>> void visit(GCPhase* phase) {
>>>>>>>>>>   if (phase->type() == GCPhase::PausePhaseType) {
>>>>>>>>>>     visit_pause(phase);
>>>>>>>>>>   } else {
>>>>>>>>>>     assert(phase->type() == GCPhase::ConcurrentPhaseType, "");
>>>>>>>>>>     visit_concurrent(phase);
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>> This seems better. I will fix.
>>>>>>>>
>>>>>>>> Good. Thanks!
>>>>>>> :)
>>>>>>>
>>>>>>> I will post next webrev when I'm ready, including JDK-8143951 
>>>>>>> part as well.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sangheon
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Bengt
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sangheon
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Bengt
>>>>>>>>>>
>>>>>>>>>>> Testing: JPRT
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sangheon
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>