RFR (M): JDK-8068394: Trace event for concurrent GC phases

Bengt Rutisson bengt.rutisson at oracle.com
Fri Dec 18 12:41:40 UTC 2015


Hi Sangheon,

There was a test failure when I ran JPRT:

http://sthjprt.uk.oracle.com/archives/2015/12/2015-12-18-095536.brutisso.8068394/logs/macosx_x64_10.9-fastdebug-c2-jdk_svc_sanity.log.FAILED.log


ACTION: main -- Failed. Execution failed: `main' threw exception: 
java.lang.Exception: Preset 'Continuous' doesn't configure event 
'http://www.oracle.com/hotspot/jvm/vm/gc/phases/concurrent' Preset 
'Profiling' doesn't configure event 
'http://www.oracle.com/hotspot/jvm/vm/gc/phases/concurrent'


I think you need to add your new event to:
jdk/test/closed/com/oracle/jfr/api/ConfigurationTest/TestDefaultPresets.java

I'll be on vacation next week. Not sure if I will check email. If you 
get the TestDefaultPresets test to pass you can still list me as a 
reviewer. But someone else needs to sponsor the push.

Thanks,
Bengt

On 2015-12-18 10:54, Bengt Rutisson wrote:
>
> Hi Sangheon,
>
> On 2015-12-17 19:27, sangheon wrote:
>> Hi Bengt and Stefan,
>>
>> Thank you for reviewing this.
>> And all the suggestions and advices.
>>
>> I will a sponsor this patch, anyone could help me?
>
> I'm pushing this now.
>
> Thanks,
> Bengt
>
>>
>> Thanks,
>> Sangheon
>>
>>
>> On 12/17/2015 12:34 AM, Bengt Rutisson wrote:
>>>
>>> Hi Sangheon,
>>>
>>> On 2015-12-15 23:18, sangheon wrote:
>>>> Hi Stefan,
>>>>
>>>> Here's the next webrev which includes your suggestion.
>>>> - Removed 'NOT_PRODUCT' and '#ifndef PRODUCT'. (Stefan)
>>>> - Removed volatile keyword at 
>>>> ConcurrentGCTimer::_is_concurrent_phase_active. (Stefan)
>>>> - Removed volatile keyword at 
>>>> ConcurrentMark::_concurrent_phase_started. This should be removed 
>>>> as well.
>>>>
>>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.02
>>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.02_to_01/
>>>
>>> This looks good to me.
>>>
>>> Thanks,
>>> Bengt
>>>
>>>>
>>>> Thanks,
>>>> Sangheon
>>>>
>>>>
>>>> On 12/15/2015 11:37 AM, sangheon wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> On 12/15/2015 11:32 AM, Stefan Karlsson wrote:
>>>>>> Hi Sangheon,
>>>>>>
>>>>>> On 2015-12-15 20:29, sangheon wrote:
>>>>>>> Hi Stefan,
>>>>>>>
>>>>>>> Thank you for looking at official RFR as well.
>>>>>>>
>>>>>>> On 12/15/2015 12:58 AM, Stefan Karlsson wrote:
>>>>>>>> Hi Sangheon,
>>>>>>>>
>>>>>>>> On 2015-12-15 07:27, sangheon wrote:
>>>>>>>>> Hi Bengt and all,
>>>>>>>>>
>>>>>>>>> Here's a next webrev.
>>>>>>>>> Bengt and I had a discussion on other thread and he and Stefan 
>>>>>>>>> Karlsson suggested simpler design.
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.01
>>>>>>>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.01_to_00
>>>>>>>>
>>>>>>>> Thanks for doing the simplifications of the patch.
>>>>>>>>
>>>>>>>> I have a couple of comments:
>>>>>>>> Could you replace NOT_PRODUCT with DEBUG_ONLY, and #ifndef 
>>>>>>>> PRODUCT with #ifdef ASSERT. Otherwise, you'll get 
>>>>>>>> inconsistencies when building with optimized builds where 
>>>>>>>> neither PRODUCT nor ASSERT is defined.
>>>>>>> Okay, I will do.
>>>>>>> But you also pointed that whether this flag is really needed below.
>>>>>>>
>>>>>>>>
>>>>>>>> 81 void ConcurrentGCTimer::register_gc_concurrent_start(const 
>>>>>>>> char* name, const Ticks& time) {
>>>>>>>> 82 assert(!_is_concurrent_phase_active, "A concurrent phase is 
>>>>>>>> already active.");
>>>>>>>> 83 _time_partitions.report_gc_phase_start(name, time, 
>>>>>>>> GCPhase::ConcurrentPhaseType);
>>>>>>>> 84 NOT_PRODUCT(_is_concurrent_phase_active = true;)
>>>>>>>> 85 }
>>>>>>>>
>>>>>>>>  162 class ConcurrentGCTimer : public GCTimer {
>>>>>>>> 163 // ConcurrentGCTimer can't be used if there is an overlap 
>>>>>>>> between a pause phase and a concurrent phase.
>>>>>>>> 164 // _is_concurrent_phase_active is used to find above case.
>>>>>>>> 165 NOT_PRODUCT(volatile bool _is_concurrent_phase_active;)
>>>>>>>> 166
>>>>>>>>  167  public:
>>>>>>>>  168 #ifndef PRODUCT
>>>>>>>> 169 ConcurrentGCTimer(): GCTimer(), 
>>>>>>>> _is_concurrent_phase_active(false) {};
>>>>>>>> 170 #endif Is it really worth optimizing away the boolean? The 
>>>>>>>> code would be cleaner without these macros.
>>>>>>> I just wanted to prevent mis-use of concurrent timer for 
>>>>>>> overlapping phase.
>>>>>>> A pause phase nor another concurrent phase can't be started if a 
>>>>>>> concurrent phase is already started.
>>>>>>> I think this would be nice to have. But I don't have a strong 
>>>>>>> opinion on this as currently we don't have these situations.
>>>>>>
>>>>>> I meant that we could keep the the variable in the product builds 
>>>>>> as well. Just to get rid of the noisy PRODUCT / DEBUG_ONLY 
>>>>>> macros. It could even be useful for future debugging. ;)
>>>>> Oops!
>>>>> I am okay with removing this macro.
>>>>>
>>>>> Let me post a new webrev soon.
>>>>> Thank you for the real time answer!
>>>>>
>>>>> Sangheon
>>>>>
>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Why are you using volatile here? I don't see why that should be 
>>>>>>>> needed.
>>>>>>>>
>>>>>>>>  162 class ConcurrentGCTimer : public GCTimer {
>>>>>>>>  163 // ConcurrentGCTimer can't be used if there is an overlap 
>>>>>>>> between a pause phase and a concurrent phase.
>>>>>>>> 164 // _is_concurrent_phase_active is used to find above case.
>>>>>>>> 165 NOT_PRODUCT(volatile bool _is_concurrent_phase_active;)
>>>>>>> You are right, volatile is not needed here.
>>>>>>> I was thinking register_gc_concurrent_end() would be called from 
>>>>>>> different threads(ConcurrentMarkThread and VMThread), but as it 
>>>>>>> only happens from safepoint, I think volatile should be removed.
>>>>>>>
>>>>>>> I will post a new webrev with your opinion about the flag.
>>>>>>
>>>>>> Great.
>>>>>>
>>>>>> Thanks,
>>>>>> StefanK
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sangheon
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> StefanK
>>>>>>>>
>>>>>>>>>
>>>>>>>>> This webrev includes:
>>>>>>>>> 1. Moving concurrent phase measurement into CMSPhaseAccounting 
>>>>>>>>> class. So we can check all concurrent phases from CMS.
>>>>>>>>>
>>>>>>>>> 2. Renamed a new flag introduced from webrev.00 for general 
>>>>>>>>> use, _concurrent_phase_started.
>>>>>>>>>
>>>>>>>>> 3. Added helper class for concurrent phase measurement, 
>>>>>>>>> GCConcPhaseTimer. And used this class for all concurrent phases.
>>>>>>>>>
>>>>>>>>> 4. Flag checking is moved from 
>>>>>>>>> G1CollectedHeap::register_concurrent_cycle_end() to 
>>>>>>>>> ConcurrentMark::abort().
>>>>>>>>>
>>>>>>>>> 5. Changed switch() to if else statement at gcTraceSend.
>>>>>>>>>
>>>>>>>>> Many thanks to Bengt and Stefan Karlsson for the discussion.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sangheon
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/03/2015 12:03 AM, Bengt Rutisson wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Sangheon,
>>>>>>>>>>
>>>>>>>>>> This all sounds good. Looking forward to the next webrev.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Bengt
>>>>>>>>>>
>>>>>>>>>> On 2015-12-02 19:08, sangheon.kim wrote:
>>>>>>>>>>> Hi Bengt,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 12/02/2015 01:00 AM, Bengt Rutisson wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Sangheon,
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry again for the late reply. Comments inline.
>>>>>>>>>>> No not at all.
>>>>>>>>>>> Thank you for taking time for this review!
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2015-11-24 19:48, sangheon.kim wrote:
>>>>>>>>>>>>> Hi Bengt,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you so much for reviewing this patch!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/24/2015 05:12 AM, Bengt Rutisson wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Sangheon,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry for the very late reply to this review request.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2015-10-22 01:40, sangheon.kim wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can I get some reviews for this change of adding a trace 
>>>>>>>>>>>>>>> event for concurrent phases on CMS and G1?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Currently we only measure pause times for major phases.
>>>>>>>>>>>>>>> But I want to add 'concurrent mark' and 'concurrent 
>>>>>>>>>>>>>>> sweep' phases for CMS and 'concurrent mark' phase for G1.
>>>>>>>>>>>>>>> To achieve this, I had to change ConcurrentGCTimer and 
>>>>>>>>>>>>>>> related classes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This patch includes:
>>>>>>>>>>>>>>> 1) A concurrent phase can be measured only from 
>>>>>>>>>>>>>>> ConcurrentGCTimer and measuring an overlap between 
>>>>>>>>>>>>>>> concurrent and pause is not allowed as currently we 
>>>>>>>>>>>>>>> don't have that use case. And TimePartitions class(or 
>>>>>>>>>>>>>>> related classes) will be simpler than an overlap allowed 
>>>>>>>>>>>>>>> case.
>>>>>>>>>>>>>>> 2) I removed PausePhase and ConcurrentPhase which are 
>>>>>>>>>>>>>>> derived from GCPhase because I wanted to avoid heap 
>>>>>>>>>>>>>>> allocation when adding to GrowableArray. Instead 
>>>>>>>>>>>>>>> introduced 'type' member variable at GCPhase.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8068394
>>>>>>>>>>>>>>> Webrev: 
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~sangheki/8068394/webrev.00/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nice work! It is great to get some timing information for 
>>>>>>>>>>>>>> the concurrent phases.
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A few questions/comments:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> CMS.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You have added timing events for the concurrent phases 
>>>>>>>>>>>>>> mark and sweep, but not for other concurrent phases 
>>>>>>>>>>>>>> (preclean, abortable_preclean and reset_concurrent). I 
>>>>>>>>>>>>>> think that if you moved your calls to 
>>>>>>>>>>>>>> _gc_timer_cm->register_gc_concurrent_start() and 
>>>>>>>>>>>>>> _gc_timer_cm->register_gc_concurrent_end() into the 
>>>>>>>>>>>>>> constructor and destructor of CMSPhaseAccounting you 
>>>>>>>>>>>>>> would automatically get timing events for all concurrent 
>>>>>>>>>>>>>> phases.
>>>>>>>>>>>>> I also considered about this idea but I was not clear 
>>>>>>>>>>>>> whether it is good measurement especially for 'sweep'.
>>>>>>>>>>>>> We are using CMSPhaseAccounting for 5 cases.
>>>>>>>>>>>>> I think for 'mark' and 'preclean' they are okay. (there 
>>>>>>>>>>>>> are non-product functions call before CMSPhaseAccounting. 
>>>>>>>>>>>>> But they are okay).
>>>>>>>>>>>>> For 'abortable-preclean' and 'reset', they are good to 
>>>>>>>>>>>>> have that.
>>>>>>>>>>>>> But when it comes to 'sweep', there are timer and 
>>>>>>>>>>>>> CMSTokenSyncWithLocks related codes and I was not sure 
>>>>>>>>>>>>> about these.
>>>>>>>>>>>>> If you think they are also okay, I will change as you 
>>>>>>>>>>>>> suggested.
>>>>>>>>>>>>
>>>>>>>>>>>> I think it is fine for the sweep part as well. If the 
>>>>>>>>>>>> CMSPhaseAccounting instance is not measuring the correct 
>>>>>>>>>>>> part of the work I think we should change it to measure 
>>>>>>>>>>>> what we want. That way we improve the existing measurements 
>>>>>>>>>>>> too.
>>>>>>>>>>>>
>>>>>>>>>>>> But as you stated in your follow up email, I think the 
>>>>>>>>>>>> sweep code is actually pretty much ok from this perspective.
>>>>>>>>>>>>
>>>>>>>>>>>> So, I would prefer to go with the CMSPhaseAccounting approach.
>>>>>>>>>>> Okay, I already changed as you suggested. :)
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> G1.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the use of _concurrent_marking_from_roots is 
>>>>>>>>>>>>>> unfortunate. It would be much cleaner if 
>>>>>>>>>>>>>> ConcurrentMark::register_mark_from_roots_phase_end() 
>>>>>>>>>>>>>> called register_gc_concurrent_end() directly. 
>>>>>>>>>>>>> I wanted to avoid introducing the new flag but the reason 
>>>>>>>>>>>>> of it was, as you know, to handle 'abort'.
>>>>>>>>>>>>> When 'abort' happens we do need to end all timers before 
>>>>>>>>>>>>> calling 'GCTracer::report_gc_end()'.
>>>>>>>>>>>>> And at this point we need a way to know whether concurrent 
>>>>>>>>>>>>> timer is started or not.
>>>>>>>>>>>>
>>>>>>>>>>>> As I tried to describe I think you can extend the scope of 
>>>>>>>>>>>> the timing to cover the part until the marking thread 
>>>>>>>>>>>> discovers that the marking was aborted. Basically moving 
>>>>>>>>>>>> the timing in to ConcurrentMarkThread::run_service().
>>>>>>>>>>> Oh, I mis-understood your point. Sorry.
>>>>>>>>>>> I will try your recommendation.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I realize that this would change the timing for when a 
>>>>>>>>>>>>>> concurrent marking is aborted. The whole time for a full 
>>>>>>>>>>>>>> GC (or even several full GCs) would be included in the 
>>>>>>>>>>>>>> concurrent marking phase. But from a code perspective 
>>>>>>>>>>>>>> that is what happens, so maybe that is the correct time 
>>>>>>>>>>>>>> to report? Also, I think the logging is reported that way 
>>>>>>>>>>>>>> so if we want to make it easy to match the timing events 
>>>>>>>>>>>>>> with the logging we might want to use about the same 
>>>>>>>>>>>>>> scope for timing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why do we only measure the concurrent mark phase for G1? 
>>>>>>>>>>>>> This is good question. :)
>>>>>>>>>>>>> I was comparing CMS and G1 for major concurrent phases.
>>>>>>>>>>>>> And the candidates were 'concurrent mark/sweep (CMS)' and 
>>>>>>>>>>>>> 'concurrent mark(G1)'.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Wouldn't it be good to measure all concurrent phases?
>>>>>>>>>>>>> Okay, Stefan Karlsson also pointed to have them as well so 
>>>>>>>>>>>>> I filed a separate CR for this.
>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143951
>>>>>>>>>>>>
>>>>>>>>>>>> Ok. Is there a reason why we want to split this up into two 
>>>>>>>>>>>> changes? Why not add all the concurrent timing events at once?
>>>>>>>>>>>>
>>>>>>>>>>>> I'm ok with splitting it up into two changes, but then 
>>>>>>>>>>>> maybe the title for JDK-8068394 should be changed to 
>>>>>>>>>>>> indicate that it only adds events for the marking phase. 
>>>>>>>>>>>> Also, if we go with the CMSPhaseAccounting approach for 
>>>>>>>>>>>> CMS, you get events for all phases there. So, it will be a 
>>>>>>>>>>>> bit odd to have all concurrent phases handled for CMS but 
>>>>>>>>>>>> only concurrent mark for G1.
>>>>>>>>>>> You are correct and I agree with you.
>>>>>>>>>>> The only reason of split was due to limited time for FC.
>>>>>>>>>>> But as we have enough time, let me close JDK-8143951 and 
>>>>>>>>>>> include concurrent mark for G1 as well.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gcTraceSend.cpp
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  389   void visit(GCPhase* phase) {
>>>>>>>>>>>>>>  390     switch (phase->type()) {
>>>>>>>>>>>>>>  391       case GCPhase::PausePhaseType:
>>>>>>>>>>>>>>  392         assert(phase->level() < 
>>>>>>>>>>>>>> PhasesStack::PHASE_LEVELS, "Need more event types for 
>>>>>>>>>>>>>> PausePhase");
>>>>>>>>>>>>>>  393
>>>>>>>>>>>>>>  394         switch (phase->level()) {
>>>>>>>>>>>>>>  395           case 0: 
>>>>>>>>>>>>>> send_phase<EventGCPhasePause>(phase); break;
>>>>>>>>>>>>>>  396           case 1: 
>>>>>>>>>>>>>> send_phase<EventGCPhasePauseLevel1>(phase); break;
>>>>>>>>>>>>>>  397           case 2: 
>>>>>>>>>>>>>> send_phase<EventGCPhasePauseLevel2>(phase); break;
>>>>>>>>>>>>>>  398           case 3: 
>>>>>>>>>>>>>> send_phase<EventGCPhasePauseLevel3>(phase); break;
>>>>>>>>>>>>>>  399           default: /* Ignore sending this phase */ 
>>>>>>>>>>>>>> break;
>>>>>>>>>>>>>>  400         }
>>>>>>>>>>>>>>  401         break;
>>>>>>>>>>>>>>  402
>>>>>>>>>>>>>>  403       case GCPhase::ConcurrentPhaseType:
>>>>>>>>>>>>>>  404         assert(phase->level() < 1, "There's only one 
>>>>>>>>>>>>>> level for ConcurrentPhase");
>>>>>>>>>>>>>>  405
>>>>>>>>>>>>>>  406         switch (phase->level()) {
>>>>>>>>>>>>>>  407           case 0: 
>>>>>>>>>>>>>> send_phase<EventGCPhaseConcurrent>(phase); break;
>>>>>>>>>>>>>>  408           default: /* Ignore sending this phase */ 
>>>>>>>>>>>>>> break;
>>>>>>>>>>>>>>  409         }
>>>>>>>>>>>>>>  410         break;
>>>>>>>>>>>>>>  411     }
>>>>>>>>>>>>>>  412   }
>>>>>>>>>>>>>>  413 };
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since there are only two values for GCPhase::PhaseType it 
>>>>>>>>>>>>>> seems a bit odd to use a switch statement. I think I 
>>>>>>>>>>>>>> would prefer to factor out the code for the different 
>>>>>>>>>>>>>> types a bit too. So, maybe something like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> void visit(GCPhase* phase) {
>>>>>>>>>>>>>>   if (phase->type() == GCPhase::PausePhaseType) {
>>>>>>>>>>>>>>     visit_pause(phase);
>>>>>>>>>>>>>>   } else {
>>>>>>>>>>>>>>     assert(phase->type() == GCPhase::ConcurrentPhaseType, 
>>>>>>>>>>>>>> "");
>>>>>>>>>>>>>>     visit_concurrent(phase);
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>> This seems better. I will fix.
>>>>>>>>>>>>
>>>>>>>>>>>> Good. Thanks!
>>>>>>>>>>> :)
>>>>>>>>>>>
>>>>>>>>>>> I will post next webrev when I'm ready, including 
>>>>>>>>>>> JDK-8143951 part as well.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sangheon
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Bengt
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Sangheon
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Bengt
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Testing: JPRT
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Sangheon
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>




More information about the hotspot-gc-dev mailing list