RFR (M): 8013895 G1: G1SummarizeRSetStats output on Linux needs improvement

Thomas Schatzl thomas.schatzl at oracle.com
Fri May 24 07:47:31 UTC 2013


Hi,

  could any Reviewer for this change have a look - I have a review from
Jesper?

Thanks a lot,
  Thomas

On Wed, 2013-05-15 at 19:23 +0200, Thomas Schatzl wrote:
> Hi all,
> 
>   I'd like to add some further bug fix to this CR, as it only concerns a
> single line.
> 
> In the original code to retrieve remembered set statistics of the
> regions, when iterating over the heap, the code excluded regions that
> continued humonguous regions.
> 
> This results in totals, as while the continued humonguous regions do not
> have remembered set entries, they have a separate remembered set
> instance. So when skipping them, the size of the empty remembered sets
> are missing in the totals. (Note that an even an empty remembered set
> needs a few k of heap).
> 
> The update webrev at
> http://cr.openjdk.java.net/~tschatzl/8013895/webrev.1
> 
> fixes that. In HRRSStatsIter::doHeapRegion() it removes the first line
> of the implementation that skips continued humonguous regions (i.e. the
> line "if (r->continuesHumongous()) return false;").
> 
> I verified by testing that the remembered set instances of continued
> humonguous regions are different than the remembered set instance of the
> humonguous starting region - which would have explained this code.
> 
> I reran JPRT and the jtreg tests successfully.
> 
> The rest of the information about this change still applies:
> 
> On Mon, 2013-05-13 at 23:25 +0200, Thomas Schatzl wrote:
> > Hi all,
> > 
> >   I'd like to ask you for reviews for this change.
> > 
> > It improves the output of G1SummarizeRSetStats on all platforms.
> > 
> > The previous output of G1SummarizeRSetStats has the following issues as shown in the example output from the CR:
> > 
> >  Concurrent RS processed -1761455529 cards 
> >   Of 9224262 completed buffers: 
> >       9224262 (100.0%) by conc RS threads. 
> >             0 ( 0.0%) by mutator threads. 
> >   Conc RS threads times(s) 
> >           0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
> >   Total heap region rem set sizes = 1120947K. Max = 4047K. 
> >   Static structures = 600K, free_lists = 5312K. 
> >     717820015 occupied cards represented. 
> >     Max size region = 378:(O)[0x00000004fa000000,0x00000004fb000000,0x00000004fb000000], size = 4048K, occupied = 37303K. 
> >     Did 209966 coarsenings. 
> > 
> > "Concurrent RS processed" uses the wrong data type, overflowing easily.
> > "Conc RS threads times" are always zero except on Solaris
> > "Conc RS threads times" reports the young gen sampling thread as remembered set thread.
> > The data for everything except for the remembered set sizes (beginning
> > at "Total heap region..." is always cumulative, which is hard to
> > read/evaluate.
> > 
> > This change includes the following modifications:
> > - use a larger data type for the number of concurrent RS processed
> > - division by zeros when calculating percentages for the completed
> > buffers when the number of completed buffers is still zero
> > - previously printed the summary only for young gen gcs, now also
> > printing the summary at full gcs
> > - split out the young gen sampling thread from the conc RS threads times
> > - fixed the missing space between "times" and "(s)" in the same line
> > - changed slightly how the G1SummarizeRSetStats
> > G1SummarizeRSetStatsPeriod flags work. If G1SummarizeRSetStats is true,
> > always print a cumulative summary at VM exit; if
> > G1SummarizeRSetStatsPeriod is > 0 (default), print a cumulative summary
> > for the last period GCs; i.e. if G1SummarizeRSetStatsPeriod is 1, print
> > the summary for the most recent GC.
> > - fix gathering of thread times:
> >    - on linux supporting getrusage(RUSAGE_THREAD, ...), use this
> > information, otherwise fallback to the elapsed time as suggested in the
> > CR
> >    - on Windows, use the GetThreadTimes() system API call, fallback to
> > elapsed time
> >    - on BSD, use the elapsed time counter as I do not know an API call.
> > (getrusage() does not seem to support RUSAGE_THREAD there)
> > 
> > Here is an example output for printing the summary for a different
> > application on a different machine:
> > 
> >  Cumulative RS summary
> > 
> >  Concurrent RS processed 1504863 cards
> >   Of 7194 completed buffers:
> >          7194 (100.0%) by concurrent RS threads.
> >             0 (  0.0%) by mutator threads.
> >   Concurrent RS threads times (s)
> >           1.81
> >   Concurrent sampling threads times (s)
> >           0.10
> >   Total heap region rem set sizes = 119K.  Max = 4K.
> >   Static structures = 7K, free_lists = 209K.
> >     23253 occupied cards represented.
> >     Max size region =
> > 0:(O)[0x00000000c2200000,0x00000000c22fffe0,0x00000000c2300000], size =
> > 5K, occupied = 5K.
> >     Did 0 coarsenings.
> > 
> > - the "Cumulative RS summary" header is only for the final printout at
> > the end of the GC
> > - separated out "Concurrent RS threads times" and "Concurrent sampling
> > threads times"
> > 
> > The remembered set information is always for the current situation and
> > never cumulative in any way as before. There does not seem to be a
> > useful way to make it cumulative...
> > 
> > Guide to read the patch
> > - I moved the remembered set summary code to a separate class
> > G1RemSetSummary, and added new files (g1RemSetSummary.?pp) for it.
> > - the main change has been about managing the periodicity, as G1 only
> > ever stores the accumulated data. For this reason, G1RemSet keeps a
> > local instance of G1RemSetSummary with the values of the last printout.
> > When it is time to create a new periodic output, a new temporary
> > G1RemSetSummary is created and filled with the current values from the
> > various sources, and the last values subtracted from it.
> > - the various os_* files contain the changes to return thread timing
> > values
> > - changes in G1ConcurrentRefine are mostly about some helper functions
> > to distinguish between actual worker and the RS sampling thread.
> > - g1CollectedHeap.cpp contains the change that potentially prints the
> > summary at every gc, not only at young gc
> > - the test case verifies how many times the remset summary is actually
> > printed for a few configurations of G1SummarizeRSetStats and
> > G1SummarizeRSetStatsPeriod.
> > 
> > 
> > Webrev
> > http://cr.openjdk.java.net/~tschatzl/8013895/webrev
> > 
> > Bugs.sun.com
> > http://bugs.sun.com/view_bug.do?bug_id=8013895
> > 
> > Jira:
> > https://jbs.oracle.com/bugs/browse/JDK-8013895
> > 
> > Testing:
> > jprt, test case, manual testing of getrusage(RUSAGE_THREAD, ) and
> > fallback on Linux x64, manual testing of GetThreadTimes() on Windows 64
> > bit, no special testing on BSD/OS X 10.7 except noticing that
> > getrusage(RUSAGE_THREAD) is not available there.
> > 
> 
> Thanks in advance,
>   Thomas
> 





More information about the hotspot-gc-dev mailing list