RFR (M): 8013895 G1: G1SummarizeRSetStats output on Linux needs improvement

John Cuthbertson john.cuthbertson at oracle.com
Fri May 24 23:32:27 UTC 2013


Hi Thomas,

The changes look good. A couple of nits (feel free to ignore):

concurrentG1Refine.cpp:
>   138 ConcurrentG1RefineThread * ConcurrentG1Refine::sampling_thread() const {
>   139   return _threads[_n_threads - 1];

Can we use _n_worker_threads to get the pointer to the RS sampling 
thread? If you look at the initialization code we have
if (_worker_id >= _n_worker_threads) {
...
}

g1RemSetSummary.[ch]pp
* Don't put a space before '*' in declarations.

g1RemSet.[ch]pp:
* Instead of _last_period_summary - can we have 
_prev[ious]_period_summary? 'Last' is too overridden.

JohnC

On 5/24/2013 12:47 AM, Thomas Schatzl wrote:
> Hi,
>
>    could any Reviewer for this change have a look - I have a review from
> Jesper?
>
> Thanks a lot,
>    Thomas
>
> On Wed, 2013-05-15 at 19:23 +0200, Thomas Schatzl wrote:
>> Hi all,
>>
>>    I'd like to add some further bug fix to this CR, as it only concerns a
>> single line.
>>
>> In the original code to retrieve remembered set statistics of the
>> regions, when iterating over the heap, the code excluded regions that
>> continued humonguous regions.
>>
>> This results in totals, as while the continued humonguous regions do not
>> have remembered set entries, they have a separate remembered set
>> instance. So when skipping them, the size of the empty remembered sets
>> are missing in the totals. (Note that an even an empty remembered set
>> needs a few k of heap).
>>
>> The update webrev at
>> http://cr.openjdk.java.net/~tschatzl/8013895/webrev.1
>>
>> fixes that. In HRRSStatsIter::doHeapRegion() it removes the first line
>> of the implementation that skips continued humonguous regions (i.e. the
>> line "if (r->continuesHumongous()) return false;").
>>
>> I verified by testing that the remembered set instances of continued
>> humonguous regions are different than the remembered set instance of the
>> humonguous starting region - which would have explained this code.
>>
>> I reran JPRT and the jtreg tests successfully.
>>
>> The rest of the information about this change still applies:
>>
>> On Mon, 2013-05-13 at 23:25 +0200, Thomas Schatzl wrote:
>>> Hi all,
>>>
>>>    I'd like to ask you for reviews for this change.
>>>
>>> It improves the output of G1SummarizeRSetStats on all platforms.
>>>
>>> The previous output of G1SummarizeRSetStats has the following issues as shown in the example output from the CR:
>>>
>>>   Concurrent RS processed -1761455529 cards
>>>    Of 9224262 completed buffers:
>>>        9224262 (100.0%) by conc RS threads.
>>>              0 ( 0.0%) by mutator threads.
>>>    Conc RS threads times(s)
>>>            0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>>>    Total heap region rem set sizes = 1120947K. Max = 4047K.
>>>    Static structures = 600K, free_lists = 5312K.
>>>      717820015 occupied cards represented.
>>>      Max size region = 378:(O)[0x00000004fa000000,0x00000004fb000000,0x00000004fb000000], size = 4048K, occupied = 37303K.
>>>      Did 209966 coarsenings.
>>>
>>> "Concurrent RS processed" uses the wrong data type, overflowing easily.
>>> "Conc RS threads times" are always zero except on Solaris
>>> "Conc RS threads times" reports the young gen sampling thread as remembered set thread.
>>> The data for everything except for the remembered set sizes (beginning
>>> at "Total heap region..." is always cumulative, which is hard to
>>> read/evaluate.
>>>
>>> This change includes the following modifications:
>>> - use a larger data type for the number of concurrent RS processed
>>> - division by zeros when calculating percentages for the completed
>>> buffers when the number of completed buffers is still zero
>>> - previously printed the summary only for young gen gcs, now also
>>> printing the summary at full gcs
>>> - split out the young gen sampling thread from the conc RS threads times
>>> - fixed the missing space between "times" and "(s)" in the same line
>>> - changed slightly how the G1SummarizeRSetStats
>>> G1SummarizeRSetStatsPeriod flags work. If G1SummarizeRSetStats is true,
>>> always print a cumulative summary at VM exit; if
>>> G1SummarizeRSetStatsPeriod is > 0 (default), print a cumulative summary
>>> for the last period GCs; i.e. if G1SummarizeRSetStatsPeriod is 1, print
>>> the summary for the most recent GC.
>>> - fix gathering of thread times:
>>>     - on linux supporting getrusage(RUSAGE_THREAD, ...), use this
>>> information, otherwise fallback to the elapsed time as suggested in the
>>> CR
>>>     - on Windows, use the GetThreadTimes() system API call, fallback to
>>> elapsed time
>>>     - on BSD, use the elapsed time counter as I do not know an API call.
>>> (getrusage() does not seem to support RUSAGE_THREAD there)
>>>
>>> Here is an example output for printing the summary for a different
>>> application on a different machine:
>>>
>>>   Cumulative RS summary
>>>
>>>   Concurrent RS processed 1504863 cards
>>>    Of 7194 completed buffers:
>>>           7194 (100.0%) by concurrent RS threads.
>>>              0 (  0.0%) by mutator threads.
>>>    Concurrent RS threads times (s)
>>>            1.81
>>>    Concurrent sampling threads times (s)
>>>            0.10
>>>    Total heap region rem set sizes = 119K.  Max = 4K.
>>>    Static structures = 7K, free_lists = 209K.
>>>      23253 occupied cards represented.
>>>      Max size region =
>>> 0:(O)[0x00000000c2200000,0x00000000c22fffe0,0x00000000c2300000], size =
>>> 5K, occupied = 5K.
>>>      Did 0 coarsenings.
>>>
>>> - the "Cumulative RS summary" header is only for the final printout at
>>> the end of the GC
>>> - separated out "Concurrent RS threads times" and "Concurrent sampling
>>> threads times"
>>>
>>> The remembered set information is always for the current situation and
>>> never cumulative in any way as before. There does not seem to be a
>>> useful way to make it cumulative...
>>>
>>> Guide to read the patch
>>> - I moved the remembered set summary code to a separate class
>>> G1RemSetSummary, and added new files (g1RemSetSummary.?pp) for it.
>>> - the main change has been about managing the periodicity, as G1 only
>>> ever stores the accumulated data. For this reason, G1RemSet keeps a
>>> local instance of G1RemSetSummary with the values of the last printout.
>>> When it is time to create a new periodic output, a new temporary
>>> G1RemSetSummary is created and filled with the current values from the
>>> various sources, and the last values subtracted from it.
>>> - the various os_* files contain the changes to return thread timing
>>> values
>>> - changes in G1ConcurrentRefine are mostly about some helper functions
>>> to distinguish between actual worker and the RS sampling thread.
>>> - g1CollectedHeap.cpp contains the change that potentially prints the
>>> summary at every gc, not only at young gc
>>> - the test case verifies how many times the remset summary is actually
>>> printed for a few configurations of G1SummarizeRSetStats and
>>> G1SummarizeRSetStatsPeriod.
>>>
>>>
>>> Webrev
>>> http://cr.openjdk.java.net/~tschatzl/8013895/webrev
>>>
>>> Bugs.sun.com
>>> http://bugs.sun.com/view_bug.do?bug_id=8013895
>>>
>>> Jira:
>>> https://jbs.oracle.com/bugs/browse/JDK-8013895
>>>
>>> Testing:
>>> jprt, test case, manual testing of getrusage(RUSAGE_THREAD, ) and
>>> fallback on Linux x64, manual testing of GetThreadTimes() on Windows 64
>>> bit, no special testing on BSD/OS X 10.7 except noticing that
>>> getrusage(RUSAGE_THREAD) is not available there.
>>>
>> Thanks in advance,
>>    Thomas
>>
>




More information about the hotspot-gc-dev mailing list