Request for review 7141200: log some interesting information in ring buffers for crashes

Tom Rodriguez tom.rodriguez at oracle.com
Wed Feb 1 07:16:45 PST 2012


On Feb 1, 2012, at 12:50 AM, Bengt Rutisson wrote:

> 
> Tom,
> 
> Thanks for changing this! It looks much better. The code is cleaner and the the information in the hs_err file is more useful.
> 
> I have only looked through the GC code changes. I think it looks good.
> 
> Some comments:
> 
> The information provided by print_heap_after_gc() is very good, but it does not clearly communicate what type of GC it was. In particular for G1 I think it would be interesting to know if it was a young GC, mixed GC, initial mark of full GC. I'm ok with not including this information now, but maybe we can file a bug to include it? Or maybe we should just leave it for when we re-work this based on JFR…

I'd like to not rework that at this point.  I'll file a bug for some enhancements.

> 
> G1 has the notion of "extended" heap information. This can be turned on with the flag PrintHeapAtGCExtended. This gives a lot of information for each region in the heap. I tried running with your patch and turning this flag on. Fortunately you limit the length of the events so the hs_err file is not as huge as it could be. But it gets quite difficult to parse the hs_err file since the extended information kind of messes the output up. I'm attaching an example hs_err file with this output.
> 
> Again, I'm ok with leaving this as it is and just filing a bug that we should try to turn off "extended" when we log the information for the event buffer. I don't expect that PrintHeapAtGCExtended will be widely used by customers.

Ok.  That sounds good.

> 
> Code comments:
> 
> psMarkSweep.cpp, psParallelCompact.cpp, psScavenge.cpp and collectedHeap.hpp
> 
> If you change the calls "CollectedHeap::print_heap_before_gc()" to "heap->print_heap_before_gc()" the print_heap_before_gc() method does not have to be static. That also means that the _gc_heap_log does not have to be static. Instead we could always set it up in the CollectedHeap::CollectedHeap() constructor as.
> 
>  if (LogEvents) {
>    _gc_heap_log = new GCHeapLog();
>  } else {
>    _gc_heap_log = NULL;
>  }
> 
> No need for collectedHeap_log_init() and initialize_heap_log() anymore.

I had originally gone with static because it seemed simpler, but you're right it would be easy to make it non-static.

> 
> 
> 
> collectedHeap.cpp
> 
> void GCHeapLog::log_heap(bool before) {
>  if (!should_log()) return;
> 
> The GC code tries to standardize on always using {} after if. So, it would be nice if this was:
> 
> void GCHeapLog::log_heap(bool before) {
>  if (!should_log()) {
>    return;
>  }

Sure.  I tend to do that too but occasionally I skip it.

> 
> 
> Overall this looks great. Thanks again for fixing all of this!

I'm glad we were able to resolve your concerns.  I think it turned out pretty well.

tom

> 
> Bengt
> 
> 
> On 2012-01-31 23:27, Tom Rodriguez wrote:
>> I've got a new version of the GC logging code.  It's now using PrintHeapAtGC routines which will give more useful and better formatted output.  I fixed all the review comments, deleted the EventMarks in GC code since no one objected, renamed _count to _length and _filled to _count in EventLogBase.  Each output line is preceded by Event: to help with greppability.  I've updated the webrev and the examples.
>> 
>> Thanks for all the good feedback.
>> 
>> Here's a sample of the new GC output:
>> 
>> GC Heap History (10):
>> Event: 9672681252 GC heap before
>> {Heap before GC invocations=25 (full 12):
>>  PSYoungGen      total 19712K, used 338K [0xe1e00000, 0xe3400000, 0xf7400000)
>>   eden space 16896K, 2% used [0xe1e00000,0xe1e54870,0xe2e80000)
>>   from space 2816K, 0% used [0xe3140000,0xe3140000,0xe3400000)
>>   to   space 2816K, 0% used [0xe2e80000,0xe2e80000,0xe3140000)
>>  PSOldGen        total 43008K, used 366K [0xb7400000, 0xb9e00000, 0xe1e00000)
>>   object space 43008K, 0% used [0xb7400000,0xb745bbf8,0xb9e00000)
>>  PSPermGen       total 16384K, used 2398K [0xb3400000, 0xb4400000, 0xb7400000)
>>   object space 16384K, 14% used [0xb3400000,0xb36578e0,0xb4400000)
>> Event: 9672681255 GC heap after
>> Heap after GC invocations=25 (full 12):
>>  PSYoungGen      total 19712K, used 48K [0xe1e00000, 0xe3400000, 0xf7400000)
>>   eden space 16896K, 0% used [0xe1e00000,0xe1e00000,0xe2e80000)
>>   from space 2816K, 1% used [0xe2e80000,0xe2e8c000,0xe3140000)
>>   to   space 2816K, 0% used [0xe3140000,0xe3140000,0xe3400000)
>>  PSOldGen        total 43008K, used 366K [0xb7400000, 0xb9e00000, 0xe1e00000)
>>   object space 43008K, 0% used [0xb7400000,0xb745bbf8,0xb9e00000)
>>  PSPermGen       total 16384K, used 2398K [0xb3400000, 0xb4400000, 0xb7400000)
>>   object space 16384K, 14% used [0xb3400000,0xb36578e0,0xb4400000)
>> }
>> 
>> A full example is at http://cr.openjdk.java.net/~never/7141200/example1.log.
>> 
>> tom
>> 
>> On Jan 31, 2012, at 10:08 AM, Kevin Walls wrote:
>> 
>>> Hi Tom,
>>> 
>>> Thanks for the examples.
>>> 
>>> Really like that we have a timestamp so we can see the ordering of separate types
>>> of events.  But we will get questions on how to convert a big number like
>>> 9617182121 into a time, and how it relates to times we see in other log files:
>>> can we convert it, e.g the same format as the GC logs?
>>> 
>>> (worst case, I was thinking we log the value of first_hrtime so we have some baseline,
>>> but I'm sure we can make it prettier than that 8-) )
>>> 
>>> 
>>> Why are the GC events all happening in the time period: Time: 0.000000.3 - 0.000000.3
>>> 
>>> That must be the format string %f.3 in memoryManager.cpp.  Again maybe we
>>> can make this more readable, this figure will no doubt be used to search GC logs.
>>> 
>>> 
>>> Trivial typos:
>>> events.hpp line 108, maybe "Should only be called while mutex is held."
>>> events.hpp line 58 subclassed rather than subclasses.
>>> 
>>> 
>>> Thanks!
>>> Kevin
>>> 
>>> 
>>> On 31/01/12 07:50, Tom Rodriguez wrote:
>>>> http://cr.openjdk.java.net/~never/7141200
>>>> 664 lines changed: 389 ins; 185 del; 90 mod; 37188 unchg
>>>> 
>>>> 7141200: log some interesting information in ring buffers for crashes
>>>> Reviewed-by:
>>>> 
>>>> As part of convergence with JRockit it was requested that Hotspot
>>>> capture some internal events in ring buffer so that they could be
>>>> dumped into the error log to provide more context at crash time.  I
>>>> chose to make these logs be mostly text based for simplicity of
>>>> implementation.  Eventually more of the flight recorder infrastructure
>>>> will be in place and the recording of this kind of data should be more
>>>> integrated the infrastructure for that.  So my current implementation
>>>> is somewhat ad hoc since I expect it will be revisited.
>>>> 
>>>> This also doesn't include one of the more important features, dumping
>>>> crashing context when scanning the heap.  Capturing that information
>>>> without impacting GC performance is difficult and would require broad
>>>> changes in the heap iteration code.  That can possibly be address later.
>>>> 
>>>> I've chosen to tear out the existing Events class which was a debug
>>>> only feature of limited use and extend that into a collection of
>>>> product ring buffers.  They are registered during JVM startup and
>>>> dumped into the hs_err log when the JVM crashes.  There are a couple
>>>> different categories maintained: compilation, deopt, internal
>>>> exceptions, other uncategorized messages and GC events.  Each ring
>>>> buffer has it's own mutex to reduce contention across different kinds
>>>> of events and generally logging is performed for relatively infrequent
>>>> or expensive events to minimize any performance penalty.  I also chose
>>>> to maintain separate logs for the categories so that higher frequency
>>>> events one kind don't hide other kinds.
>>>> 
>>>> I eliminated some of the less useful Events::log calls as well and
>>>> move some others around.
>>>> 
>>>> The GC logging keeps track of GC history using GCStatInfos.  This has
>>>> the benefit that it's GC independent and provides before and after
>>>> information.  It could also or instead capture Universe::print_on
>>>> though that's pretty verbose output.  The GC section is fairly long
>>>> and I'm not sure what the best output would be.  Opinions would be
>>>> appreciated.
>>>> 
>>>> There are some EventMarks in GC code that I'm tempted to delete.  Any
>>>> objections?
>>>> 
>>>> I put a couple example logs at:
>>>> 
>>>> http://cr.openjdk.java.net/~never/7141200/example1.log
>>>> http://cr.openjdk.java.net/~never/7141200/example2.log
>>>> 
>>>> I can produce others if needed.
>>>> 
>>>> Performance seems to be largely unchanged according to refworkload but
>>>> I'm still running some benchmarks.  Tested with JCK vm and lang, and
>>>> the nsk tests.
>>>> 
>>>> 
> 
> <hs_err_pid4492.log>



More information about the hotspot-dev mailing list