Request for review (S): 7152954 G1: Native memory leak during full GCs
Bengt Rutisson
bengt.rutisson at oracle.com
Tue Mar 13 08:49:20 UTC 2012
Hi John,
Thanks for the review!
Here is an updated webrev:
http://cr.openjdk.java.net/~brutisso/7152954/webrev.01/
Some comments inline...
On 2012-03-13 00:03, John Cuthbertson wrote:
> Hi Bengt,
>
> Looks good - but I do have a couple of comments:
>
> survRateGroup.cpp:
>
> This is not related to your changes - but I just noticed it:
>
>  108   _surv_rate_pred = NEW_C_HEAP_ARRAY(TruncatedSeq*, _region_num);
>  109   if (_surv_rate == NULL) {
>  110     vm_exit_out_of_memory(sizeof(TruncatedSeq*) * _region_num,
>  111                           "Not enough space for surv rate pred array.");
>  112   }
> should be:
>
>  108   _surv_rate_pred = NEW_C_HEAP_ARRAY(TruncatedSeq*, _region_num);
>  109   if (*_surv_rate_pred* == NULL) {
>  110     vm_exit_out_of_memory(sizeof(TruncatedSeq*) * _region_num,
>  111                           "Not enough space for surv rate pred array.");
>  112   }
You are correct, but NEW_C_HEAP_ARRAY will call vm_exit_out_of_memory()
if the allocation fails. So this is dead code. I removed it instead of
updating it.
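For reference, the reason the check is dead: NEW_C_HEAP_ARRAY bottoms
out in AllocateHeap(), which already terminates the VM when the
underlying malloc fails. A minimal stand-in sketch of that behavior
(the real macro in allocation.hpp also records the type name and
source file, and goes through os::malloc; the stubs below are just
for illustration):

  #include <cstddef>
  #include <cstdio>
  #include <cstdlib>

  // Stand-in for the real VM exit path.
  static void vm_exit_out_of_memory(size_t size, const char* name) {
    fprintf(stderr, "Out of native memory: %zu bytes for %s\n", size, name);
    exit(1);
  }

  inline char* AllocateHeap(size_t size, const char* name) {
    char* p = (char*) malloc(size);
    if (p == NULL) {
      vm_exit_out_of_memory(size, name);  // the VM exits here...
    }
    return p;  // ...so callers can never observe a NULL result
  }

  #define NEW_C_HEAP_ARRAY(type, size) \
    (type*) AllocateHeap((size) * sizeof(type), #type)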
> My picture of SurvRateGroup::stop_adding_regions() has always been
> that it's just extending the stats arrays (i.e., a realloc operation).
> So, with that picture in mind, the following lines:
>
>  117   for (size_t i = _stats_arrays_length; i < _prev_stats_arrays_length; ++i) {
>  118     delete old_surv_rate_pred[i];
>  119   }
> really look like they belong in SurvRateGroup::reset().
Good point. I was struggling a bit with how this code was supposed to be
used. I moved the delete calls into reset(). Thanks for pointing this out.
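For the archives, the shape of that part of the change is roughly this
(a condensed sketch, not the literal webrev.01 diff):

  // Condensed sketch of the intent, not the actual webrev.01 code.
  void SurvRateGroup::reset() {
    // Delete the TruncatedSeq objects allocated the last time
    // stop_adding_regions() grew the stats arrays, so that
    // stop_adding_regions() itself stays a pure realloc-style
    // extension of the arrays, as you describe.
    for (size_t i = 0; i < _stats_arrays_length; ++i) {
      delete _surv_rate_pred[i];
    }
    // ... the rest of the reset logic is unchanged ...
  }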
> Excellent catch - what brought it to your attention?
It is kind of a funny story. Here is probably more information than you
want, so just stop reading if you find it too long ;-)
I just wanted to confirm that a small test I found would eventually
throw an OOME. I ran it on JDK7 b147 over a weekend. When I checked the
test after the weekend, it had hit a native out-of-memory error instead.
The test mostly allocates humongous objects, so I figured we were
somehow leaking memory related to them.
I instrumented a build from the tip of hotspot-gc. But to my surprise
the leak was gone. That's when I realized that in b147 we were
constantly doing full GCs to handle all the humongous objects. But in
the latest builds we have the fix I made to initiate concurrent marking
on humongous object allocations, so we never did any full GCs. By
inducing full GCs with System.gc() calls I could show that the memory
leak was in fact still present in the latest builds as well.
I tried out the prototype that Zhengyu has for native memory tracking.
Unfortunately I didn't get it to work properly on my 64 bit system.
After I had already found the cause of the leak, Zhengyu ran my
reproducer on a 32 bit system and got this report:
[0x54dd14fc] TruncatedSeq::TruncatedSeq+0x4c
             (malloc=162KB 143KB, #2071 1833)
[0x54f43737] SurvRateGroup::reset+0x47
             (malloc=128KB 115KB, #2049 1833)
That is really spot on (assuming SurvRateGroup::stop_adding_regions()
got inlined into SurvRateGroup::reset()). So, it looks to me as though
NMT will be a really useful tool for finding this type of issue in the future.
Once I had the small reproducer that just calls System.gc() in a loop,
I could manually do a binary search of the full GC code to find out
where we were leaking.
Long story short: I found this by accident and NMT will be a really good
tool.
Bengt
>
> JohnC
>
>
> On 03/12/12 02:36, Bengt Rutisson wrote:
>>
>> Hi all,
>>
>> Could I have a couple of reviews for this fairly small change:
>>
>> http://cr.openjdk.java.net/~brutisso/7152954/webrev.00/
>>
>> The CR will be publicly available soon here:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7152954
>>
>>
>> I chose to do a fairly small fix. To me it looks like the
>> SurvRateGroup class could benefit from some refactoring. Maybe this
>> whole issue could go away with some simplifications to the code. But
>> I'd like to keep the change as small as possible for now, just in case
>> we would like to push this into hs23.
>>
>>
>> Background from the CR:
>>
>> There is a memory leak in the full GC code for G1. This can be seen
>> by running this simple reproducer:
>>
>> public class SysGC {
>>     public static void main(String[] args) {
>>         while (true) {
>>             System.gc();
>>         }
>>     }
>> }
>>
>> I run it with this command line:
>>
>> java -XX:+UseG1GC -Xms16m -Xmx16m -XX:+PrintGC SysGC
>>
>> Watching the memory footprint of the java process shows that it is
>> constantly using more memory.
>>
>> The leak comes from SurvRateGroup::stop_adding_regions() which is
>> called from SurvRateGroup::reset(), which in turn is called from
>> G1CollectorPolicy::record_full_collection_end().
>>
>> The problem with SurvRateGroup::stop_adding_regions() is that it does:
>>
>>     _surv_rate_pred[i] = new TruncatedSeq(10);
>>
>> in a loop every time it is called. But there is no corresponding call
>> to delete.
>>
>> Adding a loop to call delete on the previously allocated TruncatedSeq
>> objects is not enough to solve the problem since TruncatedSeq is
>> itself allocating an array without freeing it. Adding a destructor to
>> TruncatedSeq that frees the allocated array solves the issue.
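>>
>> In code, the destructor half of the fix looks roughly like this (a
>> sketch from memory of the numberSeq code, not a copy of the actual
>> change):
>>
>>     // Free the backing array that the TruncatedSeq constructor
>>     // allocates, so deleting a TruncatedSeq releases its native memory.
>>     TruncatedSeq::~TruncatedSeq() {
>>       FREE_C_HEAP_ARRAY(double, _sequence);
>>     }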
>>
>> With these two fixes the memory leak seems to go away.
>>
>> Thanks,
>> Bengt
>