Request for review (S): 7152954 G1: Native memory leak during full GCs

Bengt Rutisson bengt.rutisson at oracle.com
Tue Mar 13 22:17:10 UTC 2012


Thanks John and Tony for looking at this.

I'll go ahead and push this.

Bengt

On 2012-03-13 22:20, Tony Printezis wrote:
> Bengt,
>
> Thanks for doing the extra cleanup! The new version looks fine.
>
> Tony
>
> On 03/13/2012 04:30 PM, Bengt Rutisson wrote:
>>
>> Tony,
>>
>> Thanks for the review! And thanks for being picky :-)
>>
>> Updated webrev based on your comments:
>> http://cr.openjdk.java.net/~brutisso/7152954/webrev.02/
>>
>> Bengt
>>
>>
>> On 2012-03-13 17:10, Tony Printezis wrote:
>>> Bengt,
>>>
>>> First, very good catch! And I agree with you that a) SurvRateGroup 
>>> does need some tidying up and b) you're taking the right approach in 
>>> keeping this fix simple at this time.
>>>
>>> The fix looks great, some minor comments below:
>>>
>>>    41     _surv_rate_pred(NULL),
>>>    42     _stats_arrays_length(0)
>>>    43 {
>>>    44   reset();
>>>    45   if (summary_surv_rates_len>  0) {
>>>    46     size_t length = summary_surv_rates_len;
>>>    47       _summary_surv_rates = NEW_C_HEAP_ARRAY(NumberSeq*, length);
>>>    48     if (_summary_surv_rates == NULL) {
>>>    49       vm_exit_out_of_memory(sizeof(NumberSeq*) * length,
>>>    50                             "Not enough space for surv rate summary");
>>>    51     }
>>>    52     for (size_t i = 0; i<  length; ++i)
>>>    53       _summary_surv_rates[i] = new NumberSeq();
>>>    54   }
>>>   
>>> Since you removed the NULL check in the other places where 
>>> NEW_C_HEAP_ARRAY() is called, any chance of doing that here too to 
>>> be consistent throughout the file? Also, (and this is really picky!) 
>>> could you move the curly bracket on line 43 to the line above (you 
>>> did in the reset() method) and put curly brackets around the body of 
>>> the for-loop? Thanks!
>>>
>>>    92 #if 0
>>>    93   gclog_or_tty->print_cr("[%s] stop adding regions, length %d", _name, _region_num);
>>>    94 #endif // 0
>>> Since you removed this, any chance of maybe removing the other 
>>> instances of #if 0 gclog_or_tty->print_cr(...); #endif // 0?
>>>
>>> On 03/13/2012 04:49 AM, Bengt Rutisson wrote:
>>>>
>>>> Hi John,
>>>>
>>>> Thanks for the review!
>>>>
>>>> Here is an updated webrev:
>>>> http://cr.openjdk.java.net/~brutisso/7152954/webrev.01/
>>>>
>>>> Some comments inline...
>>>>
>>>> On 2012-03-13 00:03, John Cuthbertson wrote:
>>>>> Hi Bengt,
>>>>>
>>>>> Looks good  - but I do have a couple of comments:
>>>>>
>>>>> survRateGroup.cpp:
>>>>>
>>>>> This is not related to your changes - but I just noticed it:
>>>>>
>>>>>   108     _surv_rate_pred = NEW_C_HEAP_ARRAY(TruncatedSeq*, _region_num);
>>>>>   109     if (_surv_rate == NULL) {
>>>>>   110       vm_exit_out_of_memory(sizeof(TruncatedSeq*) * _region_num,
>>>>>   111                             "Not enough space for surv rate pred array.");
>>>>>   112     }
>>>>> should be:
>>>>>
>>>>>   108     _surv_rate_pred = NEW_C_HEAP_ARRAY(TruncatedSeq*, _region_num);
>>>>>   109     if (*_surv_rate_pred*  == NULL) {
>>>>>   110       vm_exit_out_of_memory(sizeof(TruncatedSeq*) * _region_num,
>>>>>   111                             "Not enough space for surv rate pred array.");
>>>>>   112     }
>>>>
>>>> You are correct, but NEW_C_HEAP_ARRAY will call 
>>>> vm_exit_out_of_memory() if the allocation fails. So this is dead 
>>>> code. I removed it instead of updating it.
>>>>
>>>>> My picture of SurvRateGroup::stop_adding_regions() has always been 
>>>>> that it's just extending the stats arrays (i.e., a realloc 
>>>>> operation). So, with that picture in mind, the following lines:
>>>>>
>>>>>   117     for (size_t i = _stats_arrays_length; i<  _prev_stats_arrays_length; ++i) {
>>>>>   118       delete old_surv_rate_pred[i];
>>>>>   119     }
>>>>> really look like they belong in SurvRateGroup::reset().
>>>>
>>>> Good point. I was struggling a bit with how this code was supposed 
>>>> to be used. I moved the delete calls into reset(). Thanks for 
>>>> pointing this out.
>>>>
>>>>> Excellent catch - what brought it to your attention?
>>>>
>>>> It is kind of a funny story. Here is probably more information than 
>>>> you want, so just stop reading if you find it too long ;-)
>>>>
>>>> I just wanted to confirm that a small test I found would eventually 
>>>> throw an OOME. I ran it on JDK7 b147 over a weekend. When I checked 
>>>> the test after the weekend it had gotten a native out of memory 
>>>> instead. The test would mostly allocate humongous objects so I 
>>>> figured we were leaking in this respect somehow.
>>>>
>>>> I instrumented a build from the tip of hotspot-gc. But to my 
>>>> surprise the leak was gone. That's when I realized that in b147 we 
>>>> were constantly doing full GCs to handle all the humongous objects. 
>>>> But in the latest builds we have the fix I made to initiate 
>>>> concurrent marking on humongous object allocations, so we never did 
>>>> any full GCs. By inducing full GCs with System.gc() calls I could 
>>>> show that the memory leak was in fact still present in the latest 
>>>> builds as well.
>>>>
>>>> I tried out the prototype that Zhengyu has for native memory 
>>>> tracking. Unfortunately I didn't get it to work properly on my 64 
>>>> bit system. After I had already found the cause of the leak, 
>>>> Zhengyu ran my reproducer on a 32 bit system and got this report:
>>>>
>>>> [0x54dd14fc] TruncatedSeq::TruncatedSeq+0x4c
>>>>                             (malloc=162KB 143KB, #2071 1833)
>>>>
>>>> [0x54f43737] SurvRateGroup::reset+0x47
>>>>                             (malloc=128KB 115KB, #2049 1833)
>>>>
>>>> That is really spot on (assuming 
>>>> SurvRateGroup::stop_adding_regions() got inlined into 
>>>> SurvRateGroup::reset()). So, it looks to me as though NMT will be a 
>>>> really useful tool to find this type of issues in the future.
>>>>
>>>> Once I had the small reproducer that just calls System.gc() in a 
>>>> loop I could manually do a binary search of the full GC code to 
>>>> find out where we were leaking.
>>>>
>>>> Long story short: I found this by accident and NMT will be a really 
>>>> good tool.
>>>>
>>>> Bengt
>>>>
>>>>
>>>>
>>>>>
>>>>> JohnC
>>>>>
>>>>>
>>>>> On 03/12/12 02:36, Bengt Rutisson wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Could I have a couple of reviews for this fairly small change:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~brutisso/7152954/webrev.00/
>>>>>>
>>>>>> The CR will be publicly available soon here:
>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7152954
>>>>>>
>>>>>>
>>>>>> I choose to do a fairly small fix. To me it looks like the 
>>>>>> SurvRateGroup class could benefit from some refactoring. Maybe 
>>>>>> this hole issue could go away with some simplifications to the 
>>>>>> code. But I'd like to keep the change as small as possible for 
>>>>>> now just in case we would like to push this into hs23.
>>>>>>
>>>>>>
>>>>>> Background from the CR:
>>>>>>
>>>>>> There is a memory leak in the full GC code for G1. This can be 
>>>>>> seen by running this simple reproducer:
>>>>>>
>>>>>> public class SysGC {
>>>>>>     public static void main(String[] args) {
>>>>>>         while (true) {
>>>>>>             System.gc();
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> I run it with this command line:
>>>>>>
>>>>>> java -XX:+UseG1GC -Xms16m -Xmx16m -XX:+PrintGC SysGC
>>>>>>
>>>>>> Watching the memory footprint for the java process shows that it 
>>>>>> is constantly using more memory.
>>>>>>
>>>>>> The leak comes from SurvRateGroup::stop_adding_regions() which is 
>>>>>> called from SurvRateGroup::reset(), which in turn is called from 
>>>>>> G1CollectorPolicy::record_full_collection_end().
>>>>>>
>>>>>> The problem with SurvRateGroup::stop_adding_regions() is that it 
>>>>>> does:
>>>>>>
>>>>>> _surv_rate_pred[i] = new TruncatedSeq(10);
>>>>>>
>>>>>> in a loop every time it is called. But there is no corresponding 
>>>>>> call to delete.
>>>>>>
>>>>>> Adding a loop to call delete on the previously allocated 
>>>>>> TruncatedSeq objects is not enough to solve the problem since 
>>>>>> TruncatedSeq is itself allocating an array without freeing it. 
>>>>>> Adding a destructor to TruncatedSeq that frees the allocated 
>>>>>> array solves the issue.
>>>>>>
>>>>>> With these two fixes the memory leak seems to go away.
>>>>>>
>>>>>> Thanks,
>>>>>> Bengt
>>>>>
>>>>
>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20120313/158d6b96/attachment.htm>


More information about the hotspot-gc-dev mailing list