Request for review (S): 7152954 G1: Native memory leak during full GCs
Tony Printezis
tony.printezis at oracle.com
Tue Mar 13 21:20:21 UTC 2012
Bengt,
Thanks for doing the extra cleanup! The new version looks fine.
Tony
On 03/13/2012 04:30 PM, Bengt Rutisson wrote:
>
> Tony,
>
> Thanks for the review! And thanks for being picky :-)
>
> Updated webrev based on your comments:
> http://cr.openjdk.java.net/~brutisso/7152954/webrev.02/
>
> Bengt
>
>
> On 2012-03-13 17:10, Tony Printezis wrote:
>> Bengt,
>>
>> First, very good catch! And I agree with you that a) SurvRateGroup
>> does need some tidying up and b) you're taking the right approach in
>> keeping this fix simple at this time.
>>
>> The fix looks great, some minor comments below:
>>
>> 41 _surv_rate_pred(NULL),
>> 42 _stats_arrays_length(0)
>> 43 {
>> 44 reset();
>> 45 if (summary_surv_rates_len> 0) {
>> 46 size_t length = summary_surv_rates_len;
>> 47 _summary_surv_rates = NEW_C_HEAP_ARRAY(NumberSeq*, length);
>> 48 if (_summary_surv_rates == NULL) {
>> 49 vm_exit_out_of_memory(sizeof(NumberSeq*) * length,
>> 50 "Not enough space for surv rate summary");
>> 51 }
>> 52 for (size_t i = 0; i< length; ++i)
>> 53 _summary_surv_rates[i] = new NumberSeq();
>> 54 }
>>
>> Since you removed the NULL check in the other places where
>> NEW_C_HEAP_ARRAY() is called, any chance of doing that here too to be
>> consistent throughout the file? Also, (and this is really picky!)
>> could you move the curly bracket on line 43 to the line above (you
>> did in the reset() method) and put curly brackets around the body of
>> the for-loop? Thanks!
>>
>> 92 #if 0
>> 93 gclog_or_tty->print_cr("[%s] stop adding regions, length %d", _name, _region_num);
>> 94 #endif // 0
>> Since you removed this, any chance of maybe removing the other
>> instances of #if 0 gclog_or_tty->print_cr(...); #endif // 0?
>>
>> On 03/13/2012 04:49 AM, Bengt Rutisson wrote:
>>>
>>> Hi John,
>>>
>>> Thanks for the review!
>>>
>>> Here is an updated webrev:
>>> http://cr.openjdk.java.net/~brutisso/7152954/webrev.01/
>>>
>>> Some comments inline...
>>>
>>> On 2012-03-13 00:03, John Cuthbertson wrote:
>>>> Hi Bengt,
>>>>
>>>> Looks good - but I do have a couple of comments:
>>>>
>>>> survRateGroup.cpp:
>>>>
>>>> This is not related to your changes - but I just noticed it:
>>>>
>>>> 108 _surv_rate_pred = NEW_C_HEAP_ARRAY(TruncatedSeq*, _region_num);
>>>> 109 if (_surv_rate == NULL) {
>>>> 110 vm_exit_out_of_memory(sizeof(TruncatedSeq*) * _region_num,
>>>> 111 "Not enough space for surv rate pred array.");
>>>> 112 }
>>>> should be:
>>>>
>>>> 108 _surv_rate_pred = NEW_C_HEAP_ARRAY(TruncatedSeq*, _region_num);
>>>> 109 if (*_surv_rate_pred* == NULL) {
>>>> 110 vm_exit_out_of_memory(sizeof(TruncatedSeq*) * _region_num,
>>>> 111 "Not enough space for surv rate pred array.");
>>>> 112 }
>>>
>>> You are correct, but NEW_C_HEAP_ARRAY will call
>>> vm_exit_out_of_memory() if the allocation fails. So this is dead
>>> code. I removed it instead of updating it.
>>>
>>>> My picture of SurvRateGroup::stop_adding_regions() has always been
>>>> that it's just extending the stats arrays (i.e., a realloc
>>>> operation). So, with that picture in mind, the following lines:
>>>>
>>>> 117 for (size_t i = _stats_arrays_length; i< _prev_stats_arrays_length; ++i) {
>>>> 118 delete old_surv_rate_pred[i];
>>>> 119 }
>>>> really look like they belong in SurvRateGroup::reset().
>>>
>>> Good point. I was struggling a bit with how this code was supposed
>>> to be used. I moved the delete calls into reset(). Thanks for
>>> pointing this out.
>>>
>>>> Excellent catch - what brought it to your attention?
>>>
>>> It is kind of a funny story. Here is probably more information than
>>> you want, so just stop reading if you find it too long ;-)
>>>
>>> I just wanted to confirm that a small test I found would eventually
>>> throw an OOME. I ran it on JDK7 b147 over a weekend. When I checked
>>> the test after the weekend it had gotten a native out of memory
>>> instead. The test would mostly allocate humongous objects so I
>>> figured we were leaking in this respect somehow.
>>>
>>> I instrumented a build from the tip of hotspot-gc. But to my
>>> surprise the leak was gone. That's when I realized that in b147 we
>>> were constantly doing full GCs to handle all the humongous objects.
>>> But in the latest builds we have the fix I made to initiate
>>> concurrent marking on humongous object allocations, so we never did
>>> any full GCs. By inducing full GCs with System.gc() calls I could
>>> show that the memory leak was in fact still present in the latest
>>> builds as well.
>>>
>>> I tried out the prototype that Zhengyu has for native memory
>>> tracking. Unfortunately I didn't get it to work properly on my 64
>>> bit system. After I had already found the cause of the leak, Zhengyu
>>> ran my reproducer on a 32 bit system and got this report:
>>>
>>> [0x54dd14fc] TruncatedSeq::TruncatedSeq+0x4c
>>> (malloc=162KB 143KB, #2071 1833)
>>>
>>> [0x54f43737] SurvRateGroup::reset+0x47
>>> (malloc=128KB 115KB, #2049 1833)
>>>
>>> That is really spot on (assuming
>>> SurvRateGroup::stop_adding_regions() got inlined into
>>> SurvRateGroup::reset()). So, it looks to me as though NMT will be a
>>> really useful tool to find this type of issues in the future.
>>>
>>> Once I had the small reproducer that just calls System.gc() in a
>>> loop I could manually do a binary search of the full GC code to find
>>> out where we were leaking.
>>>
>>> Long story short: I found this by accident and NMT will be a really
>>> good tool.
>>>
>>> Bengt
>>>
>>>
>>>
>>>>
>>>> JohnC
>>>>
>>>>
>>>> On 03/12/12 02:36, Bengt Rutisson wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Could I have a couple of reviews for this fairly small change:
>>>>>
>>>>> http://cr.openjdk.java.net/~brutisso/7152954/webrev.00/
>>>>>
>>>>> The CR will be publicly available soon here:
>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7152954
>>>>>
>>>>>
>>>>> I choose to do a fairly small fix. To me it looks like the
>>>>> SurvRateGroup class could benefit from some refactoring. Maybe
>>>>> this hole issue could go away with some simplifications to the
>>>>> code. But I'd like to keep the change as small as possible for now
>>>>> just in case we would like to push this into hs23.
>>>>>
>>>>>
>>>>> Background from the CR:
>>>>>
>>>>> There is a memory leak in the full GC code for G1. This can be
>>>>> seen by running this simple reproducer:
>>>>>
>>>>> public class SysGC {
>>>>> public static void main(String[] args) {
>>>>> while (true) {
>>>>> System.gc();
>>>>> }
>>>>> }
>>>>> }
>>>>>
>>>>> I run it with this command line:
>>>>>
>>>>> java -XX:+UseG1GC -Xms16m -Xmx16m -XX:+PrintGC SysGC
>>>>>
>>>>> Watching the memory footprint for the java process shows that it
>>>>> is constantly using more memory.
>>>>>
>>>>> The leak comes from SurvRateGroup::stop_adding_regions() which is
>>>>> called from SurvRateGroup::reset(), which in turn is called from
>>>>> G1CollectorPolicy::record_full_collection_end().
>>>>>
>>>>> The problem with SurvRateGroup::stop_adding_regions() is that it
>>>>> does:
>>>>>
>>>>> _surv_rate_pred[i] = new TruncatedSeq(10);
>>>>>
>>>>> in a loop every time it is called. But there is no corresponding
>>>>> call to delete.
>>>>>
>>>>> Adding a loop to call delete on the previously allocated
>>>>> TruncatedSeq objects is not enough to solve the problem since
>>>>> TruncatedSeq is itself allocating an array without freeing it.
>>>>> Adding a destructor to TruncatedSeq that frees the allocated array
>>>>> solves the issue.
>>>>>
>>>>> With these two fixes the memory leak seems to go away.
>>>>>
>>>>> Thanks,
>>>>> Bengt
>>>>
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20120313/cd18eb6a/attachment.htm>
More information about the hotspot-gc-dev
mailing list