Request for review (S): 7152954 G1: Native memory leak during full GCs

John Cuthbertson john.cuthbertson at oracle.com
Tue Mar 13 16:50:13 UTC 2012


Hi Bengt,

Looks good to me. Thanks for making the changes. Thanks also for the 
explanation - very informative. NMT does sound like it will be useful.

JohnC

On 03/13/12 01:49, Bengt Rutisson wrote:
>
> Hi John,
>
> Thanks for the review!
>
> Here is an updated webrev:
> http://cr.openjdk.java.net/~brutisso/7152954/webrev.01/
>
> Some comments inline...
>
> On 2012-03-13 00:03, John Cuthbertson wrote:
>> Hi Bengt,
>>
>> Looks good  - but I do have a couple of comments:
>>
>> survRateGroup.cpp:
>>
>> This is not related to your changes - but I just noticed it:
>>
>>  108     _surv_rate_pred = NEW_C_HEAP_ARRAY(TruncatedSeq*, _region_num);
>>  109     if (_surv_rate == NULL) {
>>  110       vm_exit_out_of_memory(sizeof(TruncatedSeq*) * _region_num,
>>  111                             "Not enough space for surv rate pred array.");
>>  112     }
>>     
>> should be:
>>
>>  108     _surv_rate_pred = NEW_C_HEAP_ARRAY(TruncatedSeq*, _region_num);
>>  109     if (*_surv_rate_pred* == NULL) {
>>  110       vm_exit_out_of_memory(sizeof(TruncatedSeq*) * _region_num,
>>  111                             "Not enough space for surv rate pred array.");
>>  112     }
>>     
>
> You are correct, but NEW_C_HEAP_ARRAY will call 
> vm_exit_out_of_memory() if the allocation fails. So this is dead code. 
> I removed it instead of updating it.
>
>> My picture of SurvRateGroup::stop_adding_regions() has always been 
>> that it's just extending the stats arrays (i.e., a realloc 
>> operation). So, with that picture in mind, the following lines:
>>
>>  117     for (size_t i = _stats_arrays_length; i < _prev_stats_arrays_length; ++i) {
>>  118       delete old_surv_rate_pred[i];
>>  119     }
>>     
>> really look like they belong in SurvRateGroup::reset().
>
> Good point. I was struggling a bit with how this code was supposed to 
> be used. I moved the delete calls into reset(). Thanks for pointing 
> this out.
>
>> Excellent catch - what brought it to your attention?
>
> It is kind of a funny story. Here is probably more information than 
> you want, so just stop reading if you find it too long ;-)
>
> I just wanted to confirm that a small test I found would eventually 
> throw an OOME. I ran it on JDK7 b147 over a weekend. When I checked 
> the test after the weekend it had gotten a native out-of-memory error 
> instead. The test would mostly allocate humongous objects, so I figured 
> we were leaking in that area somehow.
>
> I instrumented a build from the tip of hotspot-gc. But to my surprise 
> the leak was gone. That's when I realized that in b147 we were 
> constantly doing full GCs to handle all the humongous objects. But in 
> the latest builds we have the fix I made to initiate concurrent 
> marking on humongous object allocations, so we never did any full GCs. 
> By inducing full GCs with System.gc() calls I could show that the 
> memory leak was in fact still present in the latest builds as well.
>
> I tried out the prototype that Zhengyu has for native memory tracking. 
> Unfortunately I didn't get it to work properly on my 64 bit system. 
> After I had already found the cause of the leak, Zhengyu ran my 
> reproducer on a 32 bit system and got this report:
>
> [0x54dd14fc] TruncatedSeq::TruncatedSeq+0x4c
>                             (malloc=162KB 143KB, #2071 1833)
>
> [0x54f43737] SurvRateGroup::reset+0x47
>                             (malloc=128KB 115KB, #2049 1833)
>
> That is really spot on (assuming SurvRateGroup::stop_adding_regions() 
> got inlined into SurvRateGroup::reset()). So, it looks to me as though 
> NMT will be a really useful tool to find this type of issue in the 
> future.
>
> Once I had the small reproducer that just calls System.gc() in a loop 
> I could manually do a binary search of the full GC code to find out 
> where we were leaking.
>
> Long story short: I found this by accident and NMT will be a really 
> good tool.
>
> Bengt
>
>
>
>>
>> JohnC
>>
>>
>> On 03/12/12 02:36, Bengt Rutisson wrote:
>>>
>>> Hi all,
>>>
>>> Could I have a couple of reviews for this fairly small change:
>>>
>>> http://cr.openjdk.java.net/~brutisso/7152954/webrev.00/
>>>
>>> The CR will be publicly available soon here:
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7152954
>>>
>>>
>>> I chose to do a fairly small fix. To me it looks like the 
>>> SurvRateGroup class could benefit from some refactoring. Maybe this 
>>> whole issue could go away with some simplifications to the code. But 
>>> I'd like to keep the change as small as possible for now just in 
>>> case we would like to push this into hs23.
>>>
>>>
>>> Background from the CR:
>>>
>>> There is a memory leak in the full GC code for G1. This can be seen 
>>> by running this simple reproducer:
>>>
>>> public class SysGC {
>>>     public static void main(String[] args) {
>>>         while (true) {
>>>             System.gc();
>>>         }
>>>     }
>>> }
>>>
>>> I run it with this command line:
>>>
>>> java -XX:+UseG1GC -Xms16m -Xmx16m -XX:+PrintGC SysGC
>>>
>>> Watching the memory footprint for the java process shows that it is 
>>> constantly using more memory.
>>>
>>> The leak comes from SurvRateGroup::stop_adding_regions() which is 
>>> called from SurvRateGroup::reset(), which in turn is called from 
>>> G1CollectorPolicy::record_full_collection_end().
>>>
>>> The problem with SurvRateGroup::stop_adding_regions() is that it does:
>>>
>>> _surv_rate_pred[i] = new TruncatedSeq(10);
>>>
>>> in a loop every time it is called. But there is no corresponding 
>>> call to delete.
>>>
>>> Adding a loop to call delete on the previously allocated 
>>> TruncatedSeq objects is not enough to solve the problem since 
>>> TruncatedSeq is itself allocating an array without freeing it. 
>>> Adding a destructor to TruncatedSeq that frees the allocated array 
>>> solves the issue.
>>>
>>> With these two fixes the memory leak seems to go away.
>>>
>>> Thanks,
>>> Bengt
>>
>



More information about the hotspot-gc-dev mailing list