Request for review (S): 7152954 G1: Native memory leak during full GCs
Bengt Rutisson
bengt.rutisson at oracle.com
Tue Mar 13 20:31:21 UTC 2012
John,
Thanks for the review!
Bengt
On 2012-03-13 17:50, John Cuthbertson wrote:
> Hi Bengt,
>
> Looks good to me. Thanks for making the changes. Thanks also for the
> explanation - very informative. NMT does sound like it will be useful.
>
> JohnC
>
> On 03/13/12 01:49, Bengt Rutisson wrote:
>>
>> Hi John,
>>
>> Thanks for the review!
>>
>> Here is an updated webrev:
>> http://cr.openjdk.java.net/~brutisso/7152954/webrev.01/
>>
>> Some comments inline...
>>
>> On 2012-03-13 00:03, John Cuthbertson wrote:
>>> Hi Bengt,
>>>
>>> Looks good - but I do have a couple of comments:
>>>
>>> survRateGroup.cpp:
>>>
>>> This is not related to your changes - but I just noticed it:
>>>
>>> 108 _surv_rate_pred = NEW_C_HEAP_ARRAY(TruncatedSeq*, _region_num);
>>> 109 if (_surv_rate == NULL) {
>>> 110 vm_exit_out_of_memory(sizeof(TruncatedSeq*) * _region_num,
>>> 111 "Not enough space for surv rate pred array.");
>>> 112 }
>>>
>>> should be:
>>>
>>> 108 _surv_rate_pred = NEW_C_HEAP_ARRAY(TruncatedSeq*, _region_num);
>>> 109 if (_surv_rate_pred == NULL) {
>>> 110 vm_exit_out_of_memory(sizeof(TruncatedSeq*) * _region_num,
>>> 111 "Not enough space for surv rate pred array.");
>>> 112 }
>>>
>>
>> You are correct, but NEW_C_HEAP_ARRAY will call
>> vm_exit_out_of_memory() if the allocation fails. So this is dead
>> code. I removed it instead of updating it.
>>
>>> My picture of SurvRateGroup::stop_adding_regions() has always been
>>> that it's just extending the stats arrays (i.e., a realloc
>>> operation). So, with that picture in mind, the following lines:
>>>
>>> 117 for (size_t i = _stats_arrays_length; i < _prev_stats_arrays_length; ++i) {
>>> 118 delete old_surv_rate_pred[i];
>>> 119 }
>>>
>>> really look like they belong in SurvRateGroup::reset().
>>
>> Good point. I was struggling a bit with how this code was supposed to
>> be used. I moved the delete calls into reset(). Thanks for pointing
>> this out.
>>
>>> Excellent catch - what brought it to your attention?
>>
>> It is kind of a funny story. Here is probably more information than
>> you want, so just stop reading if you find it too long ;-)
>>
>> I just wanted to confirm that a small test I found would eventually
>> throw an OOME. I ran it on JDK7 b147 over a weekend. When I checked
>> the test after the weekend it had gotten a native out-of-memory
>> error instead. The test mostly allocates humongous objects, so I
>> figured we were somehow leaking memory related to them.
>>
>> I instrumented a build from the tip of hotspot-gc. But to my surprise
>> the leak was gone. That's when I realized that in b147 we were
>> constantly doing full GCs to handle all the humongous objects. But in
>> the latest builds we have the fix I made to initiate concurrent
>> marking on humongous object allocations, so we never did any full
>> GCs. By inducing full GCs with System.gc() calls I could show that
>> the memory leak was in fact still present in the latest builds as well.
>>
>> I tried out the prototype that Zhengyu has for native memory
>> tracking. Unfortunately I didn't get it to work properly on my 64 bit
>> system. After I had already found the cause of the leak, Zhengyu ran
>> my reproducer on a 32 bit system and got this report:
>>
>> [0x54dd14fc] TruncatedSeq::TruncatedSeq+0x4c
>> (malloc=162KB 143KB, #2071 1833)
>>
>> [0x54f43737] SurvRateGroup::reset+0x47
>> (malloc=128KB 115KB, #2049 1833)
>>
>> That is really spot on (assuming SurvRateGroup::stop_adding_regions()
>> got inlined into SurvRateGroup::reset()). So, it looks to me as
>> though NMT will be a really useful tool for finding this type of
>> issue in the future.
>>
>> Once I had the small reproducer that just calls System.gc() in a loop
>> I could manually do a binary search of the full GC code to find out
>> where we were leaking.
>>
>> Long story short: I found this by accident and NMT will be a really
>> good tool.
>>
>> Bengt
>>
>>
>>
>>>
>>> JohnC
>>>
>>>
>>> On 03/12/12 02:36, Bengt Rutisson wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Could I have a couple of reviews for this fairly small change:
>>>>
>>>> http://cr.openjdk.java.net/~brutisso/7152954/webrev.00/
>>>>
>>>> The CR will be publicly available soon here:
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7152954
>>>>
>>>>
>>>> I chose to do a fairly small fix. To me it looks like the
>>>> SurvRateGroup class could benefit from some refactoring. Maybe this
>>>> whole issue could go away with some simplifications to the code. But
>>>> I'd like to keep the change as small as possible for now just in
>>>> case we would like to push this into hs23.
>>>>
>>>>
>>>> Background from the CR:
>>>>
>>>> There is a memory leak in the full GC code for G1. This can be seen
>>>> by running this simple reproducer:
>>>>
>>>> public class SysGC {
>>>>     public static void main(String[] args) {
>>>>         while (true) {
>>>>             System.gc();
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> I run it with this command line:
>>>>
>>>> java -XX:+UseG1GC -Xms16m -Xmx16m -XX:+PrintGC SysGC
>>>>
>>>> Watching the memory footprint for the java process shows that it is
>>>> constantly using more memory.
>>>>
>>>> The leak comes from SurvRateGroup::stop_adding_regions() which is
>>>> called from SurvRateGroup::reset(), which in turn is called from
>>>> G1CollectorPolicy::record_full_collection_end().
>>>>
>>>> The problem with SurvRateGroup::stop_adding_regions() is that it does:
>>>>
>>>> _surv_rate_pred[i] = new TruncatedSeq(10);
>>>>
>>>> in a loop every time it is called. But there is no corresponding
>>>> call to delete.
>>>>
>>>> Adding a loop to call delete on the previously allocated
>>>> TruncatedSeq objects is not enough to solve the problem since
>>>> TruncatedSeq is itself allocating an array without freeing it.
>>>> Adding a destructor to TruncatedSeq that frees the allocated array
>>>> solves the issue.
>>>>
>>>> With these two fixes the memory leak seems to go away.
>>>>
>>>> Thanks,
>>>> Bengt
>>>
>>
>