Second Code Review for WeakReference leak in the Logging API (6942989)
Daniel D. Daugherty
daniel.daugherty at oracle.com
Fri Jul 2 12:55:25 PDT 2010
On 7/2/2010 11:56 AM, Jeremy Manson wrote:
> Hi Dan,
>
> I don't know what AnonLoggerWeakRefTest looks like,
Check out the webrevs in the review requests. I included the
tests in the those reviews.
> but I am fairly
> confident that if you create a few million loggers and then drop down
> to one or two, the backing array of the Hashtable will still be bigger
> than it should be.
>
Possibly. I just wanted to confirm that we weren't leaking and I
think I've confirmed that.
> Still, no real harm done - that's a fairly unusual situation.
>
Agreed.
Dan
> Jeremy
>
> On Tue, Jun 29, 2010 at 9:10 AM, Daniel D. Daugherty
> <daniel.daugherty at oracle.com> wrote:
>
>> Jeremy,
>>
>> Closing the loop on this part of the thread. I don't think there are
>> any more leaks left after the fix is applied. Here are the entries
>> I added the "public comments" section of the bug. For some reason
>> that bug is _still_ not showing up on the OpenJDK site.
>>
>>
>>> === *Public Comments*
>>> ========================================================
>>> I ran the AnonLoggerWeakRefLeak test with the following env variables:
>>>
>>>
>>> TESTJAVA=c:/work/shared/mirrors/jdks-win32/jdk/1.7.0/latest/binaries/windows-i586
>>> TESTCLASSES=C:/cygwin/home/dcubed/Projects/6942989/XXX
>>> TESTVMOPTS=-showversion -client -Xmx32m
>>> TESTSRC=C:/cygwin/home/dcubed/Projects/6942989/XXX
>>>
>>> along with a 30 minute duration (1800 seconds) using JDK7-B99 bits
>>> (which does NOT have the fix).
>>>
>>> Here's the last 15 lines of the log files:
>>>
>>> $ tail -15 AnonLogger.jdk7-B99_30min.log
>>> INFO: instance_cnt = 1032505
>>> INFO: instance_cnt = 1038305
>>> INFO: instance_cnt = 1044005
>>> INFO: instance_cnt = 1049705
>>> INFO: jmap exited with exit code = 1
>>> INFO: The likely reason is that AnonLoggerWeakRefLeak has finished
>>> running.
>>> INFO: increasing_cnt = 181
>>> INFO: decreasing_cnt = 0
>>> INFO: The instance count of java.lang.ref.WeakReference objects
>>> INFO: is always increasing.
>>> FAIL: This indicates that there is a memory leak.
>>>
>>> real 18m10.296s
>>> user 0m41.570s
>>> sys 0m46.442s
>>>
>>> $ tail -15 AnonLoggerWeakRefLeak.jdk7-B99_30min.log
>>> INFO: call count = 1044000
>>> INFO: call count = 1045000
>>> INFO: call count = 1046000
>>> INFO: call count = 1047000
>>> INFO: call count = 1048000
>>> INFO: call count = 1049000
>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>> at java.util.Arrays.copyOf(Arrays.java:2239)
>>> at java.util.Arrays.copyOf(Arrays.java:2213)
>>> at java.util.ArrayList.grow(ArrayList.java:208)
>>> at java.util.ArrayList.ensureCapacity(ArrayList.java:182)
>>> at java.util.ArrayList.add(ArrayList.java:406)
>>> at java.util.logging.Logger.doSetParent(Logger.java:1401)
>>> at java.util.logging.Logger.getAnonymousLogger(Logger.java:374)
>>> at AnonLoggerWeakRefLeak.main(AnonLoggerWeakRefLeak.java:60)
>>>
>>>
>>> So it took a little over 18 minutes for a Client VM with a 32MB
>>> max heap size to throw an OOME with AnonLoggerWeakRefLeak.
>>>
>>>
>>> I ran the LoggerWeakRefLeak test with the same environment variables
>>> along with a 30 minute duration (1800 seconds) using JDK7-B99 bits
>>> (which does NOT have the fix).
>>>
>>> Here's the last 15 lines of the log files:
>>>
>>> $ tail -15 Logger.jdk7-B99_30min.log
>>> INFO: instance_cnt = 935705
>>> INFO: instance_cnt = 938605
>>> INFO: instance_cnt = 941205
>>> INFO: instance_cnt = 943205
>>> INFO: jmap exited with exit code = 1
>>> INFO: The likely reason is that LoggerWeakRefLeak has finished running.
>>> INFO: increasing_cnt = 160
>>> INFO: decreasing_cnt = 0
>>> INFO: The instance count of java.lang.ref.WeakReference objects
>>> INFO: is always increasing.
>>> FAIL: This indicates that there is a memory leak.
>>>
>>> real 16m47.690s
>>> user 0m37.910s
>>> sys 0m40.002s
>>>
>>> $ tail -15 LoggerWeakRefLeak.jdk7-B99_30min.log
>>> INFO: call count = 881000
>>> INFO: call count = 882000
>>> INFO: call count = 883000
>>> INFO: call count = 884000
>>> INFO: call count = 885000
>>> INFO: call count = 886000
>>> INFO: call count = 887000
>>> INFO: call count = 888000
>>> INFO: call count = 889000
>>> INFO: call count = 890000
>>> INFO: call count = 891000
>>> INFO: call count = 892000
>>> INFO: call count = 893000
>>> INFO: call count = 894000
>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>
>>>
>>> So it took a little less than 17 minutes for a Client VM with a 32MB
>>> max heap size to throw an OOME with LoggerWeakRefLeak.
>>>
>>> *** (#1 of 3): 2010-06-28 22:45:03 GMT+00:00 daniel.daugherty at oracle.com
>>> *** Last Edit: 2010-06-28 22:57:39 GMT+00:00 daniel.daugherty at oracle.com
>>>
>>> I ran the AnonLoggerWeakRefLeak test with the same environment
>>> variables for a 60 minute duration using the fixed bits from
>>> JPRT. Here are the last 15 lines of each log (much less
>>> interesting info):
>>>
>>> $ tail -15 AnonLogger.jdk7-B100+_60min.log
>>> INFO: instance_cnt = 5801
>>> INFO: instance_cnt = 5901
>>> INFO: instance_cnt = 5701
>>> INFO: instance_cnt = 5601
>>> INFO: jmap exited with exit code = 1
>>> INFO: The likely reason is that AnonLoggerWeakRefLeak has finished
>>> running.
>>> INFO: increasing_cnt = 227
>>> INFO: decreasing_cnt = 383
>>> INFO: The instance count of java.lang.ref.WeakReference objects
>>> INFO: is both increasing and decreasing.
>>> PASS: This indicates that there is not a memory leak.
>>>
>>> real 60m5.099s
>>> user 2m18.882s
>>> sys 2m31.127s
>>>
>>> $ tail -15 AnonLoggerWeakRefLeak.jdk7-B100+_60min.log
>>> INFO: call count = 3484000
>>> INFO: call count = 3485000
>>> INFO: call count = 3486000
>>> INFO: call count = 3487000
>>> INFO: call count = 3488000
>>> INFO: call count = 3489000
>>> INFO: call count = 3490000
>>> INFO: call count = 3491000
>>> INFO: call count = 3492000
>>> INFO: call count = 3493000
>>> INFO: call count = 3494000
>>> INFO: call count = 3495000
>>> INFO: call count = 3496000
>>> INFO: call count = 3497000
>>> INFO: final loop count = 3497200
>>>
>>>
>>> I ran the LoggerWeakRefLeak test with the same environment
>>> variables for a 60 minute duration using the fixed bits from
>>> JPRT. Here are the last 15 lines of each log (much less
>>> interesting info):
>>>
>>> $ tail -15 Logger.jdk7-B100+_60min.log
>>> INFO: instance_cnt = 1813
>>> INFO: instance_cnt = 1638
>>> INFO: instance_cnt = 1514
>>> INFO: instance_cnt = 1511
>>> INFO: jmap exited with exit code = 1
>>> INFO: The likely reason is that LoggerWeakRefLeak has finished running.
>>> INFO: increasing_cnt = 293
>>> INFO: decreasing_cnt = 318
>>> INFO: The instance count of java.lang.ref.WeakReference objects
>>> INFO: is both increasing and decreasing.
>>> PASS: This indicates that there is not a memory leak.
>>>
>>> real 60m6.783s
>>> user 2m20.592s
>>> sys 2m31.997s
>>>
>>> $ tail -15 LoggerWeakRefLeak.jdk7-B100+_60min.log
>>> INFO: call count = 3502000
>>> INFO: call count = 3503000
>>> INFO: call count = 3504000
>>> INFO: call count = 3505000
>>> INFO: call count = 3506000
>>> INFO: call count = 3507000
>>> INFO: call count = 3508000
>>> INFO: call count = 3509000
>>> INFO: call count = 3510000
>>> INFO: call count = 3511000
>>> INFO: call count = 3512000
>>> INFO: call count = 3513000
>>> INFO: call count = 3514000
>>> INFO: call count = 3515000
>>> INFO: final loop count = 3515800
>>>
>>> *** (#2 of 3): 2010-06-29 01:46:49 GMT+00:00 daniel.daugherty at oracle.com
>>>
>>> I ran the AnonLoggerWeakRefLeak test with the same environment
>>> variables for a 12 hour duration using the fixed bits from JPRT.
>>> For this run I saved the "Total" line from the jmap output from
>>> every 10th sample:
>>>
>>> $ diff ../AnonLoggerWeakRefLeak.sh AnonLoggerWeakRefLeak.sh
>>> 161a162
>>>
>>>
>>>>>>> "$TEST_NAME.totals"
>>>>>>>
>>> 225a227,233
>>>
>>>
>>>>>> set +e
>>>>>>
>>>>> mod=`expr "$loop_cnt" % 10`
>>>>> set -e
>>>>> if [ "$mod" = 0 ]; then
>>>>> tail -1 "$TEST_NAME.jmap" >> "$TEST_NAME.totals"
>>>>> fi
>>>>>
>>> Here is an analysis of the .totals data:
>>>
>>> $ sh analyze_totals < AnonLoggerWeakRefLeak.totals
>>> #objs #bytes
>>> first: 30537 2243648
>>> lo: 30537 2243648
>>> hi: 57072 3197904
>>> last: 35676 2882488
>>> avg: 36853 2929982
>>> # samples: 647
>>>
>>> The first sample is also the lowest set of values which isn't
>>> a surprise given that the first sample is taken shortly after
>>> the target program has started running. The hi value occurred
>>> in sample #269 of 647 and the last sample was below the average.
>>> This data indicates that the values are both rising and falling
>>> over time which does not indicate any more memory leaks.
>>>
>>>
>>>
>>> I also ran the LoggerWeakRefLeak test with the same environment
>>> variables for a 12 hour duration using the fixed bits from JPRT.
>>> For this run I saved the "Total" line from the jmap output from
>>> every 10th sample:
>>>
>>> $ diff ../LoggerWeakRefLeak.sh LoggerWeakRefLeak.sh
>>> 161a162
>>>
>>>
>>>>>>> "$TEST_NAME.totals"
>>>>>>>
>>> 225a227,233
>>>
>>>
>>>>>> set +e
>>>>>>
>>>>> mod=`expr "$loop_cnt" % 10`
>>>>> set -e
>>>>> if [ "$mod" = 0 ]; then
>>>>> tail -1 "$TEST_NAME.jmap" >> "$TEST_NAME.totals"
>>>>> fi
>>>>>
>>> Here is an analysis of the .totals data:
>>>
>>> $ sh analyze_totals < LoggerWeakRefLeak.totals
>>> #objs #bytes
>>> first: 48957 2832648
>>> lo: 48957 2832648
>>> hi: 148299 6299016
>>> last: 133173 5451032
>>> avg: 137578 5608654
>>> # samples: 647
>>>
>>> The first sample is also the lowest set of values which isn't
>>> a surprise given that the first sample is taken shortly after
>>> the target program has started running. The hi value occurred
>>> in sample #76 of 647 and the last sample was below the average.
>>> This data indicates that the values are both rising and falling
>>> over time which does not indicate any more memory leaks.
>>>
>>> *** (#3 of 3): 2010-06-29 15:59:39 GMT+00:00 daniel.daugherty at oracle.com
>>>
>> Please let me know if the fix is not working for you in your
>> environment at Google.
>>
>> Dan
>>
>>
>> On 6/23/2010 6:38 PM, Daniel D. Daugherty wrote:
>>
>>> On 6/23/2010 12:38 PM, Jeremy Manson wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> I'm sorry I missed this (I heavily filter these lists, and check rarely).
>>>>
>>>>
>>> This time I specifically left you on the "To" list rather than
>>> editing down to just the list aliases.
>>>
>>>
>>>
>>>> My main feeling is that you are missing a good bet by not
>>>> reconstructing the Hashtable in LogManager and the ArrayList in Logger
>>>> every so often when you remove the loggers. In a test case where
>>>> there are a LOT of short-lived loggers, the backing array for the
>>>> Hashtable can get very big. It is permanent, and doesn't go anywhere.
>>>> You can end up with a lot of extra memory lying around that way.
>>>>
>>>>
>>> It's possible that is a good bet. However, that wasn't mentioned
>>> as one of the issues in either bug report (I think) so I didn't
>>> write a test for that.
>>>
>>>
>>>
>>>> Specifically, when I didn't reconstruct those data structures, the
>>>> test case listed in the bug (where it just creates lots and lots of
>>>> anonymous loggers) killed the Java instance with an OOM, even if I
>>>> *did* clean up the weakreferences to the loggers.
>>>>
>>>>
>>> I'll create a variant of the anon logger test that I put in the
>>> changeset and checkout if I can kill the Java instance with an OOM.
>>> I'll keep you posted.
>>>
>>>
>>>
>>>> I'm assuming you have a customer waiting for this - if that is similar
>>>> to their usage pattern, this fix may not fix their problem.
>>>>
>>>>
>>> Thanks for the heads up. The JDK6 version of the fix hasn't been tested
>>> by the customer yet so you might be right.
>>>
>>>
>>>
>>>> You obviously don't want to rebuild those structures every time,
>>>> though. What I did in my change was to reconstruct the backing data
>>>> structures every time ~as many loggers were collected as were present
>>>> in the data structure.
>>>>
>>>>
>>> Yup. I caught that part of the rebuild algorithm. It's just that the
>>> reason for doing the rebuild didn't jump out at me.
>>>
>>> Dan
>>>
>>>
>>>
>>>> Jeremy
>>>>
>>>> On Fri, Jun 18, 2010 at 12:25 PM, Daniel D. Daugherty
>>>> <daniel.daugherty at oracle.com> wrote:
>>>>
>>>>
>>>>> Greetings,
>>>>>
>>>>> I have a new version of my fix for the WeakReference leak in the
>>>>> Logging API done. This version uses ReferenceQueues; thanks to Eamonn
>>>>> McManus, Jeremy Manson and Tony Printezis for their insights on using
>>>>> ReferenceQueues. Here's a pointer to Tony's paper for background info:
>>>>>
>>>>> http://java.sun.com/developer/technicalArticles/javase/finalization/
>>>>>
>>>>> This version also has limits on the number of dead Loggers that are
>>>>> cleaned up per call; thanks to Alan Bateman for politely pushing me in
>>>>> that direction.
>>>>>
>>>>> The webrev is again relative to OpenJDK7, but the bug is escalated so
>>>>> the fix will be backported to the JDK6-Update train. So again, I'll
>>>>> need a minimum of two code reviewers.
>>>>>
>>>>> Here is the URL for the webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/6942989-webrev/1/
>>>>>
>>>>> Thanks, in advance, for any reviews.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>>
More information about the serviceability-dev
mailing list