Second Code Review for WeakReference leak in the Logging API (6942989)

Jeremy Manson jeremymanson at google.com
Fri Jul 2 10:56:44 PDT 2010


Hi Dan,

I don't know what AnonLoggerWeakRefTest looks like, but I am fairly
confident that if you create a few million loggers and then drop down
to one or two, the backing array of the Hashtable will still be bigger
than it should be.

Still, no real harm done - that's a fairly unusual situation.

Jeremy

On Tue, Jun 29, 2010 at 9:10 AM, Daniel D. Daugherty
<daniel.daugherty at oracle.com> wrote:
> Jeremy,
>
> Closing the loop on this part of the thread. I don't think there are
> any more leaks left after the fix is applied. Here are the entries
> I added the "public comments" section of the bug. For some reason
> that bug is _still_ not showing up on the OpenJDK site.
>
>> === *Public Comments*
>> ========================================================
>> I ran the AnonLoggerWeakRefLeak test with the following env variables:
>>
>>
>> TESTJAVA=c:/work/shared/mirrors/jdks-win32/jdk/1.7.0/latest/binaries/windows-i586
>> TESTCLASSES=C:/cygwin/home/dcubed/Projects/6942989/XXX
>> TESTVMOPTS=-showversion -client -Xmx32m
>> TESTSRC=C:/cygwin/home/dcubed/Projects/6942989/XXX
>>
>> along with a 30 minute duration (1800 seconds) using JDK7-B99 bits
>> (which does NOT have the fix).
>>
>> Here's the last 15 lines of the log files:
>>
>> $ tail -15 AnonLogger.jdk7-B99_30min.log
>> INFO: instance_cnt = 1032505
>> INFO: instance_cnt = 1038305
>> INFO: instance_cnt = 1044005
>> INFO: instance_cnt = 1049705
>> INFO: jmap exited with exit code = 1
>> INFO: The likely reason is that AnonLoggerWeakRefLeak has finished
>> running.
>> INFO: increasing_cnt = 181
>> INFO: decreasing_cnt = 0
>> INFO: The instance count of java.lang.ref.WeakReference objects
>> INFO: is always increasing.
>> FAIL: This indicates that there is a memory leak.
>>
>> real    18m10.296s
>> user    0m41.570s
>> sys     0m46.442s
>>
>> $ tail -15 AnonLoggerWeakRefLeak.jdk7-B99_30min.log
>> INFO: call count = 1044000
>> INFO: call count = 1045000
>> INFO: call count = 1046000
>> INFO: call count = 1047000
>> INFO: call count = 1048000
>> INFO: call count = 1049000
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>        at java.util.Arrays.copyOf(Arrays.java:2239)
>>        at java.util.Arrays.copyOf(Arrays.java:2213)
>>        at java.util.ArrayList.grow(ArrayList.java:208)
>>        at java.util.ArrayList.ensureCapacity(ArrayList.java:182)
>>        at java.util.ArrayList.add(ArrayList.java:406)
>>        at java.util.logging.Logger.doSetParent(Logger.java:1401)
>>        at java.util.logging.Logger.getAnonymousLogger(Logger.java:374)
>>        at AnonLoggerWeakRefLeak.main(AnonLoggerWeakRefLeak.java:60)
>>
>>
>> So it took a little over 18 minutes for a Client VM with a 32MB
>> max heap size to throw an OOME with AnonLoggerWeakRefLeak.
>>
>>
>> I ran the LoggerWeakRefLeak test with the same environment variables
>> along with a 30 minute duration (1800 seconds) using JDK7-B99 bits
>> (which does NOT have the fix).
>>
>> Here's the last 15 lines of the log files:
>>
>> $ tail -15 Logger.jdk7-B99_30min.log
>> INFO: instance_cnt = 935705
>> INFO: instance_cnt = 938605
>> INFO: instance_cnt = 941205
>> INFO: instance_cnt = 943205
>> INFO: jmap exited with exit code = 1
>> INFO: The likely reason is that LoggerWeakRefLeak has finished running.
>> INFO: increasing_cnt = 160
>> INFO: decreasing_cnt = 0
>> INFO: The instance count of java.lang.ref.WeakReference objects
>> INFO: is always increasing.
>> FAIL: This indicates that there is a memory leak.
>>
>> real    16m47.690s
>> user    0m37.910s
>> sys     0m40.002s
>>
>> $ tail -15 LoggerWeakRefLeak.jdk7-B99_30min.log
>> INFO: call count = 881000
>> INFO: call count = 882000
>> INFO: call count = 883000
>> INFO: call count = 884000
>> INFO: call count = 885000
>> INFO: call count = 886000
>> INFO: call count = 887000
>> INFO: call count = 888000
>> INFO: call count = 889000
>> INFO: call count = 890000
>> INFO: call count = 891000
>> INFO: call count = 892000
>> INFO: call count = 893000
>> INFO: call count = 894000
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>
>>
>> So it took a little less than 17 minutes for a Client VM with a 32MB
>> max heap size to throw an OOME with LoggerWeakRefLeak.
>>
>> *** (#1 of 3): 2010-06-28 22:45:03 GMT+00:00 daniel.daugherty at oracle.com
>> *** Last Edit: 2010-06-28 22:57:39 GMT+00:00 daniel.daugherty at oracle.com
>>
>> I ran the AnonLoggerWeakRefLeak test with the same environment
>> variables for a 60 minute duration using the fixed bits from
>> JPRT. Here are the last 15 lines of each log (much less
>> interesting info):
>>
>> $ tail -15 AnonLogger.jdk7-B100+_60min.log
>> INFO: instance_cnt = 5801
>> INFO: instance_cnt = 5901
>> INFO: instance_cnt = 5701
>> INFO: instance_cnt = 5601
>> INFO: jmap exited with exit code = 1
>> INFO: The likely reason is that AnonLoggerWeakRefLeak has finished
>> running.
>> INFO: increasing_cnt = 227
>> INFO: decreasing_cnt = 383
>> INFO: The instance count of java.lang.ref.WeakReference objects
>> INFO: is both increasing and decreasing.
>> PASS: This indicates that there is not a memory leak.
>>
>> real    60m5.099s
>> user    2m18.882s
>> sys     2m31.127s
>>
>> $ tail -15 AnonLoggerWeakRefLeak.jdk7-B100+_60min.log
>> INFO: call count = 3484000
>> INFO: call count = 3485000
>> INFO: call count = 3486000
>> INFO: call count = 3487000
>> INFO: call count = 3488000
>> INFO: call count = 3489000
>> INFO: call count = 3490000
>> INFO: call count = 3491000
>> INFO: call count = 3492000
>> INFO: call count = 3493000
>> INFO: call count = 3494000
>> INFO: call count = 3495000
>> INFO: call count = 3496000
>> INFO: call count = 3497000
>> INFO: final loop count = 3497200
>>
>>
>> I ran the LoggerWeakRefLeak test with the same environment
>> variables for a 60 minute duration using the fixed bits from
>> JPRT. Here are the last 15 lines of each log (much less
>> interesting info):
>>
>> $ tail -15 Logger.jdk7-B100+_60min.log
>> INFO: instance_cnt = 1813
>> INFO: instance_cnt = 1638
>> INFO: instance_cnt = 1514
>> INFO: instance_cnt = 1511
>> INFO: jmap exited with exit code = 1
>> INFO: The likely reason is that LoggerWeakRefLeak has finished running.
>> INFO: increasing_cnt = 293
>> INFO: decreasing_cnt = 318
>> INFO: The instance count of java.lang.ref.WeakReference objects
>> INFO: is both increasing and decreasing.
>> PASS: This indicates that there is not a memory leak.
>>
>> real    60m6.783s
>> user    2m20.592s
>> sys     2m31.997s
>>
>> $ tail -15 LoggerWeakRefLeak.jdk7-B100+_60min.log
>> INFO: call count = 3502000
>> INFO: call count = 3503000
>> INFO: call count = 3504000
>> INFO: call count = 3505000
>> INFO: call count = 3506000
>> INFO: call count = 3507000
>> INFO: call count = 3508000
>> INFO: call count = 3509000
>> INFO: call count = 3510000
>> INFO: call count = 3511000
>> INFO: call count = 3512000
>> INFO: call count = 3513000
>> INFO: call count = 3514000
>> INFO: call count = 3515000
>> INFO: final loop count = 3515800
>>
>> *** (#2 of 3): 2010-06-29 01:46:49 GMT+00:00 daniel.daugherty at oracle.com
>>
>> I ran the AnonLoggerWeakRefLeak test with the same environment
>> variables for a 12 hour duration using the fixed bits from JPRT.
>> For this run I saved the "Total" line from the jmap output from
>> every 10th sample:
>>
>> $ diff ../AnonLoggerWeakRefLeak.sh AnonLoggerWeakRefLeak.sh
>> 161a162
>>
>>>>
>>>> > > "$TEST_NAME.totals"
>>>>
>>
>> 225a227,233
>>
>>>
>>> > > set +e
>>> > mod=`expr "$loop_cnt" % 10`
>>> > set -e
>>> > if [ "$mod" = 0 ]; then
>>> > tail -1 "$TEST_NAME.jmap" >> "$TEST_NAME.totals"
>>> > fi
>>>
>>
>> Here is an analysis of the .totals data:
>>
>> $ sh analyze_totals < AnonLoggerWeakRefLeak.totals
>>          #objs      #bytes
>> first:    30537     2243648
>> lo:       30537     2243648
>> hi:       57072     3197904
>> last:     35676     2882488
>> avg:      36853     2929982
>> # samples: 647
>>
>> The first sample is also the lowest set of values which isn't
>> a surprise given that the first sample is taken shortly after
>> the target program has started running. The hi value occurred
>> in sample #269 of 647 and the last sample was below the average.
>> This data indicates that the values are both rising and falling
>> over time which does not indicate any more memory leaks.
>>
>>
>>
>> I also ran the LoggerWeakRefLeak test with the same environment
>> variables for a 12 hour duration using the fixed bits from JPRT.
>> For this run I saved the "Total" line from the jmap output from
>> every 10th sample:
>>
>> $ diff ../LoggerWeakRefLeak.sh LoggerWeakRefLeak.sh
>> 161a162
>>
>>>>
>>>> > > "$TEST_NAME.totals"
>>>>
>>
>> 225a227,233
>>
>>>
>>> > > set +e
>>> > mod=`expr "$loop_cnt" % 10`
>>> > set -e
>>> > if [ "$mod" = 0 ]; then
>>> > tail -1 "$TEST_NAME.jmap" >> "$TEST_NAME.totals"
>>> > fi
>>>
>>
>> Here is an analysis of the .totals data:
>>
>> $ sh analyze_totals < LoggerWeakRefLeak.totals
>>          #objs      #bytes
>> first:    48957     2832648
>> lo:       48957     2832648
>> hi:      148299     6299016
>> last:    133173     5451032
>> avg:     137578     5608654
>> # samples: 647
>>
>> The first sample is also the lowest set of values which isn't
>> a surprise given that the first sample is taken shortly after
>> the target program has started running. The hi value occurred
>> in sample #76 of 647 and the last sample was below the average.
>> This data indicates that the values are both rising and falling
>> over time which does not indicate any more memory leaks.
>>
>> *** (#3 of 3): 2010-06-29 15:59:39 GMT+00:00 daniel.daugherty at oracle.com
>
>
> Please let me know if the fix is not working for you in your
> environment at Google.
>
> Dan
>
>
> On 6/23/2010 6:38 PM, Daniel D. Daugherty wrote:
>>
>> On 6/23/2010 12:38 PM, Jeremy Manson wrote:
>>>
>>> Hi Daniel,
>>>
>>> I'm sorry I missed this (I heavily filter these lists, and check rarely).
>>>
>>
>> This time I specifically left you on the "To" list rather than
>> editing down to just the list aliases.
>>
>>
>>> My main feeling is that you are missing a good bet by not
>>> reconstructing the Hashtable in LogManager and the ArrayList in Logger
>>> every so often when you remove the loggers.  In a test case where
>>> there are a LOT of short-lived loggers, the backing array for the
>>> Hashtable can get very big.  It is permanent, and doesn't go anywhere.
>>>  You can end up with a lot of extra memory lying around that way.
>>>
>>
>> It's possible that is a good bet. However, that wasn't mentioned
>> as one of the issues in either bug report (I think) so I didn't
>> write a test for that.
>>
>>
>>> Specifically, when I didn't reconstruct those data structures, the
>>> test case listed in the bug (where it just creates lots and lots of
>>> anonymous loggers) killed the Java instance with an OOM, even if I
>>> *did* clean up the weakreferences to the loggers.
>>>
>>
>> I'll create a variant of the anon logger test that I put in the
>> changeset and checkout if I can kill the Java instance with an OOM.
>> I'll keep you posted.
>>
>>
>>> I'm assuming you have a customer waiting for this - if that is similar
>>> to their usage pattern, this fix may not fix their problem.
>>>
>>
>> Thanks for the heads up. The JDK6 version of the fix hasn't been tested
>> by the customer yet so you might be right.
>>
>>
>>> You obviously don't want to rebuild those structures every time,
>>> though.  What I did in my change was to reconstruct the backing data
>>> structures every time ~as many loggers were collected as were present
>>> in the data structure.
>>>
>>
>> Yup. I caught that part of the rebuild algorithm. It's just that the
>> reason for doing the rebuild didn't jump out at me.
>>
>> Dan
>>
>>
>>> Jeremy
>>>
>>> On Fri, Jun 18, 2010 at 12:25 PM, Daniel D. Daugherty
>>> <daniel.daugherty at oracle.com> wrote:
>>>
>>>>
>>>> Greetings,
>>>>
>>>> I have a new version of my fix for the WeakReference leak in the
>>>> Logging API done. This version uses ReferenceQueues; thanks to Eamonn
>>>> McManus, Jeremy Manson and Tony Printezis for their insights on using
>>>> ReferenceQueues. Here's a pointer to Tony's paper for background info:
>>>>
>>>>   http://java.sun.com/developer/technicalArticles/javase/finalization/
>>>>
>>>> This version also has limits on the number of dead Loggers that are
>>>> cleaned up per call; thanks to Alan Bateman for politely pushing me in
>>>> that direction.
>>>>
>>>> The webrev is again relative to OpenJDK7, but the bug is escalated so
>>>> the fix will be backported to the JDK6-Update train. So again, I'll
>>>> need a minimum of two code reviewers.
>>>>
>>>> Here is the URL for the webrev:
>>>>
>>>>   http://cr.openjdk.java.net/~dcubed/6942989-webrev/1/
>>>>
>>>> Thanks, in advance, for any reviews.
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>
>


More information about the hotspot-runtime-dev mailing list