Java 7, soft references not being cleaned?

Peter B. Kessler Peter.B.Kessler at Oracle.COM
Fri Feb 17 18:08:39 PST 2012


I think you are on the edge of running out of memory.  Why did you limit the heap to 256MB?

Except for the frequency of the young generation collections, things go pretty well for the first 40 seconds.  The young generation collections are cleaning up nicely, and not promoting much to the old generation.  E.g.,

	39.943: [GC 39.943: [DefNew: 70189K->369K(78656K), 0.0030518 secs] 237834K->168023K(253440K), 0.0030990 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	40.005: [GC 40.005: [DefNew: 70191K->369K(78656K), 0.0030995 secs] 237845K->168032K(253440K), 0.0031433 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

where the young generation collection at 40.005 turns 70191KB of young generation occupancy into 369KB of survivor objects in the young generation and 168032KB-168023KB=9KB of promoted objects in the old generation.  That pattern continues, piling up about 9KB per collection in the old generation until the old generation fills up.  Actually, there's a change in the pattern just before the old generation fills up:

	40.329: [GC 40.329: [DefNew: 70198K->368K(78656K), 0.0029632 secs] 237896K->168075K(253440K), 0.0030054 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	40.391: [GC 40.391: [DefNew: 70189K->368K(78656K), 0.0027960 secs] 237896K->168083K(253440K), 0.0028419 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	40.453: [GC 40.453: [DefNew: 70189K->368K(78656K), 0.0030837 secs] 237904K->168092K(253440K), 0.0031416 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	40.517: [GC 40.517: [DefNew: 70258K->439K(78656K), 0.0030203 secs] 237982K->168172K(253440K), 0.0030637 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	40.596: [GC 40.596: [DefNew: 70162K->581K(78656K), 0.0030880 secs] 237895K->168322K(253440K), 0.0031284 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	40.666: [GC 40.666: [DefNew: 70403K->581K(78656K), 0.0030961 secs] 238144K->168331K(253440K), 0.0031403 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
	[FINE] ItemsDao.getImage() - Loading image BACKGROUND(id=13)
	[FINE] ItemsDao.getImage() - Loading image POSTER(id=13)
	42.161: [GC 42.161: [DefNew: 70294K->6140K(78656K), 0.0157502 secs] 238044K->179973K(253440K), 0.0158060 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
	42.230: [GC 42.230: [DefNew: 75850K->75850K(78656K), 0.0000175 secs]42.230: [Tenured: 173833K->174734K(174784K), 0.1042122 secs] 249684K->179972K(253440K), [Perm : 17126K->17126K(17152K)], 0.1043190 secs] [Times: user=0.09 sys=0.00, real=0.11 secs]

The collections at 40.329, 40.391, 40.453 promote about 9KB each.  Then 40.517 promotes 80KB, 40.596 promotes 150KB, and then 40.666 settles back to 9KB.  I can't explain that hiccup.
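
For concreteness, here is the promotion arithmetic used above written out as a trivial sketch.  This is my own illustration, not from the log; the values are copied from the 39.943 and 40.005 lines, and the subtraction is only a good estimate while the survivor occupancy stays roughly constant between collections:

	// Estimating promotion from two consecutive young-collection log lines.
	public class PromotionEstimate {
	    public static void main(String[] args) {
	        long heapAfterPrevGcKb = 168023;  // "237834K->168023K" at 39.943
	        long heapAfterThisGcKb = 168032;  // "237845K->168032K" at 40.005
	        // Valid while survivor occupancy is ~steady between collections:
	        System.out.println("promoted ~ "
	                + (heapAfterThisGcKb - heapAfterPrevGcKb) + "KB");
	    }
	}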

At 42.161 things are completely different, in a bad way.  We leave 6140KB in the young generation survivor space, and promote about 6083KB to the old generation; the heap as a whole grows by 11642KB since the previous collection, the other 5559KB being growth of the survivor space.  (Okay, maybe that's the new image coming in.)  But then the old generation is full, so the young generation collection at 42.230 bails out (in 0.0000175 seconds!) and starts an old generation collection, which manages to make negative progress: it promotes some objects to the old generation, and maybe collects a few objects from the old generation, but ends up with more in the old generation than it started with (173833K->174734K(174784K)).  Things are going to go south from here.

We launch into a long series of full collections (because there isn't enough free space in the old generation for a young generation collection to think about starting).  Those collections, too, have a pattern:

	42.379: [Full GC 42.379: [Tenured: 174734K->173476K(174784K), 0.1745277 secs] 253047K->178289K(253440K), [Perm : 17126K->17126K(17152K)], 0.1745924 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
	42.597: [Full GC 42.597: [Tenured: 174556K->174556K(174784K), 0.0899911 secs] 253024K->179018K(253440K), [Perm : 17126K->17126K(17152K)], 0.0900558 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
	42.737: [Full GC 42.737: [Tenured: 174556K->174556K(174784K), 0.0914405 secs] 253032K->179387K(253440K), [Perm : 17126K->17126K(17152K)], 0.0915065 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
	42.870: [Full GC 42.870: [Tenured: 174556K->174556K(174784K), 0.0859870 secs] 253043K->179396K(253440K), [Perm : 17126K->17126K(17152K)], 0.0860500 secs] [Times: user=0.08 sys=0.00, real=0.09 secs]
	42.998: [Full GC 42.998: [Tenured: 174556K->173476K(174784K), 0.0873824 secs] 253050K->178325K(253440K), [Perm : 17126K->17126K(17152K)], 0.0874467 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
	43.128: [Full GC 43.128: [Tenured: 174556K->174556K(174784K), 0.0861173 secs] 253062K->179055K(253440K), [Perm : 17126K->17126K(17152K)], 0.0861837 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
	43.267: [Full GC 43.267: [Tenured: 174556K->174556K(174784K), 0.0887080 secs] 253087K->179423K(253440K), [Perm : 17126K->17126K(17152K)], 0.0887732 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
	43.401: [Full GC 43.401: [Tenured: 174556K->174556K(174784K), 0.0862837 secs] 252860K->179431K(253440K), [Perm : 17126K->17126K(17152K)], 0.0863480 secs] [Times: user=0.08 sys=0.00, real=0.09 secs]
	43.530: [Full GC 43.530: [Tenured: 174556K->173476K(174784K), 0.0863131 secs] 253082K->178360K(253440K), [Perm : 17126K->17126K(17152K)], 0.0863782 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
	43.661: [Full GC 43.661: [Tenured: 174556K->174556K(174784K), 0.0861245 secs] 253101K->179091K(253440K), [Perm : 17126K->17126K(17152K)], 0.0861849 secs] [Times: user=0.08 sys=0.00, real=0.09 secs]
	43.788: [Full GC 43.788: [Tenured: 174556K->174556K(174784K), 0.0881231 secs] 253102K->179459K(253440K), [Perm : 17126K->17126K(17152K)], 0.0881831 secs] [Times: user=0.08 sys=0.00, real=0.09 secs]
	43.923: [Full GC 43.923: [Tenured: 174556K->174556K(174784K), 0.0863314 secs] 253107K->179467K(253440K), [Perm : 17126K->17126K(17152K)], 0.0863940 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
	44.050: [Full GC 44.050: [Tenured: 174556K->173476K(174784K), 0.0884058 secs] 253115K->178395K(253440K), [Perm : 17126K->17126K(17152K)], 0.0884718 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]

The first collection of the series manages to knock the old generation occupancy down to 173476KB, leaving 1308KB free.  The next three collections only knock the old generation occupancy down to 174556KB, leaving an even scantier 228KB free.  Then the cycle repeats: one collection makes a little progress, then three make hardly any.  All those collections (except the first) start with an occupancy of 174556KB, and the groups of three end with an occupancy of 174556KB, so we aren't exactly out of memory, but we aren't making much progress.  Note that your code is running in there.  E.g., the collection at 42.998 takes 0.0874467 seconds, meaning it finishes at 43.085, so your code runs until the next collection at 43.128.  In that 0.043 seconds, you allocate your way through the remaining free space, and cause the next collection.  No one is happy about this situation.
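
To put a number on that allocation, here's a rough cut at the rate, with the figures copied from the 42.998 and 43.128 log entries (my arithmetic, not something the log reports directly):

	// Rough mutator allocation rate between two of the full collections.
	public class AllocRate {
	    public static void main(String[] args) {
	        long heapAfterFullGcKb  = 178325;  // "...->178325K" at 42.998
	        long heapBeforeNextGcKb = 253062;  // "253062K->..." at 43.128
	        double mutatorSecs = 43.128 - (42.998 + 0.0874467);
	        System.out.printf("~%dMB allocated in ~%.3fs%n",
	                (heapBeforeNextGcKb - heapAfterFullGcKb) / 1024, mutatorSecs);
	    }
	}

That works out to roughly 72MB allocated in about 0.043 seconds.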

Things continue badly for entirely too long, until

	75.670: [Full GC 75.670: [Tenured: 174555K->174555K(174784K), 0.0932953 secs] 252752K->181366K(253440K), [Perm : 17126K->17126K(17152K)], 0.0933557 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
	75.813: [Full GC 75.813: [Tenured: 174555K->174555K(174784K), 0.0967613 secs] 252756K->181370K(253440K), [Perm : 17126K->17126K(17152K)], 0.0968234 secs] [Times: user=0.09 sys=0.00, real=0.10 secs]
	75.964: [Full GC 75.964: [Tenured: 174555K->173241K(174784K), 0.1877438 secs] 252759K->180060K(253440K), [Perm : 17126K->17124K(17152K)], 0.1878076 secs] [Times: user=0.19 sys=0.00, real=0.19 secs]
	76.203: [Full GC 76.203: [Tenured: 174321K->174321K(174784K), 0.0906815 secs] 252550K->180604K(253440K), [Perm : 17124K->17124K(17152K)], 0.0907416 secs] [Times: user=0.08 sys=0.00, real=0.09 secs]
	76.355: [Full GC 76.355: [Tenured: 174321K->174321K(174784K), 0.0943489 secs] 252545K->181148K(253440K), [Perm : 17124K->17124K(17152K)], 0.0944085 secs] [Times: user=0.09 sys=0.00, real=0.10 secs]
	76.501: [Full GC 76.501: [Tenured: 174321K->174321K(174784K), 0.0958222 secs] 252683K->181152K(253440K), [Perm : 17124K->17124K(17152K)], 0.0958839 secs] [Times: user=0.08 sys=0.00, real=0.10 secs]
	76.645: [Full GC 76.645: [Tenured: 174321K->173194K(174784K), 0.1750156 secs] 252541K->180029K(253440K), [Perm : 17124K->17121K(17152K)], 0.1750760 secs] [Times: user=0.17 sys=0.00, real=0.17 secs]
	76.869: [Full GC 76.869: [Tenured: 174274K->174274K(174784K), 0.0912524 secs] 252500K->180573K(253440K), [Perm : 17121K->17121K(17152K)], 0.0913175 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
	77.008: [Full GC 77.008: [Tenured: 174274K->174274K(174784K), 0.0930352 secs] 252503K->181117K(253440K), [Perm : 17121K->17121K(17152K)], 0.0930973 secs] [Times: user=0.08 sys=0.00, real=0.09 secs]
	77.153: [Full GC 77.153: [Tenured: 174274K->174530K(174784K), 0.1806351 secs] 252506K->175070K(253440K), [Perm : 17121K->17121K(17152K)], 0.1806969 secs] [Times: user=0.19 sys=0.00, real=0.18 secs]
	77.398: [Full GC 77.398: [Tenured: 174530K->167748K(174784K), 0.1809714 secs] 252974K->167748K(253440K), [Perm : 17121K->17121K(17152K)], 0.1810366 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
	77.628: [GC 77.628: [DefNew: 69911K->545K(78656K), 0.0025767 secs] 237659K->168293K(253440K), 0.0026206 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	77.679: [GC 77.679: [DefNew: 70310K->549K(78656K), 0.0027793 secs] 238058K->168297K(253440K), 0.0028211 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	77.729: [GC 77.729: [DefNew: 70314K->553K(78656K), 0.0029654 secs] 238062K->168301K(253440K), 0.0030177 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]

we fight our way down to the collection at 77.398, when the jam breaks: we make a significant dent in the old generation occupancy (174530KB down to 167748KB), and young generation collections can run again.  Those young generation collections leave about 550KB in the young generation survivor space, and promote about 4KB per collection.  The old generation is substantially full, but that's what memory is for.

After that, young generation collections continue at that frequency until

	95.566: [GC 95.566: [DefNew: 70364K->597K(78656K), 0.0030067 secs] 239595K->169831K(253440K), 0.0030526 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	95.624: [GC 95.624: [DefNew: 70360K->597K(78656K), 0.0027606 secs] 239595K->169835K(253440K), 0.0027998 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	95.681: [GC 95.681: [DefNew: 70362K->597K(78656K), 0.0029466 secs] 239600K->169839K(253440K), 0.0029982 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
	208.497: [GC 208.497: [DefNew: 70549K->926K(78656K), 0.0067911 secs] 239791K->170171K(253440K), 0.0068400 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
	518.900: [GC 518.900: [DefNew: 70878K->1267K(78656K), 0.0090123 secs] 240123K->170517K(253440K), 0.0090715 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
	540.704: [Full GC (System) 540.704: [Tenured: 169249K->35169K(174784K), 0.1363699 secs] 175870K->35169K(253440K), [Perm : 17122K->17122K(17152K)], 0.1364401 secs] [Times: user=0.13 sys=0.00, real=0.14 secs]

I assume your application went nearly idle between 95.681 and 540.704, because it was only allocating enough to cause the occasional young generation collections shown.  Then you used VisualVM to cause a collection, and that really made progress, reducing the old generation from 169249KB to 35169KB.
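
As an aside, the "(System)" tag on the 540.704 collection marks an explicit collection request.  If I remember right, VisualVM's "Perform GC" button goes through the platform MemoryMXBean, whose gc() method is documented to be effectively equivalent to System.gc().  If you ever want to trigger the same thing from code:

	import java.lang.management.ManagementFactory;

	public class ForceGc {
	    public static void main(String[] args) {
	        // Effectively equivalent to System.gc(); this is the hook JMX
	        // tools use to request a collection in a target VM.
	        ManagementFactory.getMemoryMXBean().gc();
	    }
	}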

You never actually ran into an out of memory situation, because we were able to run collections and recover enough space to keep you limping along.  We were never compelled (by the JVM spec) to try clearing all your SoftReferences.  Absent running out of memory, SoftReferences are cleared if they haven't been used in a while.  I have no idea what the usage pattern is for your SoftReferences, but that might be why they stuck around.  If that's what happened.  Maybe the little progress we made during the series of full collections was cleaning up SoftReferences as they became eligible.  If you didn't use them in the apparently idle time leading up to the collection at 540.704, that might explain why we decided they could be cleared by the VisualVM collection.  Or it may be that VisualVM asks for a full collection that clears all SoftReferences.  (It can do that, but I forget the details.)  You can tune the urgency with which the collector tries to clear SoftReferences, but it's not for the faint of heart.
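
For the record, the knob I mean is HotSpot's -XX:SoftRefLRUPolicyMSPerMB: roughly, a softly reachable object survives that many milliseconds per megabyte of free heap after its last use, with a default of 1000.  Something like

	java -XX:SoftRefLRUPolicyMSPerMB=10 -Xmx256m YourApp

(YourApp standing in for your main class) would make the collector much more eager to clear SoftReferences, but, again, tune this only if you have to.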

I suggest you try making your heap enough larger that your program can run without that series of old generation collections, watch how frequently you use your SoftReferences, and report back whether things go better (or not).  If you made the young generation larger, the young generation collections would happen less often, though you don't seem to be complaining about the frequency or duration of the young generation pauses.
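
A plausible first experiment, with sizes that are guesses to be adjusted against your logs rather than a tuned recommendation:

	java -Xmx512m -Xms512m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps YourApp

-Xmn would be the lever for growing the young generation, if you ever decide the young collections are too frequent.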

			... peter

P.S. Please try not to have your mailer wrap the GC log lines.  It makes them more difficult to analyze.

John Hendrikx wrote:
> I've written a fairly small program that runs in 256 MB -- this program 
> will load several large pictures, which are being "cached" using 
> SoftReferences.
> 
> The more of these pictures get loaded, the more the GC starts panicking 
> (while in reality, more than 200 MB in the VM is only held by 
> SoftReferences).
> 
> In the output below you can see the GC getting progressively longer 
> after each image is loaded (see Loading Image in log), doing more and 
> more GC action.  The performance of the program is severely impacted 
> (the CPU is maxed while this is happening).
> 
> To be sure that the references are REALLY softly reachable, I've used 
> VisualVM to trigger a Full GC, the result of which is visible on the 
> last line, copied here:
> 
> 540.704: [Full GC (System) 540.704: [Tenured: 169249K->35169K(174784K), 0.1363699 secs] 175870K->35169K(253440K), [Perm : 17122K->17122K(17152K)], 0.1364401 secs] [Times: user=0.13 sys=0.00, real=0.14 secs]
> 
> Unfortunately, even AFTER this Full GC, the GC still goes crazy doing 
> much the same thing as when the images got loaded.  See the 2nd part of 
> the log.
> 
> This seems like a pretty serious issue as I was under the impression 
> that SoftReferences would be cleared in time to prevent long GC cycles.  
> I've tested this with 32-bit JDK 1.7u2 and the 1.7u4 developer preview 
> (which was said to have fixed some issues related to references).
> 
> I'll be happy to provide any more information you might need.
> --John Hendrikx
> 
> The output:
> 
> ....

