Soft References... are they working as intended?

Andreas Loew Andreas.Loew at oracle.com
Sat Aug 18 07:36:29 PDT 2012


Hi John,

I am a "field guy" and therefore cannot really comment on the latest
implementation details in JDK7. But from what I know about this topic,
I can well imagine that when, as in your sample, SoftReferences start to
make up a very large portion of the heap, the currently implemented
mechanism will behave poorly and finally fail.

Please see mainly
http://www.oracle.com/technetwork/java/hotspotfaq-138619.html and
especially
http://jeremymanson.blogspot.co.uk/2009/07/how-hotspot-decides-to-clear_07.html
which explains all the nasty details. I will quote parts of both
sources below:


You probably know about this "-XX" parameter:

-XX:SoftRefLRUPolicyMSPerMB=<value>

Every SoftReference has a timestamp field that is updated when it is
accessed (when it is constructed or when its get() method is called).
This gives a very coarse ordering over the SoftReferences; the timestamp
indicates the time of the last GC before they were accessed.

Whenever a garbage collection occurs (and only then), the decision to 
clear a SoftReference is based on two factors:

 1. how old the reference's timestamp is, and
 2. how much free space there is in memory.

In my experience, this roughly means that a soft reference will
survive (after the last strong reference to the object has been
collected!) for <value> milliseconds times the number of megabytes of
current free space in the heap. The default is 1000 ms per MB (1 s/MB),
so an object that is only softly reachable will stay for 1 s if only
1 MB of heap space is free - provided that garbage collections run
frequently enough to check this condition (!!!).
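
As a rough illustration of the rule above, here is a minimal Java sketch
of the check as described in the two cited sources. It is a model, not
HotSpot's actual code: the class and method names are hypothetical, and
the GC clock, the reference timestamp and the free heap size are simply
passed in as parameters.

```java
/** Illustrative model of the soft reference LRU policy; NOT HotSpot source. */
final class SoftRefPolicySketch {

    /** Default value of -XX:SoftRefLRUPolicyMSPerMB (1000 ms per free MB). */
    static final long DEFAULT_MS_PER_MB = 1000L;

    /**
     * @param lastGcClockMs  clock value recorded at the last GC (assumed input)
     * @param refTimestampMs timestamp stored in the SoftReference (assumed input)
     * @param freeHeapMb     free heap space in megabytes
     * @param msPerMb        the SoftRefLRUPolicyMSPerMB value
     * @return true if the model would clear the reference at this GC
     */
    static boolean shouldClear(long lastGcClockMs, long refTimestampMs,
                               long freeHeapMb, long msPerMb) {
        long interval = lastGcClockMs - refTimestampMs;
        // The reference is kept while interval <= freeHeapMb * msPerMb.
        return interval > freeHeapMb * msPerMb;
    }

    public static void main(String[] args) {
        // 1 MB free, default policy: idle for 500 ms, then for 1500 ms.
        System.out.println(shouldClear(10_000, 9_500, 1, DEFAULT_MS_PER_MB));
        System.out.println(shouldClear(10_000, 8_500, 1, DEFAULT_MS_PER_MB));
    }
}
```

With 1 MB free and the default setting, the first call prints false (the
reference is kept) and the second prints true (it is cleared).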

Also, the HotSpot Server VM uses the maximum possible heap size (as set 
with the -Xmx option) to calculate the current free space remaining, 
while the Client VM uses the current actual heap size to calculate the 
free space (!).
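
To show how far apart the two calculations can be, here is a small Java
sketch. It only approximates the two notions of "free space" using the
values reported by java.lang.Runtime; the VM's internal bookkeeping may
differ, and the class and method names are hypothetical.

```java
/** Approximates the Client-style and Server-style free space; illustrative only. */
final class FreeSpaceSketch {

    static final long MB = 1024L * 1024L;

    /** Client VM style: free space within the currently committed heap. */
    static long clientStyleFreeMb(long freeBytes) {
        return freeBytes / MB;
    }

    /** Server VM style: free space measured against the maximum heap (-Xmx). */
    static long serverStyleFreeMb(long maxBytes, long totalBytes, long freeBytes) {
        long usedBytes = totalBytes - freeBytes;
        return (maxBytes - usedBytes) / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();
        long total = rt.totalMemory();
        long free = rt.freeMemory();
        System.out.println("client-style free MB: " + clientStyleFreeMb(free));
        System.out.println("server-style free MB: " + serverStyleFreeMb(max, total, free));
    }
}
```

For example, with -Xmx256m, 128 MB currently committed and 28 MB of that
free, the client-style figure is 28 MB and the server-style figure is
156 MB. At the default of 1000 ms per MB that corresponds to roughly 28 s
versus 156 s of retention.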

"This means that the general tendency is for the Server VM to grow the 
heap rather than flush soft references, and -Xmx therefore has a 
significant effect on when soft references are garbage collected. On the 
other hand, the Client VM will have a greater tendency to flush soft 
references rather than grow the heap."

And also - and this is probably what affects you most severely:

"One thing to notice about this is that it implies that SoftReferences 
will always be kept for at least one GC after their last access. Why is 
that? Well, for the interval, we are using the clock value of the last 
garbage collection, not the current one. As a result, if a SoftReference 
has been accessed since the last garbage collection, it will have the 
same timestamp as that garbage collection, and the interval will be 0. 0 
<= free_heap * 1000 for any amount of free_heap, so any SoftReference 
accessed since the last garbage collection is guaranteed to be kept."

The big hidden pitfall is that if the objects held via SoftReferences
were too big to be allocated in the young generation (which, in my
understanding, is true in your example), "the last garbage collection"
above will not refer to the most recent minor GC, but to the most recent
old generation, i.e. full, GC that happened (!!!).


So in your sample case mentioned below, please check for the above 
conditions:

* What version of the JVM are you using?
* If using the server VM, do you use equal -Xms and -Xmx values?
* Are your "decoded JPEG images" being allocated directly in the old
generation (which I assume to be true)?
* And finally - looking at the general frequency of the relevant type
of GC in your scenario (probably full GCs), had you accessed the softly
referenced objects since the last such GC when you saw everything
getting stuck or an OOME?
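
Some of these points can be checked from inside the running application.
The following Java sketch uses the standard java.lang.management beans to
print the VM name and version, the JVM arguments, the initial and maximum
heap size, and the collection counts per collector. The class name and
the output format are made up for illustration, and it does not tell you
where a given object was allocated.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.RuntimeMXBean;

/** Prints basic VM, heap and GC facts; a diagnostic sketch only. */
final class GcCheckSketch {

    static String describe() {
        StringBuilder sb = new StringBuilder();

        // Which VM (Client or Server) and which version is running?
        RuntimeMXBean rt = ManagementFactory.getRuntimeMXBean();
        sb.append("VM: ").append(rt.getVmName())
          .append(' ').append(rt.getVmVersion()).append('\n');
        sb.append("JVM args: ").append(rt.getInputArguments()).append('\n');

        // Initial vs. maximum heap size in bytes (getMax() is -1 if undefined).
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append("heap init=").append(heap.getInit())
          .append(" max=").append(heap.getMax()).append('\n');

        // How often has each collector (young / old generation) run so far?
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("GC ").append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Calling describe() before and after the cache is accessed gives a rough
idea of whether an old generation collection happened in between.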

Hope this helps & best regards,

Andreas


On 18.08.2012 14:13, Damon Hart-Davis wrote:
> Hi,
>
> FWIW I usually combine SoftReferences with some other sort of explicit limit based on heap size to help avert this type of issue, and indeed use a number of different strategies, often involving some explicit LRU management.
>
> I can supply code snippets if that would help!  B^>
>
> Rgds
>
> Damon
>
>
> On 18 Aug 2012, at 13:06, John Hendrikx wrote:
>
>> I've come to the conclusion that SoftReferences in the current hotspot
>> implementation are suffering from some problems.
>>
>> I'm running the latest Java 7, with default gc settings and a very
>> modest heap space of 256 MB.
>>
>> On this heap I have on the order of 50-60 large objects that are
>> referenced by SoftReference objects.  Each object is a few megabytes in
>> size (they are decoded JPEG images).
>>
>> At any given time, only 10 of these images have strong references to
>> them, totalling no more than 50-60 MB of heap space, the other 200 MB of
>> space is only soft referenced.
>>
>> It is said that SoftReferences are guaranteed to get cleared before heap
>> space runs out, yet in certain extreme circumstances one of the
>> following can happen:
>>
>> 1) 90% of the time, when under high memory pressure (many images loaded
>> and discarded), the VM gets really slow and it seems that some threads
>> get stuck in an infinite loop.  What is actually happening is that the
>> GC runs for long periods in a row (up to a few minutes, consuming one
>> CPU core) before the program gets unstuck and the GC finally notices it
>> can clear some SoftReference objects.
>>
>> It is possible that the GC has trouble deciding which SoftReferences can
>> be cleared because many of them had (up to a few seconds ago) strong
>> references to them, which themselves may not have been marked as garbage
>> yet.
>>
>> So it recovers, but it is taking so much time to do it that users will
>> think the program is stuck.
>>
>> 2) The rest of the time it actually will throw an out of heap space
>> exception, despite there being SoftReference objects that could have
>> been cleared.  This usually happens after a long pause as well.
>>
>> Can anyone confirm that these problems exist, and perhaps advise a
>> course of action?
>>
>> I really don't want to have to second-guess the GC about which images
>> should be discarded, but it looks like I will have no choice but to
>> limit this Image cache manually to some reasonable value to avoid the GC
>> getting stuck for long periods.
>>
>> Best regards,
>> John Hendrikx

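The approach discussed in the quoted messages above, combining
SoftReferences with an explicit limit and LRU management, could look
roughly like the following Java sketch. It is not the code offered in
the thread: the class name is hypothetical, and the limit is a simple
entry count rather than a heap-size-based limit, which is a
simplification.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU cache of softly referenced values with a hard cap on the entry count. */
final class BoundedSoftCache<K, V> {

    private final LinkedHashMap<K, SoftReference<V>> map;

    BoundedSoftCache(final int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.map = new LinkedHashMap<K, SoftReference<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest) {
                // Evict the least recently used entry once the cap is exceeded.
                return size() > maxEntries;
            }
        };
    }

    synchronized void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    /** Returns the cached value, or null if it was evicted or cleared by the GC. */
    synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key); // referent was cleared by the GC; drop the stale entry
        }
        return value;
    }

    synchronized int size() {
        return map.size();
    }
}
```

With a cap well below what the heap can hold (for example 20 images of a
few megabytes each on a 256 MB heap), evicted images become ordinary
garbage immediately, and the SoftReferences only act as a safety net for
the entries that remain.
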
-- 
Andreas Loew | Senior Java Architect
ACS Principal Service Delivery Engineer
ORACLE Germany



More information about the hotspot-gc-use mailing list