RFR: 8156484: ZipFile retains too much native memory from caching Inflaters

Peter Levart peter.levart at gmail.com
Mon May 23 18:00:38 UTC 2016


Hi Sherman,

On 05/23/2016 05:44 PM, Xueming Shen wrote:
> It appears the scope of the change is bigger than the problem we are 
> trying to solve here.

The ClassLoader(s)/Resource changes are there just to measure the impact 
if input streams opened from a jar file are closed explicitly as soon as 
possible when loading classes. This makes the cache/pool of Inflater's 
native resources more efficient. Those changes are not meant to be 
included in this patch - will be stripped off and proposed as a separate 
enhancement later.

There are also cleanups that are not absolutely necessary such as 
InflaterInputStream's package-private constructor with trusted 
subclasses using it so that usesDefaultInflater flag can be private and 
final in InflaterInputStream. They can be reverted if you don't like them.

>
> The problem we are having here is that in certain use scenario the vm 
> keeps a huge number
> of zip/jar files open and in which makes even the minimal cache 
> mechanism (keep even one
> deflater for one zip file) not desirable.

Right. Even if vm keeps huge number of jar files open, it doesn't read 
from all of them at the same time, but typically a single thread only 
reads from one entry of one jar file at a time before going to the next 
one. So keeping an isolated cache of native resource that could be 
re-used to read from multiple jar files, per jar file, is not the most 
efficient way to cache it. It doesn't matter until the number of jar 
files is large and VMs with gigantic classpaths are an important use case.

>
> It might be desirable to have a better cache mechanism, such as the 
> "ObjectPool" approach you're
> proposing here, to help in this situation. But I doubt if it is 
> necessary to extend the "functionality"
> to the rest of the package. For example, In most use scenario, we 
> don't have such cache issue with
> the deflater/inflater, a usual "open-end+finalize()" mechanism appears 
> just fine for now, I don't
> see a compelling reason to replace the existing mechanism with a new 
> one. It might be better
> to be conservative here to not make big change to something not proven 
> broken.

As  presented in my 1st message in the thread, what Martin is seeing is 
a combination of several issues:

- class loading is not closing streams explicitly
- caching is not effective for Inflaters for which the streams are 
closed with finalizers because when finalizer invokes close() on stream, 
the WeakHashMap<InputStream, Inflater> is not keeping the InputStream 
any more, so the associated Inflater, while maybe still reachable in 
object graph, can not be retrieved (map.remove(stream) returns null).
- caching of Inflaters for streams that happen to be explicitly closed 
is not bounded so they may pile-up.

I claim that we must solve all 3 issues to make Martin happy again. :-)

>
> It does not seem really necessary to have Deflater to be aware of the 
> "ObjectPool" and the cleaner/
> dealloctor, maybe all of these should be kept as an implementation 
> inside the ObjectPool?  which
> can have a cleaner to just invoke deflater.end() when needed?

Deflater is not aware of ObjectPool. It just uses Cleaner instead of 
finalize() because as a twin brother of Inflater, it would be nice for 
it to have the same cleanup mechanism. Cleaner is more efficient than 
finalization when the Deflater is explicitly end()-ed which is also a 
nice property. Otherwise the mechanism of Cleaner is similar to 
finalization. Cleaner is using PhantomReference(s) which are cleared 
when discovered and enqueued, while finalization is using 
FinalReference(s) which are not cleared and so present additional burden 
to CPU even in cases where Deflater is not explicitly end()ed. But if 
you don't like that, the changes to Deflater could be reverted back to 
using finalize().

Even Inflater could use finalize() for what it is worth, combined with 
ObjectPool<ZStreamRef>, but why would we not take an opportunity and 
replace finalize() as we go? It's plain and simple. And Cleaner API was 
developed for just that purpose.

>
> -Sherman

Regards, Peter

>
>
>
> On 5/23/16 7:56 AM, Peter Levart wrote:
>> Hi Martin,
>>
>>
>> On 05/20/2016 08:53 PM, Martin Buchholz wrote:
>>> Peter,
>>>
>>> You keep impressing me with your development speed!
>>> Looking at ObjectPool ... looking pretty good
>>
>> Thank
>>
>>>
>>> keepalivetime should be > 0.
>>
>> Right. ScheduledThreadPoolExecutor.scheduleWithFixedDelay throws 
>> IllegalArgumentException when the 'delay' argument is not positive. 
>> ObjectPool should have the same check.
>>
>>> (also very small keepalive times are a
>>> bad idea, but what's "very small"?)
>>> I'm thinking one second for ZipFile keep alive time.  Thread pools
>>> have 60 seconds.
>>
>> Exactly. I set 1 second of keep-alive for the two pools in Inflater.
>>
>>>
>>> Maybe we need a Consumer<T> resetter in addition to a releaser.
>>
>> I've been thinking of that too. It makes the usages nicer as they 
>> don't have to deal with resetting the object before returning it to 
>> the pool. In addition, the pool could reset the object lazily just 
>> before handing it out again. The "resetter" could also serve as an 
>> object "validator" - either being a Predicate or a Consumer throwing 
>> an exception. Thinking of objects like Connection(s). In case 
>> validation fails, the pool would release the object and try another one.
>>
>>>
>>> I'm opposed to creating a new STPE.  jdk 9 CompletableFuture comes
>>> with a hidden thread scheduler, so the idea is to use delayedExecutor
>>> to share that one thread for all common scheduling.
>>
>> Nice. It even simplifies the ObjectPool that way. I peeked at code 
>> and I see it uses an internal shared STPE under the hood.
>>
>>>
>>> I think you want to call pop poll instead, since it can return null?
>>
>> That's better yes.
>>
>>>
>>> We want a purger task to be created at most once per keep alive
>>> period, and never after the pool is quiescent.  Not sure if we're
>>> there yet.
>>
>> The task itself can be reused, yes. I wanted to respect the 
>> keepAliveTime so that pooled objects are released as soon as 
>> keepAliveTime is reached after last object was returned to the pool. 
>> I also wanted, as optimization, to schedule a task for periodic 
>> invocation, which only guarantees that objects are released sometime 
>> between keepAliveTime ... 2 * keepAliveTime after the last object is 
>> returned to the pool. But then I realized that periodic scheduling 
>> with constant delay is just a fancy API facade for scheduling next 
>> invocation at the end of the previous one. In either case the 
>> internal "heap" queue has to be updated at every firing of the task.
>>
>> So this is much simpler now with CompletableFuture:
>>
>> http://cr.openjdk.java.net/~plevart/jdk9-dev/ZipFile_Inflater.cleanup/webrev.03/ 
>>
>>
>> The only change from webrev.02 is in ObjectPool implementation.
>>
>> java/util/zip tests pass with this patch. The TestZipFile.java was 
>> failing for me with OOME and I had to increase the heap to 4G to make 
>> it pass. Heap dump shows that the test really needs that much. It 
>> keeps ZipEntries arround in a LinkedHashMap with associated byte[] 
>> arrays holding the contents of the zip-ed files.
>>
>> So, Martin, what do you think of this one?
>>
>> Regards, Peter
>>
>>>
>>> On Fri, May 20, 2016 at 7:31 AM, Peter Levart 
>>> <peter.levart at gmail.com> wrote:
>>>> Hi Martin,
>>>>
>>>>
>>>>
>>>> On 05/20/2016 02:51 AM, Martin Buchholz wrote:
>>>>> On Thu, May 19, 2016 at 7:29 AM, Peter Levart 
>>>>> <peter.levart at gmail.com>
>>>>> wrote:
>>>>>> So Martin, what do you think?
>>>>> I studied your code and we are fundamentally in agreement!
>>>>> ... Except for need for periodic cleanup of the cache...
>>>>> Maybe we should do
>>>>> CompletableFuture.runAsync(purgeCache,
>>>>> CompletableFuture.delayedExecutor(1, SECONDS))
>>>>> as a periodic task while cache is non-empty instead of my strategy of
>>>>> relying on a weak reference.
>>>>>
>>>>> Purging a cache on quiescence in the optimal way seems to be a hard
>>>>> problem.
>>>>> Time to write ObjectPool<T> (as many others have done before us)?
>>>>>
>>>>> ObjectPool(Supplier<T> factory,  Consumer<T> releaser, maxCacheSize,
>>>>> keepAliveTime)
>>>>>
>>>>> Except this time we'll do it right!  With jdk9 machinery!
>>>>
>>>> Something like the following?
>>>>
>>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/ZipFile_Inflater.cleanup/webrev.02/ 
>>>>
>>>>
>>>> I still must debug the cause of OutOfMemoryError in
>>>> java/util/zip/ZipFile/TestZipFile.java.
>>>>
>>>> Regards, Peter
>>>>
>>
>




More information about the core-libs-dev mailing list