Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Xueming Shen
xueming.shen at oracle.com
Tue Mar 29 19:05:12 UTC 2011
Hi Neil,
It appears to be a "regression" in scenario you described (user
application never close the
input stream after use and the ZipFile instance being retained during
the lifetime of the
process). The proposed approach seems to solve this particular problem.
Here are my comments regarding the patch.
(1) ZipFileInflaterInputStream.close() now synchronizes on Zipfile.this,
is it really necessary?
given the rest of the implementation doesn't guarantee the
returned InputStream object
is thread-safe (the implementation does make sure the access to
the underlying
ZipFile is thread-safe, though). The new finalize() now also
invokes close(), which means
the ZipFile object now gets locked twice (assume the good manner
application does
invoke the close() after use) for each InputStream after use.
(2) Is there any particular reason that we don't want to "reuse" the
Inflater(), if the close
comes from the finalize() in your testing evn? One thing we
noticed before is that the
moment the InputStream gets finalized, the Inflater embedded
might have been
finalized already, this is the reason why we have a "inf.ended()"
check in releaseInflater().
The disadvantage/limitation of existing Inflater caching mechanism
is that it does not
have a "cap" to limit the maximum number of inflater it should
cache. Are you worried
too many inflaters get cached, if they only get collected during
GC/finalize()? This actually
indicates that the cache mechanism does not work at all in your
scenario, even with the
proposed fix. An alternative solution (better?) is to make ZFIIS
to check the bytes read
during its read() method, and pro-actively close the stream when
it reaches the end of
the stream, same as what ZipFileInputStream does, which I believe
should solve the
problem in most use scenario (except, the application does not
bother reading the
InputSteeam to its end, which is unlikely).
(3) The attached test case does "f.close()" in its finally block, I
guess it should be f.delete()?
-Sherman
On 03/23/2011 02:46 PM, Neil Richards wrote:
> Hi,
> I've noticed that the fix introduced to address bug 6735255 [1] [2] had
> an unfortunate side-effect.
>
> That fix stores references to the InputStream objects returned from
> ZipFile.getInputStream(ZipEntry) in a HashSet (within ZipFile), so that
> the close() method may be called upon those objects when ZipFile.close()
> is called.
>
> It does this to conform to the existing API description, and to avoid a
> native memory leak which would occur if the InputStream is GC'd without
> its close() being called.
>
> However, by holding these InputStreams in a set within their ZipFile
> object, their lifecycle is now prolonged until the ZipFile object either
> has its close() method called, or is finalized (prior to GC).
>
> I've found scenarios (in user code) were ZipFile objects are held onto
> for a long time (eg. the entire lifetime of the process), but where
> InputStream objects from that ZipFile (for individual entries in the zip
> file) are obtained, used, then abandoned on a large number of occasions.
>
> (An example here might be a user-defined, long-lasting class loader,
> which might retain a ZipFile instance for each jar file in its search
> path, but which will create transitory input streams to read out
> particular files from those (large) jar files).
>
> I see that the InputStream objects returned from
> ZipFile.getInputStream(ZipEntry) tend to hold references to a couple of
> sizeable byte arrays, one 512 bytes in size, the other 1002 bytes.
>
> So by prolonging the life span of these objects, the fix for 6735255 can
> inadvertently cause a significant increase in the demand/usage on the
> heap (in practice, running into many Mb's).
>
> (This is not so if the user code is good-mannered enough to explicitly
> always call close() on the InputStreams in question.
> However, experience shows that user code cannot be relied upon to behave
> so benignly).
>
> I believe this introduced problem can be addressed by:
> 1. Holding references to the InputStreams in the 'streams' Set
> within ZipFile via WeakReferences, so they may be GC'd as soon
> as they are no longer externally (strongly) referenced.
> 2. Adding finalization logic to the InputStream implementation
> classes, which ensures their close() method is called prior to
> GC.
> 3. Prevent Inflater objects from being returned to the pool (in the
> ZipFile object) if close() has been called from finalize().
>
> To that end, please find below an 'hg export' of a proposed change which
> implements these aspects, together with a testcase to demonstrate the
> problem/fix.
>
> Any comments / suggestions on this gratefully received,
> Thanks,
> Neil
>
> [1] http://bugs.sun.com/view_bug.do?bug_id=6735255
> [2] http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/49478a651a28
>
More information about the core-libs-dev
mailing list