Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

Neil Richards neil.richards at ngmr.net
Wed Mar 23 21:46:26 UTC 2011


Hi,
I've noticed that the fix introduced to address bug 6735255 [1] [2] had
an unfortunate side-effect.

That fix stores references to the InputStream objects returned from
ZipFile.getInputStream(ZipEntry) in a HashSet (within ZipFile), so that
the close() method may be called upon those objects when ZipFile.close()
is called.

It does this to conform to the existing API description, and to avoid a
native memory leak which would occur if the InputStream is GC'd without
its close() being called.

However, by holding these InputStreams in a set within their ZipFile
object, their lifecycle is now prolonged until the ZipFile object either
has its close() method called, or is finalized (prior to GC).

I've found scenarios (in user code) were ZipFile objects are held onto
for a long time (eg. the entire lifetime of the process), but where
InputStream objects from that ZipFile (for individual entries in the zip
file) are obtained, used, then abandoned on a large number of occasions.

(An example here might be a user-defined, long-lasting class loader,
which might retain a ZipFile instance for each jar file in its search
path, but which will create transitory input streams to read out
particular files from those (large) jar files). 

I see that the InputStream objects returned from
ZipFile.getInputStream(ZipEntry) tend to hold references to a couple of
sizeable byte arrays, one 512 bytes in size, the other 1002 bytes.

So by prolonging the life span of these objects, the fix for 6735255 can
inadvertently cause a significant increase in the demand/usage on the
heap (in practice, running into many Mb's).

(This is not so if the user code is good-mannered enough to explicitly
always call close() on the InputStreams in question.
However, experience shows that user code cannot be relied upon to behave
so benignly).

I believe this introduced problem can be addressed by:
     1. Holding references to the InputStreams in the 'streams' Set
        within ZipFile via WeakReferences, so they may be GC'd as soon
        as they are no longer externally (strongly) referenced.
     2. Adding finalization logic to the InputStream implementation
        classes, which ensures their close() method is called prior to
        GC.
     3. Prevent Inflater objects from being returned to the pool (in the
        ZipFile object) if close() has been called from finalize().

To that end, please find below an 'hg export' of a proposed change which
implements these aspects, together with a testcase to demonstrate the
problem/fix.

Any comments / suggestions on this gratefully received,
Thanks,
Neil

[1] http://bugs.sun.com/view_bug.do?bug_id=6735255
[2] http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/49478a651a28

-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


# HG changeset patch
# User Neil Richards <neil.richards at ngmr.net>, <neil_richards at uk.ibm.com>
# Date 1300289208 0
# Branch zip-heap
# Node ID 34642c56c6febfd845c6c4ffd0344c3257f0b848
# Parent  554adcfb615e63e62af530b1c10fcf7813a75b26
Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd
Contributed-by: <neil.richards at ngmr.net>

diff -r 554adcfb615e -r 34642c56c6fe src/share/classes/java/util/zip/ZipFile.java
--- a/src/share/classes/java/util/zip/ZipFile.java	Wed Mar 16 15:01:07 2011 -0700
+++ b/src/share/classes/java/util/zip/ZipFile.java	Wed Mar 16 15:26:48 2011 +0000
@@ -30,11 +30,14 @@
 import java.io.IOException;
 import java.io.EOFException;
 import java.io.File;
+import java.lang.ref.ReferenceQueue;
+import java.lang.ref.WeakReference;
 import java.nio.charset.Charset;
 import java.util.Vector;
 import java.util.Enumeration;
 import java.util.Set;
 import java.util.HashSet;
+import java.util.Iterator;
 import java.util.NoSuchElementException;
 import java.security.AccessController;
 import sun.security.action.GetPropertyAction;
@@ -315,7 +318,16 @@
     private static native void freeEntry(long jzfile, long jzentry);
 
     // the outstanding inputstreams that need to be closed.
-    private Set<InputStream> streams = new HashSet<>();
+    private final Set<WeakReference<InputStream>> streams = new HashSet<>();
+    private final ReferenceQueue<? super InputStream> staleStreamQueue =
+        new ReferenceQueue<>();
+
+    private void clearStaleStreams() {
+        Object staleStream;
+        while (null != (staleStream = staleStreamQueue.poll())) {
+            streams.remove(staleStream);
+        }
+    }
 
     /**
      * Returns an input stream for reading the contents of the specified
@@ -339,6 +351,7 @@
         ZipFileInputStream in = null;
         synchronized (this) {
             ensureOpen();
+            clearStaleStreams();
             if (!zc.isUTF8() && (entry.flag & EFS) != 0) {
                 jzentry = getEntry(jzfile, zc.getBytesUTF8(entry.name), false);
             } else {
@@ -351,51 +364,19 @@
 
             switch (getEntryMethod(jzentry)) {
             case STORED:
-                streams.add(in);
+                streams.add(
+                        new WeakReference<InputStream>(in, staleStreamQueue));
                 return in;
             case DEFLATED:
-                final ZipFileInputStream zfin = in;
                 // MORE: Compute good size for inflater stream:
                 long size = getEntrySize(jzentry) + 2; // Inflater likes a bit of slack
                 if (size > 65536) size = 8192;
                 if (size <= 0) size = 4096;
-                InputStream is = new InflaterInputStream(zfin, getInflater(), (int)size) {
-                    private boolean isClosed = false;
-
-                    public void close() throws IOException {
-                        if (!isClosed) {
-                            super.close();
-                            releaseInflater(inf);
-                            isClosed = true;
-                        }
-                    }
-                    // Override fill() method to provide an extra "dummy" byte
-                    // at the end of the input stream. This is required when
-                    // using the "nowrap" Inflater option.
-                    protected void fill() throws IOException {
-                        if (eof) {
-                            throw new EOFException(
-                                "Unexpected end of ZLIB input stream");
-                        }
-                        len = this.in.read(buf, 0, buf.length);
-                        if (len == -1) {
-                            buf[0] = 0;
-                            len = 1;
-                            eof = true;
-                        }
-                        inf.setInput(buf, 0, len);
-                    }
-                    private boolean eof;
-
-                    public int available() throws IOException {
-                        if (isClosed)
-                            return 0;
-                        long avail = zfin.size() - inf.getBytesWritten();
-                        return avail > (long) Integer.MAX_VALUE ?
-                            Integer.MAX_VALUE : (int) avail;
-                    }
-                };
-                streams.add(is);
+                InputStream is = 
+                    new ZipFileInflaterInputStream(in, getInflater(), 
+                            (int)size);
+                streams.add(
+                        new WeakReference<InputStream>(is, staleStreamQueue));
                 return is;
             default:
                 throw new ZipException("invalid compression method");
@@ -403,6 +384,58 @@
         }
     }
 
+    private class ZipFileInflaterInputStream extends InflaterInputStream {
+        private boolean isClosed = false;
+        private boolean eof = false;
+        private final ZipFileInputStream zfin;
+        private boolean inFinalizer = false;
+
+        ZipFileInflaterInputStream(ZipFileInputStream zfin, Inflater inf,
+                int size) {
+            super(zfin, inf, size);
+            this.zfin = zfin;
+        }
+
+        public void close() throws IOException {
+            synchronized (ZipFile.this) {
+                if (!isClosed) {
+                    super.close();
+                    if (false == inFinalizer)
+                        releaseInflater(inf);
+                    isClosed = true;
+                }
+            }
+        }
+        // Override fill() method to provide an extra "dummy" byte
+        // at the end of the input stream. This is required when
+        // using the "nowrap" Inflater option.
+        protected void fill() throws IOException {
+            if (eof) {
+                throw new EOFException("Unexpected end of ZLIB input stream");
+            }
+            len = in.read(buf, 0, buf.length);
+            if (len == -1) {
+                buf[0] = 0;
+                len = 1;
+                eof = true;
+            }
+            inf.setInput(buf, 0, len);
+        }
+
+        public int available() throws IOException {
+            if (isClosed)
+                return 0;
+            long avail = zfin.size() - inf.getBytesWritten();
+            return (avail > (long) Integer.MAX_VALUE ? 
+                    Integer.MAX_VALUE : (int) avail);
+        }
+
+        protected void finalize() throws IOException {
+            inFinalizer = true;
+            close();
+        }
+    }
+
     /*
      * Gets an inflater from the list of available inflaters or allocates
      * a new one.
@@ -543,11 +576,14 @@
         synchronized (this) {
             closeRequested = true;
 
-            if (streams.size() !=0) {
-                Set<InputStream> copy = streams;
-                streams = new HashSet<>();
-                for (InputStream is: copy)
+            Iterator<WeakReference<InputStream>> streamsIterator = 
+                streams.iterator();
+            while (streamsIterator.hasNext()) {
+                InputStream is = streamsIterator.next().get();
+                if (null != is) {
                     is.close();
+                }
+                streamsIterator.remove();
             }
 
             if (jzfile != 0) {
@@ -684,9 +720,12 @@
                     freeEntry(ZipFile.this.jzfile, jzentry);
                     jzentry = 0;
                 }
-                streams.remove(this);
             }
         }
+
+        protected void finalize() {
+            close();
+        }
     }
 
 
diff -r 554adcfb615e -r 34642c56c6fe test/java/util/zip/ZipFile/ClearStaleZipFileInputStreams.java
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test/java/util/zip/ZipFile/ClearStaleZipFileInputStreams.java	Wed Mar 16 15:26:48 2011 +0000
@@ -0,0 +1,148 @@
+/*
+ * Copyright (c) 2011 Oracle and/or its affiliates. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ * 
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation.
+ * 
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ * 
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ * 
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ */
+
+/* 
+ * Portions Copyright (c) 2011 IBM Corporation 
+ */
+
+/*
+ * @test
+ * @bug %BUG%
+ * @summary Allow stale InputStreams from ZipFiles to be GC'd
+ * @author Neil Richards <neil.richards at ngmr.net>, <neil_richards at uk.ibm.com>
+ */
+import java.lang.ref.ReferenceQueue;
+import java.lang.ref.WeakReference;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.InputStream;
+import java.util.Enumeration;
+import java.util.HashSet;
+import java.util.Random;
+import java.util.Set;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipFile;
+import java.util.zip.ZipOutputStream;
+
+public class ClearStaleZipFileInputStreams {
+    private static final int ZIP_ENTRY_NUM = 5;
+
+    private static final byte[][] data;
+
+    static {
+        data = new byte[ZIP_ENTRY_NUM][];
+        Random r = new Random();
+        for (int i = 0; i < ZIP_ENTRY_NUM; i++) {
+            data[i] = new byte[1000];
+            r.nextBytes(data[i]);
+        }
+    }
+
+    private static File createTestFile(int compression) throws Exception {
+        File tempZipFile = 
+            File.createTempFile("test-data" + compression, ".zip");
+        tempZipFile.deleteOnExit();
+
+        ZipOutputStream zos = 
+            new ZipOutputStream(new FileOutputStream(tempZipFile));
+        zos.setLevel(compression);
+
+        try {
+            for (int i = 0; i < ZIP_ENTRY_NUM; i++) {
+                String text = "Entry" + i;
+                ZipEntry entry = new ZipEntry(text);
+                zos.putNextEntry(entry);
+                try {
+                    zos.write(data[i], 0, data[i].length);
+                } finally {
+                    zos.closeEntry();
+                }
+            }
+        } finally {
+            zos.close();
+        }
+
+        return tempZipFile;
+    }
+
+    private static void startGcInducingThread(final int sleepMillis) {
+        final Thread gcInducingThread = new Thread() {
+            public void run() {
+                while (true) {
+                    System.gc();
+                    try {
+                        Thread.sleep(sleepMillis);
+                    } catch (InterruptedException e) { }
+                }
+            }
+        };
+
+        gcInducingThread.setDaemon(true);
+        gcInducingThread.start();
+    }
+
+    public static void main(String[] args) throws Exception {
+        startGcInducingThread(500);
+        runTest(ZipOutputStream.DEFLATED);
+        runTest(ZipOutputStream.STORED);
+    }
+
+    private static void runTest(int compression) throws Exception {
+        ReferenceQueue<InputStream> rq = new ReferenceQueue<>();
+        
+        System.out.println("Testing with a zip file with compression level = "
+                + compression);
+        File f = createTestFile(compression);
+        try {
+            ZipFile zf = new ZipFile(f);
+            try {
+                Set<Object> refSet = createTransientInputStreams(zf, rq);
+
+                System.out.println("Waiting for 'stale' input streams from ZipFile to be GC'd ...");
+                System.out.println("(The test will hang on failure)");
+                while (false == refSet.isEmpty()) {
+                    refSet.remove(rq.remove());
+                }
+                System.out.println("Test PASSED.");
+                System.out.println();
+            } finally {
+                zf.close();
+            }
+        } finally {
+            f.close();
+        }
+    }
+
+    private static Set<Object> createTransientInputStreams(ZipFile zf,
+            ReferenceQueue<InputStream> rq) throws Exception {
+        Enumeration<? extends ZipEntry> zfe = zf.entries();
+        Set<Object> refSet = new HashSet<>();
+
+        while (zfe.hasMoreElements()) {
+            InputStream is = zf.getInputStream(zfe.nextElement());
+            refSet.add(new WeakReference<InputStream>(is, rq));
+        }
+
+        return refSet;
+    }
+}





More information about the core-libs-dev mailing list