RFR [7021870] GzipInputStream closes underlying stream during reading

Tue May 14 05:33:12 UTC 2013

Ivan,

It's a fair assessment to say "ZIS.available() is not the only cause..." though
I don't think it's that easy to really "craft" a real test case to demonstrate the
failure with a "malformed" gzip entry inside a zip file. At least I don't think
I can create one without sitting down with some real calculation:-) Or, I'm
missing something in your test case? It appears you will have to have the
inflater of the GZIPInputStream to happen to have only 8 bytes left in its
input buffer as the "remaining" when it is done inflating the gzip stream,
otherwise the this.in.available() will never > 0 (in case of < 8 byte, the trailer
reading will trigger the ZIS to "fill", in which the fixed/non-customizable 512
buffer size of ZIS will guarantee to exhaust the whole entry bytes and result
in the this.is.available() to return 0; if n > 8, means all bytes, including the
malformed byte(s) attached, have been in the GZIS.inf's remaining buffer,
so the this.is.available() will now returns 0 as well) which also means you
need the ZipInputStream "happens" to be filled and inflated all bytes of the
"gzip-streamed" entry, except the last one byte (malformed), in its last
filling/inflating...

It's also arguable that we are now discussing a "malformed" gzip entry inside
a zip stream, which is not exactly the original "regression" that we are trying
to address... But, given the current GZIS implementation ignores this type of
"malformed" stream (un-necessarily attached bytes at the end), it's reasonable
to request that reading from such an gzip-streamed entry of a zip entry should
succeed as well. And the proposed/pushed fix might be the easiest approach
to solve this "malformed gzip stream inside a zip entry" corner case for now.

So I would suggest to consider combining the existing fix + the available()
change (I would need a deep look to see if it has any "compatibility" impact...)

-Sherman

On 05/13/2013 06:35 AM, Ivan Gerasimov wrote:
> A small correction to the recipe for failure.
> The failure is reproduced when a SINGLE byte is appended to the gzip archive.
> In the scenario below the echo command appends TWO bytes. These two bytes are interpreted as a wrong gzip header and get successfully ignored.
>
> On 13.05.2013 16:23, Ivan Gerasimov wrote:
>> Hello Sherman!
>>
>> Thank you for the review!
>> While I agree with you that current implementation of ZipInputStream.available() is not what it's supposed to be, I doubt it is the only cause of the issue.
>> Please consider the following example.
>> Zip archive (z.zip) consists of two entries: f1 and f2.
>> f1 - gzip archive concatenated with a couple of bytes (I understand it's malformed gzip archive),
>> f2 - let it be a single byte text file.
>>
>> $ echo a > f1
>> $ echo b > f2
>> $ gzip f1
>> $ echo a >> f1.gz
>> $ zip z.zip f1.gz f2
>>
>> public class ZipMix {
>>     private static byte[] rbuf = new byte[1];
>>     public static void main(String[] args) throws Throwable {
>>         FileInputStream fis = new FileInputStream("z.zip");
>>         ZipInputStream zis = new ZipInputStream(fis);
>>         System.out.print("-start-\n");
>>         if (zis.getNextEntry() != null) {
>>             InputStream gis = new GZIPInputStream(zis, 8192);
>>             while (gis.read(rbuf) != -1)
>>                 System.out.print(new String(rbuf));
>>         }
>>         if (zis.getNextEntry() != null) { // <--- throws IOException
>>             while (zis.read(rbuf) != -1)
>>                 System.out.print(new String(rbuf));
>>         }
>>         System.out.print("\n-finish-\n");
>>     }
>> }
>>
>> This code worked well with jdk1.6.20, since GZIPInputStream hadn't tried to read beyond the trailer.
>> This breaks with the current jdk, since GZIPInputStream tries to read yet another gzip header after the first trailer, and SequenceInputStream.read() calls the close() method for underlying stream.
>> And this would break even with the ZipInputStream.available() fixed.
>>
>> It seems to me that the root cause here is using SequenceInputStream, which can close the stream during the read() call.
>> And this is what my fix was about - to prevent closing of the underlying stream.
>>
>> Sincerely yours,
>> Ivan
>>
>>
>> On 11.05.2013 2:48, Xueming Shen wrote:
>>> The proposed fix does solve the issue.
>>>
>>> However it appears it does not fix the root cause that triggers this problem.
>>> The root cause, or the direct trigger of this bug is the fact that ZipInputStream
>>> .availble() is really NOT reliable to tell us whether or not there is any byte
>>> "available" to be read from the input stream. The api doc of the ZIS.available()
>>> specifies "Returns 0 after EOF has reached for the current entry data, otherwise
>>> always return 1.". The implementation is straightforward,
>>>
>>>     public int available() throws IOException {
>>>         if (entryEOF) {
>>>             return 0;
>>>         } else {
>>>             return 1;
>>>         }
>>>     }
>>>
>>> However, the flag "entryEOF" is only set if a read attempt has been tried
>>> and failed (see ZIS.read()), which means if a previous read of the entry
>>> succeeded and it actually read all the available bytes from the entry (with
>>> a big enough buffer), there is 0 byte available for read, the "flag" is not
>>> set, so if you invoke zis.available(), it return 1, but a read() invocation will
>>> returns -1 (yes, if you then try zis.available() again , it returns the expected
>>> 0). The test below demonstrates this issue.
>>>
>>>    public static void main(String args[]) throws IOException {
>>>        ZipInputStream zis = new ZipInputStream(new FileInputStream(args[0]));
>>>        ZipEntry ze;
>>>        while((ze = zis.getNextEntry()) !=null) {
>>>            System.out.println("--------------" + ze.getName() + "--------");
>>>            byte[] buf = new byte[8102];
>>>
>>>            while (true) {
>>>                System.out.println("    zis.available()=" + zis.available());
>>>                int n = zis.read(buf, 0, buf.length);
>>>                System.out.println("    readN=" + n);
>>>                if (n == -1)
>>>                    break;
>>>            }
>>>        }
>>>
>>>     }
>>>
>>> It is arguable that the this is not an implementation bug, since it is reasonable
>>> to argue that the EOF has not been "really" reached yet after the first try, the
>>> implementation happens to return all available bytes, so the next read try finally
>>> "reached" the EOF, but this obviously is not the expectation.
>>>
>>> Unfortunately the fix we put in for #4691425 [1] logically depends on
>>> in.available() to decide whether or not we need to read a little further, which
>>> directly triggers this bug when the ZIS.available() "lies".
>>>
>>>         // If there are more bytes available in "in" or
>>>         // the leftover in the "inf" is > 26 bytes:
>>>         // this.trailer(8) + next.header.min(10) + next.trailer(8)
>>>         // try concatenated case
>>>         if (this.in.available() > 0 || n > 26) {
>>>             ....
>>>         }
>>>
>>> Not only ZInputStream has this issue, its parent class InflaterStream (and
>>> its sibling GZIPInputStream, GZS inherits InflaterStream's available()
>>> implementation directly) has the same issue, I do have a bug#7031075
>>> filed against GZIPInputStream.
>>>
>>> So the proposed fix is more a workaround for this available() issue. The
>>> alternative is to fix the issue directly, for example, to change the ZIS's
>>> available implementation to something like
>>>
>>>     public int available() throws IOException {
>>>         ensureOpen();
>>>         if (entryEOF || entry == null)
>>>             return 0;
>>>         switch (entry.method) {
>>>         case DEFLATED:
>>>             return (inf.finished() || inf.needsDictionary()) ? 0 : 1;
>>>         case STORED:
>>>             return remaining > 0 ? 1 : 0;
>>>         default:
>>>             throw new ZipException("invalid compression method");
>>>         }
>>>     }
>>>
>>> we probably should go further to simply remove the flag "entryEOF"
>>> and move the "deflated" case implementation into InflaterInputStream
>>> (to fix 7031075 as well).
>>>
>>> -Sherman
>>>
>>>
>>> [1] http://cr.openjdk.java.net/~sherman/4691425/webrev/
>>> [2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7031075
>>>
>>> On 05/10/2013 01:03 AM, Ivan Gerasimov wrote:
>>>> Hello everybody!
>>>>
>>>> GzipInputStream uses SequenceInputStream to concatenate the underlying stream with some data that reside in memory.
>>>> SequenceInputStream is implementing in such a way that it closes the stream when it reaches EOS.
>>>>
>>>> The solution is to wrap the underlying stream with extended FilterInputStream that overrides the close() method.
>>>>
>>>> BUG: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7021870
>>>> WEBREV: http://cr.openjdk.java.net/~dmeetry/7021870/webrev.0/ <http://cr.openjdk.java.net/%7Edmeetry/7021870/webrev.0/>
>>>>
>>>> Sincerely your,
>>>> Ivan
>>>>
>>>
>>>
>>>
>>
>>
>>
>