RFR [7021870] GzipInputStream closes underlying stream during reading

Ivan Gerasimov ivan.gerasimov at oracle.com
Tue Jun 18 16:29:04 UTC 2013


Hello!

I suggest to extend the test to cover the case with malformed input.
Would you please review the suggested change?
http://cr.openjdk.java.net/~igerasim/7021870/0/webrev/

I haven't created the bug id for the change yet.
Will create one if you agree that the change is reasonable.

The main reason to extend the test is to cover the situation when 
ZIS.available() works correctly, but it is not sufficient to prevent the 
underlying stream from been closed.

Sincerely yours,
Ivan

On 15.05.2013 0:32, Ivan Gerasimov wrote:
> Sherman,
>
> Thank you for a detailed explanation.
>
>> It's a fair assessment to say "ZIS.available() is not the only 
>> cause..." though
>> I don't think it's that easy to really "craft" a real test case to 
>> demonstrate the
>> failure with a "malformed" gzip entry inside a zip file.
> I think I found an easy way to do that.
> It's sufficient to create GZIPInputStream with a small buffer of 4 bytes.
> I've traced the code and found that at the time available() is called 
> there was
> exactly 1 byte left in the input stream. Thus it would have returned 
> true even
> if it were implemented more correctly.
>
> What do you think, is it worth adding it to the existing test?
>
>> At least I don't think
>> I can create one without sitting down with some real calculation:-) 
>> Or, I'm
>> missing something in your test case? It appears you will have to have 
>> the
>> inflater of the GZIPInputStream to happen to have only 8 bytes left 
>> in its
>> input buffer as the "remaining" when it is done inflating the gzip 
>> stream,
>> otherwise the this.in.available() will never > 0 (in case of < 8 
>> byte, the trailer
>> reading will trigger the ZIS to "fill", in which the 
>> fixed/non-customizable 512
>> buffer size of ZIS will guarantee to exhaust the whole entry bytes 
>> and result
>> in the this.is.available() to return 0; if n > 8, means all bytes, 
>> including the
>> malformed byte(s) attached, have been in the GZIS.inf's remaining 
>> buffer,
>> so the this.is.available() will now returns 0 as well) which also 
>> means you
>> need the ZipInputStream "happens" to be filled and inflated all bytes 
>> of the
>> "gzip-streamed" entry, except the last one byte (malformed), in its last
>> filling/inflating...
>>
>> It's also arguable that we are now discussing a "malformed" gzip 
>> entry inside
>> a zip stream, which is not exactly the original "regression" that we 
>> are trying
>> to address... 
> Well, my point is that the reported problem was that "GzipInputStream 
> closes
> underlying stream during reading". Having a malformed input with a 
> garbage at
> the end of the gziped entry was just another case that can trigger 
> this problem
> even with fixed ZIS.available().
>
> Sincerely,
> Ivan
>
>> But, given the current GZIS implementation ignores this type of
>> "malformed" stream (un-necessarily attached bytes at the end), it's 
>> reasonable
>> to request that reading from such an gzip-streamed entry of a zip 
>> entry should
>> succeed as well. And the proposed/pushed fix might be the easiest 
>> approach
>> to solve this "malformed gzip stream inside a zip entry" corner case 
>> for now.
>>
>> So I would suggest to consider combining the existing fix + the 
>> available()
>> change (I would need a deep look to see if it has any "compatibility" 
>> impact...)
>>
>> -Sherman
>>
>>
>> On 05/13/2013 06:35 AM, Ivan Gerasimov wrote:
>>> A small correction to the recipe for failure.
>>> The failure is reproduced when a SINGLE byte is appended to the gzip 
>>> archive.
>>> In the scenario below the echo command appends TWO bytes. These two 
>>> bytes are interpreted as a wrong gzip header and get successfully 
>>> ignored.
>>>
>>> On 13.05.2013 16:23, Ivan Gerasimov wrote:
>>>> Hello Sherman!
>>>>
>>>> Thank you for the review!
>>>> While I agree with you that current implementation of 
>>>> ZipInputStream.available() is not what it's supposed to be, I doubt 
>>>> it is the only cause of the issue.
>>>> Please consider the following example.
>>>> Zip archive (z.zip) consists of two entries: f1 and f2.
>>>> f1 - gzip archive concatenated with a couple of bytes (I understand 
>>>> it's malformed gzip archive),
>>>> f2 - let it be a single byte text file.
>>>>
>>>> $ echo a > f1
>>>> $ echo b > f2
>>>> $ gzip f1
>>>> $ echo a >> f1.gz
>>>> $ zip z.zip f1.gz f2
>>>>
>>>> public class ZipMix {
>>>>     private static byte[] rbuf = new byte[1];
>>>>     public static void main(String[] args) throws Throwable {
>>>>         FileInputStream fis = new FileInputStream("z.zip");
>>>>         ZipInputStream zis = new ZipInputStream(fis);
>>>>         System.out.print("-start-\n");
>>>>         if (zis.getNextEntry() != null) {
>>>>             InputStream gis = new GZIPInputStream(zis, 8192);
>>>>             while (gis.read(rbuf) != -1)
>>>>                 System.out.print(new String(rbuf));
>>>>         }
>>>>         if (zis.getNextEntry() != null) { // <--- throws IOException
>>>>             while (zis.read(rbuf) != -1)
>>>>                 System.out.print(new String(rbuf));
>>>>         }
>>>>         System.out.print("\n-finish-\n");
>>>>     }
>>>> }
>>>>
>>>> This code worked well with jdk1.6.20, since GZIPInputStream hadn't 
>>>> tried to read beyond the trailer.
>>>> This breaks with the current jdk, since GZIPInputStream tries to 
>>>> read yet another gzip header after the first trailer, and 
>>>> SequenceInputStream.read() calls the close() method for underlying 
>>>> stream.
>>>> And this would break even with the ZipInputStream.available() fixed.
>>>>
>>>> It seems to me that the root cause here is using 
>>>> SequenceInputStream, which can close the stream during the read() 
>>>> call.
>>>> And this is what my fix was about - to prevent closing of the 
>>>> underlying stream.
>>>>
>>>> Sincerely yours,
>>>> Ivan
>>>>
>>>>
>>>> On 11.05.2013 2:48, Xueming Shen wrote:
>>>>> The proposed fix does solve the issue.
>>>>>
>>>>> However it appears it does not fix the root cause that triggers 
>>>>> this problem.
>>>>> The root cause, or the direct trigger of this bug is the fact that 
>>>>> ZipInputStream
>>>>> .availble() is really NOT reliable to tell us whether or not there 
>>>>> is any byte
>>>>> "available" to be read from the input stream. The api doc of the 
>>>>> ZIS.available()
>>>>> specifies "Returns 0 after EOF has reached for the current entry 
>>>>> data, otherwise
>>>>> always return 1.". The implementation is straightforward,
>>>>>
>>>>>     public int available() throws IOException {
>>>>>         if (entryEOF) {
>>>>>             return 0;
>>>>>         } else {
>>>>>             return 1;
>>>>>         }
>>>>>     }
>>>>>
>>>>> However, the flag "entryEOF" is only set if a read attempt has 
>>>>> been tried
>>>>> and failed (see ZIS.read()), which means if a previous read of the 
>>>>> entry
>>>>> succeeded and it actually read all the available bytes from the 
>>>>> entry (with
>>>>> a big enough buffer), there is 0 byte available for read, the 
>>>>> "flag" is not
>>>>> set, so if you invoke zis.available(), it return 1, but a read() 
>>>>> invocation will
>>>>> returns -1 (yes, if you then try zis.available() again , it 
>>>>> returns the expected
>>>>> 0). The test below demonstrates this issue.
>>>>>
>>>>>    public static void main(String args[]) throws IOException {
>>>>>        ZipInputStream zis = new ZipInputStream(new 
>>>>> FileInputStream(args[0]));
>>>>>        ZipEntry ze;
>>>>>        while((ze = zis.getNextEntry()) !=null) {
>>>>>            System.out.println("--------------" + ze.getName() + 
>>>>> "--------");
>>>>>            byte[] buf = new byte[8102];
>>>>>
>>>>>            while (true) {
>>>>>                System.out.println("    zis.available()=" + 
>>>>> zis.available());
>>>>>                int n = zis.read(buf, 0, buf.length);
>>>>>                System.out.println("    readN=" + n);
>>>>>                if (n == -1)
>>>>>                    break;
>>>>>            }
>>>>>        }
>>>>>
>>>>>     }
>>>>>
>>>>> It is arguable that the this is not an implementation bug, since 
>>>>> it is reasonable
>>>>> to argue that the EOF has not been "really" reached yet after the 
>>>>> first try, the
>>>>> implementation happens to return all available bytes, so the next 
>>>>> read try finally
>>>>> "reached" the EOF, but this obviously is not the expectation.
>>>>>
>>>>> Unfortunately the fix we put in for #4691425 [1] logically depends on
>>>>> in.available() to decide whether or not we need to read a little 
>>>>> further, which
>>>>> directly triggers this bug when the ZIS.available() "lies".
>>>>>
>>>>>         // If there are more bytes available in "in" or
>>>>>         // the leftover in the "inf" is > 26 bytes:
>>>>>         // this.trailer(8) + next.header.min(10) + next.trailer(8)
>>>>>         // try concatenated case
>>>>>         if (this.in.available() > 0 || n > 26) {
>>>>>             ....
>>>>>         }
>>>>>
>>>>> Not only ZInputStream has this issue, its parent class 
>>>>> InflaterStream (and
>>>>> its sibling GZIPInputStream, GZS inherits InflaterStream's 
>>>>> available()
>>>>> implementation directly) has the same issue, I do have a bug#7031075
>>>>> filed against GZIPInputStream.
>>>>>
>>>>> So the proposed fix is more a workaround for this available() 
>>>>> issue. The
>>>>> alternative is to fix the issue directly, for example, to change 
>>>>> the ZIS's
>>>>> available implementation to something like
>>>>>
>>>>>     public int available() throws IOException {
>>>>>         ensureOpen();
>>>>>         if (entryEOF || entry == null)
>>>>>             return 0;
>>>>>         switch (entry.method) {
>>>>>         case DEFLATED:
>>>>>             return (inf.finished() || inf.needsDictionary()) ? 0 : 1;
>>>>>         case STORED:
>>>>>             return remaining > 0 ? 1 : 0;
>>>>>         default:
>>>>>             throw new ZipException("invalid compression method");
>>>>>         }
>>>>>     }
>>>>>
>>>>> we probably should go further to simply remove the flag "entryEOF"
>>>>> and move the "deflated" case implementation into InflaterInputStream
>>>>> (to fix 7031075 as well).
>>>>>
>>>>> -Sherman
>>>>>
>>>>>
>>>>> [1] http://cr.openjdk.java.net/~sherman/4691425/webrev/
>>>>> [2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7031075
>>>>>
>>>>> On 05/10/2013 01:03 AM, Ivan Gerasimov wrote:
>>>>>> Hello everybody!
>>>>>>
>>>>>> GzipInputStream uses SequenceInputStream to concatenate the 
>>>>>> underlying stream with some data that reside in memory.
>>>>>> SequenceInputStream is implementing in such a way that it closes 
>>>>>> the stream when it reaches EOS.
>>>>>>
>>>>>> The solution is to wrap the underlying stream with extended 
>>>>>> FilterInputStream that overrides the close() method.
>>>>>>
>>>>>> BUG: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7021870
>>>>>> WEBREV: http://cr.openjdk.java.net/~dmeetry/7021870/webrev.0/ 
>>>>>> <http://cr.openjdk.java.net/%7Edmeetry/7021870/webrev.0/>
>>>>>>
>>>>>> Sincerely your,
>>>>>> Ivan
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>
>




More information about the core-libs-dev mailing list