RFR [7021870] GzipInputStream closes underlying stream during reading

Ivan Gerasimov ivan.gerasimov at oracle.com
Mon May 13 13:35:15 UTC 2013


A small correction to the recipe for failure.
The failure is reproduced when a SINGLE byte is appended to the gzip 
archive.
In the scenario below the echo command appends TWO bytes. These two 
bytes are interpreted as a wrong gzip header and get successfully ignored.

On 13.05.2013 16:23, Ivan Gerasimov wrote:
> Hello Sherman!
>
> Thank you for the review!
> While I agree with you that current implementation of 
> ZipInputStream.available() is not what it's supposed to be, I doubt it 
> is the only cause of the issue.
> Please consider the following example.
> Zip archive (z.zip) consists of two entries: f1 and f2.
> f1 - gzip archive concatenated with a couple of bytes (I understand 
> it's malformed gzip archive),
> f2 - let it be a single byte text file.
>
> $ echo a > f1
> $ echo b > f2
> $ gzip f1
> $ echo a >> f1.gz
> $ zip z.zip f1.gz f2
>
> public class ZipMix {
>     private static byte[] rbuf = new byte[1];
>     public static void main(String[] args) throws Throwable {
>         FileInputStream fis = new FileInputStream("z.zip");
>         ZipInputStream zis = new ZipInputStream(fis);
>         System.out.print("-start-\n");
>         if (zis.getNextEntry() != null) {
>             InputStream gis = new GZIPInputStream(zis, 8192);
>             while (gis.read(rbuf) != -1)
>                 System.out.print(new String(rbuf));
>         }
>         if (zis.getNextEntry() != null) { // <--- throws IOException
>             while (zis.read(rbuf) != -1)
>                 System.out.print(new String(rbuf));
>         }
>         System.out.print("\n-finish-\n");
>     }
> }
>
> This code worked well with jdk1.6.20, since GZIPInputStream hadn't 
> tried to read beyond the trailer.
> This breaks with the current jdk, since GZIPInputStream tries to read 
> yet another gzip header after the first trailer, and 
> SequenceInputStream.read() calls the close() method for underlying 
> stream.
> And this would break even with the ZipInputStream.available() fixed.
>
> It seems to me that the root cause here is using SequenceInputStream, 
> which can close the stream during the read() call.
> And this is what my fix was about - to prevent closing of the 
> underlying stream.
>
> Sincerely yours,
> Ivan
>
>
> On 11.05.2013 2:48, Xueming Shen wrote:
>> The proposed fix does solve the issue.
>>
>> However it appears it does not fix the root cause that triggers this 
>> problem.
>> The root cause, or the direct trigger of this bug is the fact that 
>> ZipInputStream
>> .availble() is really NOT reliable to tell us whether or not there is 
>> any byte
>> "available" to be read from the input stream. The api doc of the 
>> ZIS.available()
>> specifies "Returns 0 after EOF has reached for the current entry 
>> data, otherwise
>> always return 1.". The implementation is straightforward,
>>
>>     public int available() throws IOException {
>>         if (entryEOF) {
>>             return 0;
>>         } else {
>>             return 1;
>>         }
>>     }
>>
>> However, the flag "entryEOF" is only set if a read attempt has been 
>> tried
>> and failed (see ZIS.read()), which means if a previous read of the entry
>> succeeded and it actually read all the available bytes from the entry 
>> (with
>> a big enough buffer), there is 0 byte available for read, the "flag" 
>> is not
>> set, so if you invoke zis.available(), it return 1, but a read() 
>> invocation will
>> returns -1 (yes, if you then try zis.available() again , it returns 
>> the expected
>> 0). The test below demonstrates this issue.
>>
>>    public static void main(String args[]) throws IOException {
>>        ZipInputStream zis = new ZipInputStream(new 
>> FileInputStream(args[0]));
>>        ZipEntry ze;
>>        while((ze = zis.getNextEntry()) !=null) {
>>            System.out.println("--------------" + ze.getName() + 
>> "--------");
>>            byte[] buf = new byte[8102];
>>
>>            while (true) {
>>                System.out.println("    zis.available()=" + 
>> zis.available());
>>                int n = zis.read(buf, 0, buf.length);
>>                System.out.println("    readN=" + n);
>>                if (n == -1)
>>                    break;
>>            }
>>        }
>>
>>     }
>>
>> It is arguable that the this is not an implementation bug, since it 
>> is reasonable
>> to argue that the EOF has not been "really" reached yet after the 
>> first try, the
>> implementation happens to return all available bytes, so the next 
>> read try finally
>> "reached" the EOF, but this obviously is not the expectation.
>>
>> Unfortunately the fix we put in for #4691425 [1] logically depends on
>> in.available() to decide whether or not we need to read a little 
>> further, which
>> directly triggers this bug when the ZIS.available() "lies".
>>
>>         // If there are more bytes available in "in" or
>>         // the leftover in the "inf" is > 26 bytes:
>>         // this.trailer(8) + next.header.min(10) + next.trailer(8)
>>         // try concatenated case
>>         if (this.in.available() > 0 || n > 26) {
>>             ....
>>         }
>>
>> Not only ZInputStream has this issue, its parent class InflaterStream 
>> (and
>> its sibling GZIPInputStream, GZS inherits InflaterStream's available()
>> implementation directly) has the same issue, I do have a bug#7031075
>> filed against GZIPInputStream.
>>
>> So the proposed fix is more a workaround for this available() issue. The
>> alternative is to fix the issue directly, for example, to change the 
>> ZIS's
>> available implementation to something like
>>
>>     public int available() throws IOException {
>>         ensureOpen();
>>         if (entryEOF || entry == null)
>>             return 0;
>>         switch (entry.method) {
>>         case DEFLATED:
>>             return (inf.finished() || inf.needsDictionary()) ? 0 : 1;
>>         case STORED:
>>             return remaining > 0 ? 1 : 0;
>>         default:
>>             throw new ZipException("invalid compression method");
>>         }
>>     }
>>
>> we probably should go further to simply remove the flag "entryEOF"
>> and move the "deflated" case implementation into InflaterInputStream
>> (to fix 7031075 as well).
>>
>> -Sherman
>>
>>
>> [1] http://cr.openjdk.java.net/~sherman/4691425/webrev/
>> [2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7031075
>>
>> On 05/10/2013 01:03 AM, Ivan Gerasimov wrote:
>>> Hello everybody!
>>>
>>> GzipInputStream uses SequenceInputStream to concatenate the 
>>> underlying stream with some data that reside in memory.
>>> SequenceInputStream is implementing in such a way that it closes the 
>>> stream when it reaches EOS.
>>>
>>> The solution is to wrap the underlying stream with extended 
>>> FilterInputStream that overrides the close() method.
>>>
>>> BUG: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7021870
>>> WEBREV: http://cr.openjdk.java.net/~dmeetry/7021870/webrev.0/ 
>>> <http://cr.openjdk.java.net/%7Edmeetry/7021870/webrev.0/>
>>>
>>> Sincerely your,
>>> Ivan
>>>
>>
>>
>>
>
>
>




More information about the core-libs-dev mailing list