RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files

Wed Mar 29 09:34:42 UTC 2023

On Sun, 12 Feb 2023 15:41:55 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:

> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the number of compressed or uncompressed bytes read from the inflater is larger than the Zip64 magic value.
> 
> While the ZIP format specifies that the data descriptor `SHOULD be stored in ZIP64 format (as 8 byte values) when a file's size exceeds 0xFFFFFFFF`, it also states that `ZIP64 format MAY be used regardless of the size of a file`. For such small entries, the above assumption does not hold.
> 
> This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the ZipEntry includes a Zip64 extra information field. This brings ZipInputStream into alignment with the APPNOTE format spec:
> 
> 
> When extracting, if the zip64 extended information extra 
> field is present for the file the compressed and 
> uncompressed sizes will be 8 byte values.
> 
> 
> While small Zip64 files with 8-byte data descriptors are not commonly found in the wild, it is possible to create one using the Info-ZIP command line `-fd` flag:
> 
> `echo hello | zip -fd > hello.zip`
> 
> The PR also adds a test verifying that such a small Zip64 file can be parsed by ZipInputStream.
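To illustrate why the choice between 4-byte and 8-byte sizes matters, here is a minimal sketch of parsing a data descriptor in both layouts. It is not the code from the PR: the class `DataDescriptorSketch`, the record `Descriptor`, and the method `parse` are hypothetical names, and the sketch assumes the descriptor bytes are already in memory and that a leading `0x08074b50` is the optional signature rather than a CRC that happens to have that value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the ZipInputStream implementation.
class DataDescriptorSketch {
    // Optional data descriptor signature defined by the ZIP format.
    static final int EXTSIG = 0x08074b50;

    record Descriptor(long crc, long compressedSize, long uncompressedSize) {}

    // Parses a data descriptor, reading 8-byte sizes when zip64 is true
    // and 4-byte sizes otherwise. All fields are little-endian.
    static Descriptor parse(byte[] bytes, boolean zip64) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long first = Integer.toUnsignedLong(buf.getInt());
        // Assumption: a first word equal to EXTSIG is the signature, not the CRC.
        long crc = (first == EXTSIG) ? Integer.toUnsignedLong(buf.getInt()) : first;
        long csize = zip64 ? buf.getLong() : Integer.toUnsignedLong(buf.getInt());
        long size = zip64 ? buf.getLong() : Integer.toUnsignedLong(buf.getInt());
        return new Descriptor(crc, csize, size);
    }
}
```

If an 8-byte descriptor is parsed with `zip64` set to `false`, the low and high halves of the compressed size are read as the two sizes, so the reported sizes no longer match what the inflater saw.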

FWIW, I reverted the refactoring and the CRC == SIGEXT support, leaving only the change to how the Zip64 format is determined. This may make the change somewhat easier to reason about.

The latest version of this PR augments the Zip64 check instead of replacing it, further reducing the risk associated with this proposed change.

With the latest version of this PR, I believe we strictly expand the universe of allowable ZIP entries for ZipInputStream. 

The set of added entries is the set of entries where a Zip64 extra field exists in the LOC header and where the compressed and uncompressed file data sizes do not exceed 0xFFFFFFFF.

Added even stricter validation to hasZip64Extra, checking that the extra data field is right-sized for Zip64.
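For readers following along, a rough sketch of what such a check could look like is shown here. It is not the code in the PR: the class name `Zip64ExtraSketch`, the helper `expect8ByteSizes`, and in particular the assumption that "right-sized" means at least 16 bytes of data (room for two 8-byte sizes) are mine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; the actual check in the PR may differ.
class Zip64ExtraSketch {
    // Header ID of the Zip64 extended information extra field.
    static final int ZIP64_EXTRA_ID = 0x0001;
    // Assumption: two 8-byte sizes (uncompressed and compressed) must fit.
    static final int ZIP64_LOC_SIZES_LEN = 16;

    // Walks the extra data as a sequence of (id, size, data) blocks and
    // reports whether a sufficiently large Zip64 block is present.
    static boolean hasZip64Extra(byte[] extra) {
        if (extra == null) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int id = Short.toUnsignedInt(buf.getShort());
            int size = Short.toUnsignedInt(buf.getShort());
            if (size > buf.remaining()) {
                return false; // truncated or malformed extra data
            }
            if (id == ZIP64_EXTRA_ID) {
                return size >= ZIP64_LOC_SIZES_LEN;
            }
            buf.position(buf.position() + size); // skip unrelated block
        }
        return false;
    }

    // Augmented decision: keep the existing size-based check and
    // additionally accept entries carrying a Zip64 extra field.
    static boolean expect8ByteSizes(long compressed, long uncompressed, byte[] extra) {
        return compressed > 0xFFFFFFFFL
                || uncompressed > 0xFFFFFFFFL
                || hasZip64Extra(extra);
    }
}
```

Because the size-based condition is kept and the extra-field condition is only added with `||`, every entry that was previously treated as Zip64 still is, which matches the "strictly expand" reasoning above.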

Thanks a lot for looking into this, Lance!

> Are you aware of any tools that would create this scenario as to the best of my knowledge we have not encountered one that does as of yet?

~~The following on my Mac produces a Zip64 file with an 8-byte data descriptor:~~

`echo hellohellohellohellohellohellohellohello | zip > hello.zip`

(The Zip64 mode is triggered by the streaming mode; the use of a data descriptor seems sensitive to the length of the input.)

I have updated the test to use a ZIP produced using `zip -fd` as mentioned above.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1427105000
PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1427115724
PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1427951442
PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1440659920
PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1448869467
PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1448937656